yegor256 / cobench

Coders Benchmarking Script: automatically collects productivity metrics from a team of GitHub contributors
MIT License

Demotivating reviews #57

Open volodya-lombrozo opened 1 year ago

volodya-lombrozo commented 1 year ago

Let's consider this PR: https://github.com/objectionary/eo/pull/2264 where I provided 4 rounds of comprehensive review. In the final statistics this PR will be counted as only a single review. Now let's take another PR review: https://github.com/objectionary/eo/pull/2189 where I left only a few comments and spent much less time identifying problems compared with the first PR.

So, I believe these statistics are extremely demotivating. They either motivate you to ignore PR reviews, because they don't give a reasonable number of points, or motivate you to just say "Looks good to me" without a careful review. Both behaviors lead to poor code quality, in my opinion.

volodya-lombrozo commented 1 year ago

@yegor256 Please, take a look

yegor256 commented 1 year ago

@volodya-lombrozo in the early days of Zerocracy we counted the messages posted by reviewers into PRs and then paid for them. This motivated reviewers to post as many messages as possible (like in your first example). However, we abandoned this practice and introduced the QA role in projects. Read this policy, specifically the section about QA: http://www.zerocracy.com/policy.html This QA person is supposed to catch situations where reviewers simply say "LGTM" and merge, and to punish them.

How do you suggest improving the algorithm of this library?

volodya-lombrozo commented 1 year ago

@yegor256 I’ve not come up with the final “algorithm” yet, but we can discuss it here:

  1. It would be better to count “suggested changes” and “approved” events instead of counting the entire PR as one review (see the sketch at the end of this comment)
    1. Let’s consider the same PR again: https://github.com/objectionary/eo/pull/2264
      1. We have 4 “suggested changes” from me + 2 “suggested changes” from you + 1 “approve” from me (7 events in total)
      2. Each kind of event should have its own number of points
      3. For example, if we take 10 points for a “suggested changes” event and 5 for an “approved” event, then I will get 45 points and you will get 20 points.
  2. The second step would be to split each “suggested changes” event by its suggestion messages:
    1. The first “suggested changes” has 5 messages, the second 7 messages, and so on. It means that not all “suggested changes” are equal (of course).
  3. Then we would have to invent a more sophisticated mechanism that decides whether a suggestion was actually valuable for the project
    1. For example, we could check whether the programmer changed the reviewed line of code after “suggested changes”

P.S.1 It is just a sketch


P.S.2 As for QA: judging from the description of the position, it seems like the QA job could be done by a robot (point 3 of my "algorithm").
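Below is a rough sketch in Ruby of what points 1 and 2 could look like. It is only an illustration, not cobench's actual code; the point values, event names, and per-round comment counts are my assumptions:

```ruby
# Hypothetical per-event point values (the "10 / 5" numbers from the list above).
POINTS = {
  'changes_requested' => 10,
  'approved' => 5
}.freeze

# `events` is an array of hashes collected from one PR, e.g.
# { reviewer: 'volodya-lombrozo', state: 'changes_requested', comments: 5 }.
def score_review_events(events)
  events.each_with_object(Hash.new(0)) do |event, totals|
    base = POINTS.fetch(event[:state], 0)
    # Point 2 of the proposal: weight a "suggested changes" round by the number
    # of suggestion messages it contains (1 extra point per message, arbitrarily).
    bonus = event[:state] == 'changes_requested' ? event.fetch(:comments, 0) : 0
    totals[event[:reviewer]] += base + bonus
  end
end

# The https://github.com/objectionary/eo/pull/2264 example: 4 "suggested changes"
# plus 1 approval from one reviewer, 2 "suggested changes" from another
# (the per-round comment counts below are made up).
events = [
  { reviewer: 'volodya-lombrozo', state: 'changes_requested', comments: 5 },
  { reviewer: 'volodya-lombrozo', state: 'changes_requested', comments: 7 },
  { reviewer: 'volodya-lombrozo', state: 'changes_requested', comments: 3 },
  { reviewer: 'volodya-lombrozo', state: 'changes_requested', comments: 2 },
  { reviewer: 'volodya-lombrozo', state: 'approved',          comments: 0 },
  { reviewer: 'yegor256',         state: 'changes_requested', comments: 4 },
  { reviewer: 'yegor256',         state: 'changes_requested', comments: 6 }
]
p score_review_events(events)
# => {"volodya-lombrozo"=>62, "yegor256"=>30}
```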

volodya-lombrozo commented 1 year ago

@yegor256 what do you think?

yegor256 commented 1 year ago

@rultor release, tag is 0.0.48

rultor commented 1 year ago

@rultor release, tag is 0.0.48

@yegor256 OK, I will release it now. Please check the progress here

rultor commented 1 year ago

@rultor release, tag is 0.0.48

@yegor256 Done! FYI, the full log is here (took me 3min)

yegor256 commented 1 year ago

@volodya-lombrozo let's see how this edition works

levBagryansky commented 1 year ago

@volodya-lombrozo I agree that reviews are unequal among themselves and should be scored differently. Nevertheless, in my opinion it is important that the average review continues to give approximately the same number of points as now (150-200). That is, a large, high-quality review would give more and a low-quality one less, but on average the quantitative meaning of a review would be preserved. Maybe the resulting amount should be normalized by the sum of all the team's reviews, or we should take a closer look at the proposed coefficients.
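A rough sketch of what that normalization could look like (the target average, reviewer names, raw points, and review count below are invented purely for illustration):

```ruby
TARGET_AVERAGE = 175 # hypothetical midpoint of the current 150-200 range

# Raw per-reviewer points, e.g. produced by some event-based scoring,
# and the total number of reviews behind them (all numbers made up).
raw = { 'alice' => 620, 'bob' => 80, 'carol' => 200 }
reviews = 6

# Rescale so that the team-wide average per review stays near TARGET_AVERAGE,
# while the relative differences between reviewers are preserved.
scale = TARGET_AVERAGE * reviews / raw.values.sum.to_f
normalized = raw.transform_values { |points| (points * scale).round }
p normalized
# => {"alice"=>723, "bob"=>93, "carol"=>233}
```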

volodya-lombrozo commented 1 year ago

@levBagryansky

  1. If we "normalize" reviews, it actually won't solve the original problem. It means that I will receive approximately the same number of points, 150~200, regardless of the number of suggestions, even if there are only a few of them. Then, what is the point of providing a careful review?
  2. Which coefficients do you mean? How are you going to calculate them?
  3. Regarding the statement "but on average the quantitative meaning of the review was preserved": Let's consider a hypothetical PR with 10 lines of code and 5 comprehensive review rounds related to that PR. Of course, this implies that the quality of the PR is extremely low, but the PR author will still receive significantly more points compared to what a reviewer would get. Therefore, I believe that a review can be worth more than the PR itself.

levBagryansky commented 1 year ago

@volodya-lombrozo

  1. No, the normalization must be over all reviews in the team. It means that you could get 500 points on average while others get 20.
  2. I mean the coefficients that you suggested for a PR:

    For example, let’s take 10 points for “suggested changes” and “5” for “approved” event, then I will get 45 points, you will get 20 points.

  3. Yes, a review can be worth more than the PR itself in some cases. But if the average review across the whole team is worth more than the average PR, it means that on average a review takes more time than a PR (assuming two equal programmers applying equal effort).

volodya-lombrozo commented 1 year ago

@levBagryansky To be honest, I didn't clearly understand your point (3), but I think we still have to give more points for an average PR compared with the review attached to it. For example, for an average PR the picture should be as follows (I took the "coefficients" off the top of my head; I don't know the exact numbers):

1) PR "I've done something" - author gets 150 points
2) Simple review of that PR with several comments - reviewer gets 50 points
3) Author makes some changes - author gets 0 points
4) Reviewer accepts the PR - reviewer gets 10 points

In summary: Author gets 150 points, Reviewer gets 60 points
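The same hypothetical breakdown as a tiny Ruby calculation; the per-event values are just the invented "coefficients" above, nothing official:

```ruby
events = [
  { who: :author,   what: 'opens the PR',         points: 150 },
  { who: :reviewer, what: 'review with comments', points: 50 },
  { who: :author,   what: 'makes the changes',    points: 0 },
  { who: :reviewer, what: 'approves the PR',      points: 10 }
]
# Sum the points per role to get the totals quoted above.
totals = events.group_by { |e| e[:who] }
               .transform_values { |es| es.sum { |e| e[:points] } }
p totals
# => {:author=>150, :reviewer=>60}
```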

levBagryansky commented 1 year ago

@volodya-lombrozo yes

yegor256 commented 1 year ago

@rultor release, tag is 0.0.49

rultor commented 1 year ago

@rultor release, tag is 0.0.49

@yegor256 OK, I will release it now. Please check the progress here

rultor commented 1 year ago

@rultor release, tag is 0.0.49

@yegor256 Done! FYI, the full log is here (took me 3min)