Encourage small PRs that can be contained in one commit

oss-generic / process

The OSS-Generic Reference Process

https://oss-generic.github.io/process/

MIT License

6 stars 19 forks source link

Encourage small PRs that can be contained in one commit #43

Open damithc opened 7 years ago

damithc commented 7 years ago

I guess we want to encourage small PRs that can be contained on one commit (i.e. break things into smaller PRs when possible). Faster to merge and easier to do post-production. Another advantage is we can do a fast forward merge (provided we ask the dev to follow our convention for merge commits), or a squash-merge so that the version tree is tidier.

damithc commented 7 years ago

@pyokagan your thoughts? For example, do you think https://github.com/se-edu/addressbook-level4/pull/274/files can be made a one-commit PR?

pyokagan commented 7 years ago

@damithc

I guess we want to encourage small PRs

Yes, smaller PRs are usually good. However, note that there is some overhead with creating a PR vs keeping everything in a PR, since a PR means creating a branch. Branches become hard to manage when you have too many of them depending on one another.

that can be contained on one commit (i.e. break things into smaller PRs when possible).

It's not really about having "one commit", but more about the scope of the PR. For example, even though https://github.com/se-edu/addressbook-level4/pull/272 contains multiple commits I'd still consider it a "small PR" since it essentially does only one thing: the extraction of classes.

We could, of course, split each of its commits into PRs, but at that point the overhead of having to manage all these extra PRs far outweighs the benefits of keeping each PR small, since the commits in the aforementioned PR touch the same area of code and thus strongly depend on one another.

Faster to merge and easier to do post-production.

Yes, those are indeed some of the benefits.

Another advantage is we can do a fast forward merge (provided we ask the dev to follow our convention for merge commits),

Why would that be a good thing? As I said earlier, fast forwarding (which means a merge commit is not created) means we can't tell the start and end of a PR.

or a squash-merge so that the version tree is tidier.

Just be careful not to lose too much history :-)

Note that for squash merges we lose the ability to have a descriptive commit subject, since we require the merge/squash commit to use the PR title instead, and the PR title is the issue title which usually does not describe the changes of the PR.

For example, do you think https://github.com/se-edu/addressbook-level4/pull/274/files can be made a one-commit PR?

In this case, yes, it can.

damithc commented 7 years ago

@pyokagan some follow up comments. Respond when you can. No hurry.

Note that for squash merges we lose the ability to have a descriptive commit subject, since we require the merge/squash commit to use the PR title instead, and the PR title is the issue title which usually does not describe the changes of the PR.

I think this is a good enough reason to insist on merge commits for every PR. i.e. merge commit subject describes WHAT problem is being solved or WHAT enhancement is being done while commit message subjects describes HOW it is being done. Does that sound right?

Coming back to the topic of this issue, you are right, breaking into smaller PRs has its overheads. Another problem is when something is broken up to smaller PRs and the dev disappear after doing only some of them, leaving the code base in a 'half-baked' state. e.g. asciidoctor was added to gradle but rest of the steps were not done.

What I'm trying to avoid is devs splitting PR into multiple commits unnecessarily, which can soak up lot of dev time in interactive rebasing during iterative refinement of the PR. In addition, we don't have an easy way to ensure each commit is passing the build. So I'm thinking devs should be encouraged to push the 'one commit' option to its limits before moving to the 'multiple commits' option and even in the latter case, try to minimize the number of comments. If the reviewer finds a specific commit hard to review, he can always request a split of commits. So it's a choice between 'if in doubt, split' vs 'if in doubt, don't split'.

To get a sense of where we want the PRs/commits to split up, consider these two hypothetical situations for https://github.com/se-edu/addressbook-level4/pull/272. Are they acceptable too (note that both violate our 100 LoC rule)? Which of the three alternatives is the preferred one?

Split as two PRs as follows, each containing only one commit.
- Extract common parser constants/methods to separate classes
- Extract *CommandParser classes
Stays as one PR but has only two commits (as described above).

On a related note, do we force reviewers to review one commit at a time? I find myself trying to review the full diff first and only falling back on incremental reviews if the full diff review is hard to do in one sitting. Incremental review is more rigorous and likely to catch more problems but also extra work for the reviewer. However, if we want to improve the quality of reviews, we may have to enforce incremental reviews?

pyokagan commented 7 years ago

@damithc

merge commit subject describes WHAT problem is being solved or WHAT enhancement is being done while commit message subjects describes HOW it is being done. Does that sound right?

Yes, that sounds right.

Another problem is when something is broken up to smaller PRs and the dev disappear after doing only some of them, leaving the code base in a 'half-baked' state.

Yes, that is another additional problem. One criteria we could use to avoid these kind of problems would be to ensure that PRs provide value on their own (e.g. they do not require another PR to be merged in the future in order to have any value)

What I'm trying to avoid is devs splitting PR into multiple commits unnecessarily,

Yes, that is a problem. I'd think that it is up to the reviewers to tell contributors to squash their changes as appropriate, or move commit(s) into a smaller PR. it is not always obvious at the start of the PR whether it could be split into smaller PRs or not.

In addition, we don't have an easy way to ensure each commit is passing the build.

Hmm, I do wonder if it is possible to teach Travis to run tests for every commit in the PR ;-) I have a hunch that it might be possible, since Travis is just a VM after all.

So it's a choice between 'if in doubt, split' vs 'if in doubt, don't split'.

imo, I've always advocated preferring "if in doubt, split" since Git makes it easier to squash commits rather than to split them. It is also much easier for reviewers to say: "squash [v1 1/3] and [v1 2/3]" rather than saying "There are changes X and Y in [v1 1/1], please split them", where X and Y are long descriptions of the changes.

(To be continued...)

pyokagan commented 7 years ago

To get a sense of where we want the PRs/commits to split up, consider these two hypothetical situations for se-edu/addressbook-level4#272. Are they acceptable too (note that both violate our 100 LoC rule)? Which of the three alternatives is the preferred one?
Split as two PRs as follows, each containing only one commit.

Extract common parser constants/methods to separate classes
Extract *CommandParser classes

It wouldn't make sense for Extract common parser constants/methods to separate classes to be merged into master as a PR on its own. This is because at the point of the commit, all of the parser constants/methods are only used in the Parser class itself. It is perfectly fine for these constants/methods to remain as private within the Parser class, since they can be considered to be an internal implementation detail.

In other words, having that commit as a single PR on its own doesn't add any value. It'll only have value when paired with a subsequent PR, which is an indication that they should be together.

But we might want to balance this with other factors like "reducing merge conflicts with other PRs".

Stays as one PR but has only two commits (as described above).

I'd actually like to choose a 3rd option: the PR is fine just as it is.

It may seem that the PR might be excessively split into commits, since we have these commits:

[v4 2/7] Extract AddCommandParser from Parser class
[v4 3/7] Extract EditCommandParser from Parser class
[v4 4/7] Extract DeleteCommandParser from Parser class
[v4 5/7] Extract SelectCommandParser from Parser class
[v4 6/7] Extract FindCommandParser from Parser class

And all of them follow the same pattern.

However, these commits are actually all mechnical changes. i.e., it is a cut-and-paste from Parser.java into the new parser class. Having each of them as a separate commit is useful since we can easily tell the source and destination of each cut-and-paste.

If we squash these steps all into one commit, all we would have is one gigantic "deletion" block in the diff of Parser.java. Yes, it is still possible to see which of the parser classes the code was cut-and-pasted into, but it requires more time and effort on the part of the reviewer and future readers as well.

pyokagan commented 7 years ago

On a related note, do we force reviewers to review one commit at a time? I find myself trying to review the full diff first and only falling back on incremental reviews if the full diff review is hard to do in one sitting.

Yeah, practically speaking we can't force reviewers to review in a single way. Reviewers will just naturally choose to review in whatever way they are most comfortable with.

But we should have some conventions on how reviewers can communicate their way of reviewing, e.g.:

If the reviewer is reviewing a single commit, then a reference to the commit should be placed at the top of the review message.
If the reviewer is reviewing the whole PR diff at one go, then there is no need to place a reference to the commit.
Review of the commit message should be placed in the review message body, rather than as a comment on the diff, otherwise the comment will stick around until the end of the PR :-P

And yeah, I personally do a mix of "full-PR" and "commit-by-commit" reviews, simply because most of the time the commits are not well organized such that I need to look at future commits in order to understand what a commit is doing.

Incremental review is more rigorous and likely to catch more problems but also extra work for the reviewer.

Hrm, I'd say that incremental review is not necessarily rigorous -- it is still possible for reviewers to do a non-rigorous review when looking at each commit separately. Furthermore, it is also possible for reviewers to do a rigorous review even when looking at the full PR diff.

So, I'd think that having this "proper commit organization" thing doesn't necessarily improve the quality of the reviews. All it does is that it opens up new possibilities for review approaches, some of them more rigorous than what was being done previously. However, this all still rests on whether reviewers themselves would be willing to commit to doing more rigorous reviews in the first place. It's still good to keep our options open, though, in case we do get reviewers who care :-)

However, if we want to improve the quality of reviews, we may have to enforce incremental reviews?

Therefore, as mentioned earlier, there's no point in enforcing incremental reviews, since it does not guarantee rigorous reviews. I think what's more important is enforcing small PRs and attracting developers who care about code quality.

pyokagan commented 7 years ago

I think what's more important is enforcing small PRs

By the way, I think we need to be clear on what a "small PR" is. A PR may be "big" in terms of lines of code changed, however these massive changes may just be a result of using automated refactorings. On the other hand, a "small PR", in terms of number of lines of code changed or number of commits, may actually be doing lots of complex changes.

So, I think that rather than aiming for "small PRs", we should aim for "simple PRs" instead. This means: PRs which do not contain many logical changes.

e.g. se-edu/addressbook-level4#272 may seem to be a "big" PR, since it touches 300+ lines of code. However, conceptually it is very simple: it is just cut-and-pasting. The simplicity in its approach makes the PR easy to review and thus much more likely to be merged quickly.

damithc commented 7 years ago

Thanks for all the input @pyokagan 👍