llvm / llvm-iwg

The LLVM Infrastructure Working Group

https://foundation.llvm.org/docs/infrastructure-wg/

A Request for Comment on Code Review Process #73

Open tstellar opened 2 years ago

tstellar commented 2 years ago

Proposal

The LLVM Foundation Board of Directors is seeking comment on the current state of Code Review within the LLVM Project and its sub-projects. Phabricator is no longer actively maintained and we would like to move away from a self-hosted solution, so our goal is to determine if GitHub Pull Requests are a good alternative to our current code review tool: Phabricator.

Specifically we are looking for feedback on:

What features or properties make GitHub Pull Requests better than Phabricator?
What features or properties make Phabricator better than GitHub Pull Requests?
What new workflows or process improvements will be possible with GitHub Pull Requests?
Which workflows aren’t possible with GitHub Pull Requests?
Any other information that you think will help the Board of Directors make the best decision.

Where to Direct Feedback

Please provide feedback on this Infrastructure Working Group ticket. This will make it easier to collect and consolidate the responses. At the end of the comment period the Infrastructure Working Group will collect the feedback for further analysis and summarization.

Timeline

The timeline for this RFC will be as follows:

RFC posted for public review and comment
30 days after the date of posting, public comment closes.
IWG will have 14 days from closure of public comments to review and summarize public comments into a pros and cons list to be presented to the LLVM Foundation Board.
Foundation Board will have 30 days to make a final decision about using GitHub Pull Requests and then communicate a migration plan to the community.

davidchisnall commented 2 years ago

@rengolin I'm not sure that I agree with this:

Having hundreds of branches in the main repo because of PRs and patch sets is unmaintainable. Anyone deleting any branches is dangerous, as GH doesn't really check much before deleting. I don't think that's a viable solution.

I'm not sure I agree with respect to branches in the main repo. A branch in the main repo doesn't consume much space in clones (and may not be cloned by default), just the space. The only overhead is of the extra things in the namespace. This is avoidable by designating a convention like wip/{feature} or wip/{github username}/{feature} for things that are not yet merged. Git doesn't seem to have problems with repos with hundreds of branches and as long as wip branches are viewed as a staging ground for things that are used for collaboration on features that are not yet ready for merging, rather than dumping grounds for unfinished work, having them in a single place is valuable because anyone with LLVM push access can work on them without needing separate approval to contribute to some person / company's LLVM fork.
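
A minimal sketch of how the wip/ naming convention above could be checked mechanically, for example from a hook or a CI job. The pattern, the set of characters allowed in each name component, and the function name `is_wip_branch` are all assumptions made for illustration; nothing like this exists in the LLVM repository as far as this thread says.

```python
import re

# Hypothetical pattern for the convention described above:
#   wip/{feature}  or  wip/{github username}/{feature}
# The characters allowed in each component are an assumption.
_COMPONENT = r"[A-Za-z0-9][A-Za-z0-9._-]*"
WIP_BRANCH = re.compile(rf"^wip/{_COMPONENT}(?:/{_COMPONENT})?$")


def is_wip_branch(name: str) -> bool:
    """Return True if `name` follows the wip/ staging convention."""
    return WIP_BRANCH.fullmatch(name) is not None


if __name__ == "__main__":
    for branch in ["wip/new-pass-manager", "wip/alice/loop-unroll", "main", "wip/"]:
        print(branch, is_wip_branch(branch))
```

A check like this only enforces the namespace; it says nothing about whether a branch is an active staging ground or abandoned work.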

PRs are also branches in git, which are deleted after they are merged. They are in the refs/pull/* namespace and so aren't cloned by default, but you can add this bit of the namespace to your fetch line in your git config and get them all.
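
As a sketch of the fetch-line change mentioned above: GitHub exposes each pull request head as `refs/pull/<N>/head` on the upstream repository, so an extra fetch refspec can map those refs into remote-tracking branches. The remote name `origin`, the local `pr/` prefix, and the helper names below are assumptions chosen for illustration. The code only builds and prints the command; it does not run git.

```python
from typing import List


def pr_fetch_refspec(remote: str = "origin") -> str:
    """Build a refspec that maps every PR head to a remote-tracking ref."""
    # The local "pr/" prefix is an arbitrary choice for this sketch.
    return f"+refs/pull/*/head:refs/remotes/{remote}/pr/*"


def git_config_command(remote: str = "origin") -> List[str]:
    """Return the `git config` invocation that adds the extra fetch line."""
    return ["git", "config", "--add", f"remote.{remote}.fetch", pr_fetch_refspec(remote)]


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no side effects.
    print(" ".join(git_config_command()))
```

With that line in place, a plain `git fetch origin` would also bring down the PR heads, and a given PR could be inspected as `origin/pr/<N>`.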

rengolin commented 2 years ago

A branch in the main repo doesn't consume much space in clones (and may not be cloned by default), just the space.

@davidchisnall I didn't mean server/git maintenance, I meant it needs extra rules that are hard to enforce. Later on I comment on the branch naming, which is what you suggest, but that's too many things in a single namespace. In that sense, a fork is equally easy for git to maintain, but it keeps all the internal/wip branches to each user.

having them in a single place is valuable because anyone with LLVM push access can work on them without needing separate approval to contribute to some person / company's LLVM fork.

That has positive and negative effects. I wouldn't want anyone (with LLVM access) being able to push to my PR while I'm working on it. GH has the diff suggestion that allows people to suggest code changes and let me merge them if I accept the changes. I certainly don't want people pushing to my PR after I'm done working on it and waiting for some tests to pass.

pogo59 commented 2 years ago

@davidchisnall regarding this:

A branch in the main repo doesn't consume much space in clones (and may not be cloned by default), just the space. The only overhead is of the extra things in the namespace.

Depending on the task, the overhead can be significant. Our downstream repo tends to accumulate hundreds of branches, and there is some bit of our infrastructure that needs to iterate over all branches; when things are cluttered, this takes well over 10 minutes. Not ideal. (I can't say I remember why we have a job that does this iteration, but it made sense when it was explained to me.)

pogo59 commented 2 years ago

@kwk regarding this:

Suppose you need to change code in LLVM for a setup which you don't have access to but there's a buildbot worker for to test it. Why not use that?

That would be TOTALLY AWESOME. I have a patch pending that works everywhere except for one ppc bot, and the failure makes no sense whatsoever. Being able to experiment on my own branch and fire off that bot would be a HUGE win. As things stand currently, I'd have to commit something, wait for the bot to fail, try something else, wait for the bot to fail... all in the main branch, cluttering up the commit history and causing fail-mail to go to lots of people unnecessarily.

pogo59 commented 2 years ago

@kwk regarding this:

I know there are voices who think that pre-merge testing is too slow and cannot be done effectively

I think you are distinguishing a more voluntary pre-merge testing from mandatory pre-merge CI testing? The latter must have sufficient throughput to avoid becoming a bottleneck. I collected some stats several years ago, and the number that sticks in my head is a commit pace of roughly 40 per day, or almost one every 30 minutes, 24/7. For mandatory pre-merge CI testing to not be perceived as a bottleneck, it would probably have to iterate in about 10 minutes, preferably less. I haven't looked to see what the pace is like nowadays but it's unlikely to be slower.

kwk commented 2 years ago

@kwk regarding this:

I know there are voices who think that pre-merge testing is too slow and cannot be done effectively

I think you are distinguishing a more voluntary pre-merge testing from mandatory pre-merge CI testing? The latter must have sufficient throughput to avoid becoming a bottleneck. I collected some stats several years ago, and the number that sticks in my head is a commit pace of roughly 40 per day, or almost one every 30 minutes, 24/7. For mandatory pre-merge CI testing to not be perceived as a bottleneck, it would probably have to iterate in about 10 minutes, preferably less. I haven't looked to see what the pace is like nowadays but it's unlikely to be slower.

On average there are ~74 commits each day (source: https://github.com/llvm/llvm-project/graphs/commit-activity):

[Image: screenshot from 2021-10-08 17-17-13]

Supposing that commits arrive evenly distributed around the clock, we have 24 h / 74 commits ≈ 19 minutes per commit to build each one. Of course, we don’t have such an even distribution.
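
The arithmetic above can be reproduced with a few lines, which also makes it easy to plug in other commit rates. This is only a back-of-the-envelope sketch under the same even-arrival assumption; the function name is made up for illustration.

```python
MINUTES_PER_DAY = 24 * 60


def minutes_per_commit(commits_per_day: float) -> float:
    """Average gap between commits if they arrived evenly around the clock."""
    return MINUTES_PER_DAY / commits_per_day


if __name__ == "__main__":
    # 40/day is the older estimate quoted above; 74/day is the average
    # read off the commit-activity graph.
    for rate in (40, 74):
        print(f"{rate} commits/day -> {minutes_per_commit(rate):.1f} minutes per commit")
```

At 40 commits per day the gap is 36 minutes, and at 74 per day it is about 19.5 minutes. Since real commits cluster during working hours, the budget during busy periods would be tighter than these averages suggest.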

ghost commented 2 years ago

I'm not sure if this is the right place or time to have this discussion, but if we do switch to GitHub PRs, we should probably implement some guidelines on how they are managed and merged.

Most importantly, if/when to use each of the 3 PR merge strategies. However, that is going to be very closely related to decisions like how to update an open PR, if/when it is acceptable to force push to a branch (both with and without a PR), when to make something a single PR but with multiple commits and when to divide it into multiple stacked PRs, etc.

Since each PR in Phabricator could only contain 1 commit, none of these decisions had to be made, and the main branch has remained very clean, simple and linear. I don't think a one-size-fits-all strategy will work for GH though.

Using the merge commit PR resolution strategy recklessly could lead to a mess of merges left and right that is difficult to follow, and makes tools like git blame and git bisect far less effective.
The rebase and merge resolution strategy would keep the history linear, but in order to prevent incomplete commits from getting into the main branch, people need to keep their PR branch commit history clean and tidy. Not everyone is familiar enough with git to be comfortable re-writing their branch history, and it requires people to force-push to PR branches, making it harder for reviewers to see what was changed. However, if you (and your reviewers) are sufficiently familiar with git, both of these are non-issues (as I explained previously), and using rebase-and-merge with a single PR and multiple commits can be an excellent alternative to stacked PRs.
The squash and merge strategy would keep the history linear, and would prevent incomplete commits from getting into the main branch. For small PRs it works great, but if things get large, we either end up needing stacked PRs, or we have to accept that the main branch will have some massive commits in it, neither of which is a great option. Massive commits make it hard to use the commit history to help narrow down where a bug is, and stacked PRs require tedious branch management that scales with the number of PRs, making larger projects difficult and time consuming to get approved and merged, and discouraging smaller PRs.
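
To make the difference between the three strategies concrete, here is a toy model of which commits end up reachable from main after a PR lands. It deliberately flattens history into a list, so it does not capture the branching shape that a merge commit creates; the function name `land` and the commit labels are hypothetical.

```python
from typing import List


def land(main: List[str], pr_commits: List[str], title: str, strategy: str) -> List[str]:
    """Return the commits reachable from main after landing a PR (toy model)."""
    if strategy == "merge":
        # All PR commits are kept, plus one extra merge commit; history is non-linear.
        return main + pr_commits + [f"Merge: {title}"]
    if strategy == "rebase":
        # PR commits are replayed one by one onto main; history stays linear.
        return main + pr_commits
    if strategy == "squash":
        # The whole PR collapses into a single commit on main.
        return main + [title]
    raise ValueError(f"unknown strategy: {strategy}")


if __name__ == "__main__":
    main = ["A"]
    pr = ["wip", "fix typo"]
    for strategy in ("merge", "rebase", "squash"):
        print(strategy, land(main, pr, "Add feature", strategy))
```

In this model, both "merge" and "rebase" carry the intermediate "wip" and "fix typo" commits into main unless the author cleans up the branch first, while "squash" hides them at the cost of a single, possibly large, commit.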

My personal preference is to use squash-and-merge for all small PRs, but allow more experienced git users to use the rebase-and-merge option for larger projects when appropriate, while less experienced users can use stacked PRs with squash-and-merge. The merge-commit PR resolution strategy would then only be used in exceptional cases.

However, I recognize that allowing the use of the rebase-and-merge option could add extra maintenance overhead, and it's only viable if we can expect reviewers to have some familiarity with git, so they can compare changes after a PR is updated using a force-push. GH does a good job of making this as easy as it can by providing a link to the diff, but it gets harder if the reason for the force-push was to address a merge conflict by rebasing onto the tip of main, as this generally requires the use of git range-diff. We'd also need some way of indicating that a PR is going to use the rebase-and-merge option, such as a label on the PR. This would both help the human reviewers, and alter the bot behaviours (e.g. some of the pre-merge testing would need to happen on each commit in the PR).

I'm assuming the guidelines for this are something the IWG will figure out based on what they decide our needs are, then communicate to the rest of the community before we migrate to GH PRs.

vtjnash commented 2 years ago

Which workflows aren’t possible with GitHub Pull Requests?

GitHub lacks a concept of group notification of reviewers or filtering by subscriptions, whereas these are the main highlights of the Phabricator homepage under "Active Revisions". On GitHub, I can only attempt to ping people by adding a new comment (which I can't search or filter for myself), and can only add individuals as reviewers (and only up to 10). The closest workaround I've used in other projects is to have separate repos for the issues associated with various components (which doesn't map very well to the monorepo), or to work to aggressively apply labels (which requires giving label-editing permissions, and is hard to scale and enforce).

The other possible workaround is to use a third-party interface to overlay missing features. I haven't tested any of those recently, but that might be something the IWG could research.

stacked PRs

One possible workaround for managing stacked PRs is the addition of a developer repo, so complex PRs can first go there, then get merged as a single PR in the main repo. This separates the WIP branches from the official branches. But there are also many downsides to this approach. GitHub seems to already have almost all of the features necessary to permit making PRs against other PRs, so perhaps that is something they'd be able to add fairly quickly.

mandatory pre-merge CI testing as a bottleneck

If we want complete testing on the merge commits, I've seen these cases handled with something like bors, where you submit PRs to the bot to merge, and it can take more advanced strategies, such as taking an entire batch and then doing the CI testing on the merge of all of them. If there are conflicts or failures, it'll then automatically bisect to remove the offending patch(es) and could still commit the rest, while sending the others back for revision. I don't think this is particularly relevant to the discussion of GitHub vs Phabricator, since it seems like pre-merge CI could be added to either one, and independently of the switch.
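
The batch-then-bisect strategy described above can be sketched roughly as follows. This is not bors's actual algorithm or API, only an illustration of the idea; `land_batch` and the `passes_ci` callback are hypothetical names, and the callback stands in for a full CI run on the given patches applied together.

```python
from typing import Callable, List, Sequence, Tuple


def land_batch(
    patches: Sequence[str],
    passes_ci: Callable[[Sequence[str]], bool],
) -> Tuple[List[str], List[str]]:
    """Split `patches` into (landed, rejected) using batch testing plus bisection.

    `passes_ci` is a hypothetical callback that runs CI on the given patches
    applied together on top of everything landed so far.
    """
    landed: List[str] = []
    rejected: List[str] = []

    def process(batch: Sequence[str]) -> None:
        if not batch:
            return
        if passes_ci(landed + list(batch)):
            landed.extend(batch)       # whole batch is good: land it in one go
        elif len(batch) == 1:
            rejected.append(batch[0])  # isolated an offending patch
        else:
            mid = len(batch) // 2
            process(batch[:mid])       # bisect: try each half separately
            process(batch[mid:])

    process(patches)
    return landed, rejected


if __name__ == "__main__":
    # Toy CI: any combination containing "bad" fails.
    print(land_batch(["a", "b", "bad", "c"], lambda ps: "bad" not in ps))
```

When most patches are good, a whole batch lands after a single CI run, which is what lets this approach keep up with a commit rate that one-at-a-time testing could not; a failing batch costs extra runs while it is bisected.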

reflog access / old branches

Fun fact: GitHub states that old PR branches are always guaranteed to be available at refs/pull/N/head on the main repo (read-only; attempting to push to this will fail).

I couldn't find an easy way to access the PR reflog from there, though I clicked back through PRs I'd made several years ago and could still see the force-push message in the discussion log, and still access the content of the old branch in the original form (which I'd deleted years ago, after force-pushing more times, and eventually merging). I had the impression from this that, ever since they added the force-push activity log a few years ago, GitHub won't remove any commit once it has been referenced from a PR.

HighCommander4 commented 2 years ago

My personal preference is to use squash-and-merge for all small PRs, but allow more experienced git users to use the rebase-and-merge option for larger projects, while less experienced users can use stacked PRs.

I don't think using a single PR with multiple commits and rebase-and-merge is a sufficient replacement for stacked PRs.

You may want different commits in a stack to have different reviewers, be approved independently, and have their discussions kept separate.

smeenai commented 2 years ago

Phabricator permits attaching comments to unchanged lines. This is frequently useful when pointing out places where changes should have been made but weren't, and is part of the motivation for requesting full-context diffs in Phabricator PRs. GitHub is currently architecturally unable to even represent such comments.

This is a pretty important feature for me.

I'm also concerned about being able to maintain a nice clean linear history with PRs, whereas Phabricator's model of one review == one commit makes that work naturally. I've seen lots of projects using PRs with a history full of merges and fixup commits ("fix typo", "fix compilation", etc. as separate commits instead of a single working commit with all the fixes rolled in). Squash and merge addresses that to an extent, but people would still have to be conscientious about having a proper commit message (instead of the amalgamation of all the individual commit messages that GitHub gives you by default). Most of all, we'd need some way to enforce a clean history and proper commit messages, cos it seems pretty easy to mess up.

ChristianKuehnel commented 2 years ago

@smeenai

Squash and merge addresses that to an extent, but people would still have to be conscientious about having a proper commit message (instead of the amalgamation of all the individual commit messages that GitHub gives you by default). Most of all, we'd need some way to enforce a clean history and proper commit messages, cos it seems pretty easy to mess up.

I suppose this is something we can solve with a bot that actually performs the merge (after reviews and checks passed). E.g. Rust is using Bors for that. So if you implement the merging policy in the bot, you can control what the result looks like...

ghost commented 2 years ago

What features or properties make Phabricator better than GitHub Pull Requests?

Phabricator makes it much easier to keep track of all active comment threads and ensure they're all addressed prior to the change being approved. GitHub pull requests frequently hit situations where comment threads disappear despite not being resolved, and generally GitHub PRs make it challenging to find all the current unresolved comment threads and ensure they've all been resolved satisfactorily.

This has come up a few times, so I will mention that inline comments disappear only in the 'Files changed' tab. They stay around in the 'Conversation' tab, with an 'outdated' marker if the file has since been changed, and a link to the original diff from when the comment was made.

To me, this is a point in GH's favour. When a file has been changed, it usually is because the comment was addressed. The primary use of the 'Files changed' tab is to review changes, and cluttering that view with comments that have likely already been addressed can get in the way of new reviewers reviewing the changes.

If you want to see if/how your comments have been addressed, looking at the current version of the PR is not usually the easiest way of seeing that, so why include them there? Instead, it's usually a lot easier to look at what has changed since you left the comments, and this is most easily done starting at the 'Conversation' tab anyways. GH provides an overview of the commits and/or force-pushes that have happened since the comment was made, and within the overview there are links that will take you to a comparison of before and after (what you probably wanted to see anyways).

Phabricator permits attaching comments to unchanged lines. This is frequently useful when pointing out places where changes should have been made but weren't, and is part of the motivation for requesting full-context diffs in Phabricator PRs. GitHub is currently architecturally unable to even represent such comments.

This is also incorrect. You can absolutely leave comments on unchanged lines. GH does by default collapse the unchanged lines, just like Phabricator does, but after expanding them you can definitely leave a comment on an unchanged line. You can't leave comments on unchanged files as far as I'm aware, but I didn't think you could do that in Phabricator either.

davidchisnall commented 2 years ago

I did a review on Phabricator yesterday for the first time in ages and I missed two features from GitHub (which may exist, I just couldn't get them to work):

Phabricator lets you put comments on lines, with GitHub you can put comments on ranges of lines if a comment relates to more than a single statement.
GitHub provides a ```suggestion block that lets you provide a change that can be merged from the web interface.

vtjnash commented 2 years ago

but after expanding them you can definitely leave a comment on an unchanged line

You can expand it, but GitHub won't pop up the + button to add a comment. This perhaps makes some sense, if you assume that GH is adding comments to the default unified diff, not to the file itself. But I don't really know their rationale, and it is annoying.

zygoloid commented 2 years ago

Phabricator makes it much easier to keep track of all active comment threads and ensure they're all addressed prior to the change being approved. GitHub pull requests frequently hit situations where comment threads disappear despite not being resolved, and generally GitHub PRs make it challenging to find all the current unresolved comment threads and ensure they've all been resolved satisfactorily.

This has come up a few times, so I will mention that inline comments disappear only in the 'Files changed' tab. They stay around in the 'Conversation' tab, with an 'outdated' marker if the file has since been changed, and a link to the original diff from when the comment was made.

I have seen unresolved comment threads disappear from both views. Comments disappear from the "Files changed" tab if GitHub loses track of where they should attach. GitHub is incredibly bad at this compared to other code review tools, so this happens a lot, even if the comment has not been addressed. And unaddressed comments are sometimes collapsed by default in the "Conversation" view. I'm not sure exactly what the criteria are for that to happen, but I think the older comment threads get collapsed when there are more than some number of comment threads in total.

If you want to see if/how your comments have been addressed, looking at the current version of the PR is not usually the easiest way of seeing that, so why include them there? Instead, it's usually a lot easier to look at what has changed since you left the comments, and this is most easily done starting at the 'Conversation' tab anyways. GH provides an overview of the commits and/or force-pushes that have happened since the comment was made, and within the overview there are links that will take you to a comparison of before and after (what you probably wanted to see anyways).

Your preferred code review workflow very clearly doesn't match mine. I want to review the proposed patch -- original code versus proposed new code. The delta since last time I looked is seldom interesting, because that doesn't let me check that the overall change that would be committed is reasonable.

Phabricator permits attaching comments to unchanged lines. [...] GitHub is currently architecturally unable to even represent such comments.

This is also incorrect. You can absolutely leave comments on unchanged lines. GH does by default collapse the unchanged lines, just like Phabricator does, but after expanding them you can definitely leave a comment on an unchanged line.

You're mistaken. You can expand an unchanged region, but GitHub doesn't offer the option to leave comments on it. You don't have to take my word for it, you can go and try it. The little + icon to leave a comment does not appear on lines that aren't within the default context.

The reason, as I understand it, is that GitHub models the location of a comment as a line number in a unified diff. (This is visible in its API. See: https://i.stack.imgur.com/t37he.png) Because of that, GitHub is architecturally unable to support comments anywhere other than on changed lines and on the three lines before and after each changed line -- it has no representation for those locations.
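
A toy model of the restriction described above may help: if a comment can only be anchored to a line that appears in the default unified diff, then the commentable lines are the changed lines plus three lines of context on each side. This is a sketch of the behaviour as described in this thread, not GitHub's implementation or API; all names below are hypothetical, and it ignores the distinction between old and new line numbers.

```python
from typing import Iterable, Set

CONTEXT_LINES = 3  # default unified-diff context, as described above


def commentable_lines(changed: Iterable[int], file_length: int,
                      context: int = CONTEXT_LINES) -> Set[int]:
    """Lines that appear in the default unified diff: changed lines plus context."""
    visible: Set[int] = set()
    for line in changed:
        lo = max(1, line - context)
        hi = min(file_length, line + context)
        visible.update(range(lo, hi + 1))
    return visible


def can_comment(line: int, changed: Iterable[int], file_length: int) -> bool:
    """True if `line` falls inside the default diff context of some change."""
    return line in commentable_lines(changed, file_length)


if __name__ == "__main__":
    # A single change on line 10 of a 100-line file.
    print(sorted(commentable_lines([10], 100)))
    print(can_comment(14, [10], 100))
```

Under this model, a reviewer who wants to point at line 50 ("this should have been updated too") has no valid anchor, even after expanding the collapsed region in the UI.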

joker-eph commented 2 years ago

Phabricator lets you put comments on lines, with GitHub you can put comments on ranges of lines if a comment relates to more than a single statement.

Phab lets you select a range of lines to comment on; a comment can even be attached to just a word on a line or an arbitrary range of text across lines. (Did I misunderstand you, maybe?)

GitHub provides a ```suggestion block that lets you provide a change that can be merged from the web interface.

Phab has the same thing (after you select the range), modulo the convenience for the author to apply the diff on their branch in their fork in a click.

To me, this is a point in GH's favour. When a file has been changed, it usually is because the comment was addressed.

I'd rather have something explicitly marked as resolved, because too often not all comments are addressed. And actually GitHub allows you to mark a conversation as "resolved", right? What's annoying when I use GitHub is the back and forth between the discussion page and the review to match comments with the patch and see what has actually been addressed.

Anyway, all of these seem fairly minor to me in the grand scheme of things, and this is also the kind of thing that can evolve fairly quickly on GitHub's side as well. While I wouldn't object to migrating to GitHub PRs based on a few ergonomic issues, it'd be great if the list of these could be discussed with GitHub before we move, because it really seems fixable on their side!

Personally, the single item that would really deeply impact my productivity if we moved "tomorrow" is the lack of Herald rules (notifications in general). That is fixable as well, but I would treat it as a blocker.

ghost commented 2 years ago

Phabricator makes it much easier to keep track of all active comment threads and ensure they're all addressed prior to the change being approved. GitHub pull requests frequently hit situations where comment threads disappear despite not being resolved, and generally GitHub PRs make it challenging to find all the current unresolved comment threads and ensure they've all been resolved satisfactorily.

This has come up a few times, so I will mention that inline comments disappear only in the 'Files changed' tab. They stay around in the 'Conversation' tab, with an 'outdated' marker if the file has since been changed, and a link to the original diff from when the comment was made.

I have seen unresolved comment threads disappear from both views. Comments disappear from the "Files changed" tab if GitHub loses track of where they should attach. GitHub is incredibly bad at this compared to other code review tools, so this happens a lot, even if the comment has not been addressed. And unaddressed comments are sometimes collapsed by default in the "Conversation" view. I'm not sure exactly what the criteria are for that to happen, but I think the older comment threads get collapsed when there are more than some number of comment threads in total.

I'm not sure if you're referring to when GH collapses all older activity (like it has started doing in this thread) because there are a lot of items to show. If so, then yes, I suppose that does hide them from the 'Conversation' view. They're not lost, just hidden, which I suppose could be an issue because it's not obvious there is an unaddressed comment. In theory, if it needed to be addressed before merging/landing the PR, your review would have been 'needs changes' which should prevent the review from landing anyways, but I can understand not wanting to rely on that.

If that's not what you're referring to, then I'm really not sure what you are, because I've never seen comments disappear from the 'Conversation' tab. Perhaps the author deleted their comment? Unlike Phabricator, GH doesn't leave any trace of a deleted comment as far as I'm aware.

If you want to see if/how your comments have been addressed, looking at the current version of the PR is not usually the easiest way of seeing that, so why include them there? Instead, it's usually a lot easier to look at what has changed since you left the comments, and this is most easily done starting at the 'Conversation' tab anyways. GH provides an overview of the commits and/or force-pushes that have happened since the comment was made, and within the overview there are links that will take you to a comparison of before and after (what you probably wanted to see anyways).

Your preferred code review workflow very clearly doesn't match mine. I want to review the proposed patch -- original code versus proposed new code. The delta since last time I looked is seldom interesting, because that doesn't let me check that the overall change that would be committed is reasonable.

I usually start with the delta since last time to see if/how my comments have been addressed (and marking them as resolved), then go to look at the 'Files changed' tab to look at the PR as a whole again. I can understand finding switching back and forth to be frustrating, as @joker-eph has mentioned. I don't mind, and find it easier to keep looking at my previous comments separate from re-reviewing, but to each their own.

Phabricator permits attaching comments to unchanged lines. [...] GitHub is currently architecturally unable to even represent such comments.

This is also incorrect. You can absolutely leave comments on unchanged lines. GH does by default collapse the unchanged lines, just like Phabricator does, but after expanding them you can definitely leave a comment on an unchanged line.

You're mistaken. You can expand an unchanged region, but GitHub doesn't offer the option to leave comments on it. You don't have to take my word for it, you can go and try it. The little + icon to leave a comment does not appear on lines that aren't within the default context.

The reason, as I understand it, is that GitHub models the location of a comment as a line number in a unified diff. (This is visible in its API. See: https://i.stack.imgur.com/t37he.png) Because of that, GitHub is architecturally unable to support comments anywhere other than on changed lines and on the three lines before and after each changed line -- it has no representation for those locations.

I stand corrected. My apologies.

vtjnash commented 2 years ago

which I suppose could be an issue because it's not obvious there is an unaddressed comment

FWIW, I've left a review with a large number of line comments (25-50) on GH before, and discovered that this can cause some of the new comments to be hidden entirely by the fold, so that is a risk, though not too common.

python3kgae commented 2 years ago

As a first-time user of Phabricator, these are the things I found confusing when using it:

1. I cannot find a reply button to reply to a global comment.
2. After the review is done, I don't know what to do. There's no UI to merge the patch when the pull request is approved.
3. I have to create the patch manually. In GitHub, I just push to my own repo, then create a pull request to merge into the official repo.

jh7370 commented 2 years ago

As a first-time user of Phabricator, these are the things I found confusing when using it:

1. I cannot find a reply button to reply to a global comment.

2. After the review is done, I don't know what to do. There's no UI to merge the patch when the pull request is approved.

3. I have to create the patch manually.
   In GitHub, I just push to my own repo, then create a pull request to merge into the official repo.

These are valuable points, since you're a new Phabricator user, so have that perspective. My thoughts on this though:

1. Phabricator has the ability to reply to a global comment just like in GitHub, more or less in the exact same location: go to the drop-down arrow in the top-right of a comment and there's a "Quote reply" option. Actually, this is something Phabricator does better than GitHub: you don't have to switch between discussion view and the commit view to reply to these comments and line-local comments.
2. That's fair, but perhaps could be solved by better documentation on using Phabricator. However, this is related to my point below.
3. I don't think it's necessarily an issue to have to push manually, especially as we don't want new contributors to be able to push into the main repository. I think this applies as much for PRs as the existing workflow. However, for approved committers, it is certainly useful to have rebase/squash & merge etc. options via the UI, although even then I've tripped over it sometimes not doing quite what I want it to. I prefer the manual control in this sense.

Unrelated, but another issue I have with GitHub is that notifications for replies to comments are done on a per-comment basis (or at least this is how my GHE repo is configured), meaning if I've made a long review with N comments, and the author wants to reply to the comments, I get N notifications, plus potentially one or more about new commits to the PR. In Phabricator, all comments are grouped together and sent in one notification when the person making them hits the submit button (you still get 2 separate ones for "comments" versus diff updates though).

python3kgae commented 2 years ago

As a first-time user of Phabricator, these are the things I found confusing when using it:

1. I cannot find a reply button to reply to a global comment.

2. After the review is done, I don't know what to do. There's no UI to merge the patch when the pull request is approved.

3. I have to create the patch manually.
   In GitHub, I just push to my own repo, then create a pull request to merge into the official repo.

These are valuable points, since as a new Phabricator user you bring that perspective. My thoughts on this though:

Phabricator has the ability to reply to a global comment just like in Github, more or less in the exact same location: go to the drop-down arrow in the top-right of a comment and there's a "Quote reply" option. Actually, this is something Phabricator does better than Github: you don't have to switch between discussion view and the commit view to reply to these comments and line-local comments.

That's fair, but perhaps could be solved by better documentation on using Phabricator. However, this is related to my point below.

I don't think it's necessarily an issue to have to push manually, especially as we don't want new contributors to be able to push into the main repository. I think this applies as much for PRs as the existing workflow. However, for approved committers, it is certainly useful to have rebase/squash & merge etc options via the UI, although even then I've tripped over it sometimes not doing quite what I want it to. I prefer the manual control in this sense.

Unrelated, but another issue I have with GitHub is that notifications for replies to comments are done on a per-comment basis (or at least this is how my GHE repo is configured), meaning if I've made a long review with N comments, and the author wants to reply to the comments, I get N notifications, plus potentially one or more about new commits to the PR. In Phabricator, all comments are grouped together and sent in one notification when the person making them hits the submit button (you still get 2 separate ones for "comments" versus diff updates though).

Thanks for the reply.

For people who are not approved committers, a button to ask for help with the merge would be nice.
I don't mean I need to push into the main repo directly. I just want to push into my own GitHub repo, then create a pull request from my repo to the main repo. I feel that is easier than creating a patch and then uploading it to Phabricator. I had trouble creating a raw patch from 2 local commits.

whisperity commented 2 years ago

I will reply to these two points right now in one go, because they concern the same issues from my point of view, both as a reviewer and as a contributor.

What features or properties make Phabricator better than GitHub Pull Requests? `and` Which workflows aren’t possible with GitHub Pull Requests?

General ticket management (e.g. reviewers, labels, etc.)

GitHub supports no way of assigning a non-team-member as a reviewer. This is the lesser issue. The bigger issue is that if the contributor (author of the patch) isn't an administrative member of the team associated to the repository, they won't be able to edit any of the metadata of the pull request, be it the people they are requesting the review from, or labels. Even if the administrators take their time to set labels up to the repository, the author cannot assign them.

Discussion threads associated with lines of change

GitHub tries to be too smart about when to hide the discussion associated with a particular hunk in the change. While the discussion itself isn't removed, just hidden (marked as Outdated or Obsolete), this causes friction because the involved parties must manually unfold these entries.

The issue with being too smart is that these changes can be marked outdated the moment the associated hunk changes in a subsequent patch update, even if the change to the hunk has semantically nothing to do with the discussion at hand. (E.g. you've done a refactoring effort which changed the layout of the code, but the algorithmic issues are still not addressed, just associated with the wrong line. This is an issue on Phabricator too, but Phabricator keeps the previously uploaded patches archived, so even though it is a hurdle to get back to "Comment as appears on previous diff", it is possible. With GitHub, if you force-push the source branch of the patch, the previous commits are not properly visible anymore, and there is no going back.)

The need to manually mark comments as Done on Phabricator has an analogous button on GitHub as well, but Phabricator does not attempt to be smart and automated about hiding things. You, as the patch author, have to signal the fact that you've addressed some concerns.

Sidenote about hiding comments

Reading this very issue right now, GitHub has decided that it would hide some number of discussion comments in the middle of the thread. So it shows the first few and the last few. The decision in Phabricator to hide discussion since your last contribution seems more intuitive. Scrolling the comments on this ticket, it took me a while to notice that people are replying to things that are in the middle, but hidden for me.

Multi-stage patches

Some development effort takes a lot of time and discussion to get fleshed out and committed. Sometimes, patches are too big, so we break patches up to multiple stages. Both me (http://reviews.llvm.org/D69560, 6 patches depending on each other in a line) and some of my colleagues (http://reviews.llvm.org/D32592, a tree of some 10 patches!) have had to develop in such fashion.

I have not read thoroughly the discussion above but I think "stacked PRs" were mentioned already. The problem with doing these sub-reviews and then landing the entire feature in one big patch is that this breaks the ability to bisect properly, all one would see is that "this was good before the big patch and bad after the big patch". Whereas if the feature is landed incrementally, sometimes due to having to run case studies and further discussions, maybe even land it across releases(!), the association and the history is better kept.

The core issue is that on GitHub, the pull request is tied to a branch and thus to a specific commit. You either create many small commits which then will bloat the review (and have to be squashed and rebased before pushing, if we keep our current history model), or you always force-push, in which case the review can derail. There are UI issues, such as the notification about the new commit to the patch bringing you to a diff where you can't see anymore what the difference between the two stages was (something that is visible in Phabricator, you can diff the individual diffs too!). We have run into this issue with another repository unrelated to LLVM, where reviewers specifically asked us never to force-push but instead create hundreds of small commits during the review, because the UI can't handle uploading the full changeset of the patch again, in a force-push.

While usually not a concern for the LLVM repository itself, the GHPR interface can also derail review if the target branch is force-pushed. I can't come up with exact references off the top of my head for this right now, but I remember having to unfurl some of our product repositories due to this happening.

Phabricator comes out as a winner from my point of view due to the ability of reviewing the patch as-is. With no association to repositories, commits inside the patch, we have the potential to review in isolation, with the state of the rest of the world not interfering on the usability of the interface.
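
To make the "diff the individual diffs" point concrete, here is a minimal Python sketch of comparing two uploaded versions of the same file directly, with no reference to the commits that produced them. The function name `interdiff` and the sample file contents are hypothetical, and this is not how Phabricator implements the feature; it only illustrates the idea using the standard `difflib` module.

```python
import difflib


def interdiff(old_version: str, new_version: str, name: str = "file") -> str:
    """Return a unified diff between two uploaded versions of one file.

    Both arguments are full file contents as they appeared in each
    version of the patch; no commit history is consulted.
    """
    diff = difflib.unified_diff(
        old_version.splitlines(keepends=True),
        new_version.splitlines(keepends=True),
        fromfile=f"{name} (diff 1)",
        tofile=f"{name} (diff 2)",
    )
    return "".join(diff)


# Hypothetical contents of the same file in two successive patch versions.
v1 = "int f() {\n  return 1;\n}\n"
v2 = "int f() {\n  return 2;\n}\n"
delta = interdiff(v1, v2, "f.cpp")
```

Because the comparison only needs the two snapshots, it keeps working after a force-push or a rebase, which is the property being argued for above.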

No Herald-like features

@AaronBallman mentioned this, but I have to explicitly second their notion.

Which workflows aren’t possible with GitHub Pull Requests?

Herald-like subscriptions where I can be added as a required reviewer for changes with a file-level granularity. (At least, I've not found the functionality if it exists.)

On Phabricator I have set up rules in Herald to always be aware of works of my colleagues ("Add me as subscriber") and somewhat be aware of projects I want to keep an eye on ("Send me an e-mail"). On GitHub, you either subscribe to a repository, in which you get all notifications related to it, or you have to manually seek out the issues you want to subscribe to, individually. The idea is that if someone does something to a project I keep an eye on, I can manually subscribe if I am interested, or just not do anything if I am not, and then I will not be notified about it.

(Coincidentally, the same project where the aforementioned "Do not force push reviews!" was asked from us by the maintainers, is the one where this issue was made apparent to me. I would like to keep an eye on some of my colleagues' work who would contribute to that project, but I simply just can't, because the technical solutions are not there. Or, rather, not here, on GitHub.)

For a large project like LLVM, this is infeasible. We will have hundreds, if not thousands, of pull requests, that concern me not at all. And doing a manual search every day or week for issues that I am interested in, is just a game-breaking requirement. I would have to either unsubscribe from 99% of the notifications, or wade through 99% of irrelevant patches to find the few I am interested in.

Offloading this responsibility to a clique of people in a project to ensure we always "tag each other" also just causes noise, and introduces a human factor into the pipeline which is better solved by Phabricator's automation.
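
For readers who have not used Herald, the kind of file-level rule described above can be sketched in a few lines of Python. Everything here is an assumption for illustration: the rule table, the user names, and the function `subscribers_for` are hypothetical, and real Herald rules are configured through Phabricator's UI rather than written as code.

```python
from fnmatch import fnmatch

# Hypothetical rules: each maps a glob over changed paths to the
# users who should be subscribed when any changed path matches.
RULES = [
    ("clang/lib/StaticAnalyzer/*", ["alice"]),
    ("clang-tools-extra/clang-tidy/*", ["alice", "bob"]),
    ("lld/*", ["carol"]),
]


def subscribers_for(changed_paths, rules=RULES):
    """Return the sorted set of users whose rules match any changed path."""
    matched = set()
    for pattern, users in rules:
        # fnmatch's "*" also matches "/" so a rule covers whole subtrees.
        if any(fnmatch(path, pattern) for path in changed_paths):
            matched.update(users)
    return sorted(matched)


patch_files = ["clang-tools-extra/clang-tidy/misc/Foo.cpp", "llvm/docs/index.rst"]
notify = subscribers_for(patch_files)
```

The point of the sketch is that the decision is driven by the paths a patch touches, not by a human remembering to tag the right people.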

No RSS feed for issues or pull requests

This concerns the deprecation of BugZilla more than the deprecation of Phabricator, but I would like to mention this too. Together with my colleagues, we have set up a channel in whichever team communication solution we are using that subscribed to the RSS feed of the LLVM related projects we are interested in, namely Clang SA and Clang-Tidy bugs. This is possible, because BugZilla offers RSS feeds.

In another project some of my people are contributing to ("coincidentally" – yet again – the same project I have mentioned two times already, which lives on GitHub), we cannot set up a similar system of being notified about issues we care about in that team communication interface, because there is no RSS feed available.

There is no such similar feature for GitHub issues, unless you are capable of writing something that turns GitHub API results into RSS, for which you also need a server where you need to host this feature, and loads of other potential corporate issues such as firewall rules, cost, etc., appear.
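
As a rough idea of what "turning GitHub API results into RSS" would involve, here is a minimal Python sketch that renders a list of issue dictionaries as an RSS 2.0 document. It assumes each issue carries `title` and `html_url` fields, deliberately does no network access, and leaves out the hosting, authentication, and polling that the paragraph above identifies as the real cost. The function name `issues_to_rss` and the sample data are hypothetical.

```python
import xml.etree.ElementTree as ET


def issues_to_rss(issues, feed_title, feed_link):
    """Render issue dictionaries as an RSS 2.0 document (a string)."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = feed_title
    ET.SubElement(channel, "link").text = feed_link
    ET.SubElement(channel, "description").text = f"Issues for {feed_title}"
    for issue in issues:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = issue["title"]
        ET.SubElement(item, "link").text = issue["html_url"]
        ET.SubElement(item, "guid").text = issue["html_url"]
    return ET.tostring(rss, encoding="unicode")


# Hand-written sample in the assumed shape; not fetched from anywhere.
sample = [
    {"title": "Crash in checker <X>", "html_url": "https://example.invalid/issues/1"}
]
feed = issues_to_rss(sample, "example/repo", "https://example.invalid/")
```

The conversion itself is small; the server, credentials, and firewall rules around it are what make this impractical for many teams.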

What features or properties make Github Pull Requests better than Phabricator?

Better integration with CI tools, perhaps. But other than that, virtually none at all. Most of the features, such as labelling, tagging, grouping by project, etc., are possible both on GitHub and Phabricator. And making labels, tags, projects, Kanban boards, etc., are both restricted on Phabricator the same way as they are on GitHub: you need to be some sort of an authorised contributor or repository admin to tinker with these details.

rengolin commented 2 years ago

One thing that has been mentioned but not developed here is tools that enhance Github experience to match other tools, like Bugzilla and Phabricator.

Anton K. has been looking at some alternatives for code review and bug-tracking, some of them were ok. Migrating issues to GH is a larger problem than moving code review, because we need the history as well as the future.

With code review, if we had a tool that solved most of the problems with the GH interface and metadata, we could use it instead. Of course, this tool would have to be free to use to all members of our community and not just the admins of the repo.

MaskRay commented 2 years ago

Echo what many folks have mentioned about the downsides of GH. The following three would really decrease my velocity on reviews:

No Herald-like notification (as AaronBallman, jh7370, whisperity mentioned). Path based notification filtering does not exist (only options are: Participating and @mentions, All Activity, Ignore, Custom). This is not only for code owners / regular reviewers, but also for subscribers who don't want to read other components. I would definitely not be able to keep up with "All Pull requests".
Disappeared comments (as jh7370, zygoloid mentioned)
No "Active Revisions" view on Phabricator (as vtjnash mentioned)

Two more points which have not been raised yet.

Squashing and merging from web UI easily leads to a messy commit description like the following where typo|comment lines really should not be included.

Main description

* one line comment

* typo

* typo

If I commit on someone's behalf, I may want to do some adjustment on code styles or commit messages. If I don't have permission to force-push to their PR, but push it to the repo directly, I likely need to manually close the PR. The "Closed" state looks worse than the "Merged" state.

jyknight suggested Gerrit and noted that it may be a less popular option. I haven't used Gerrit enough to form an opinion yet.

rengolin commented 2 years ago

Squashing and merging from web UI easily leads to a messy commit description like the following where typo|comment lines really should not be included.

This is an editable field, you're supposed to edit before merging. You can (and should) remove the fixup comments then.

Unless they're from different authors, which then you may want to keep those (to be decided, I have no strong opinion), for the purpose of liability and traceability.
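
The manual edit described here could be approximated with a small helper. The following Python sketch is only an illustration: `clean_squash_message` and its `noise` list are hypothetical, and a person would still have to decide which bullet lines are genuinely noise before merging.

```python
def clean_squash_message(message: str, noise=("typo", "one line comment")) -> str:
    """Drop '* <noise>' bullet lines from a squash-merge description.

    `noise` is a hypothetical list of fixup subjects to discard; a real
    cleanup would still need a human to decide what counts as noise.
    """
    kept = []
    for line in message.splitlines():
        stripped = line.strip()
        if stripped.startswith("* ") and stripped[2:].strip().lower() in noise:
            continue
        kept.append(line)
    # Collapse the blank lines left behind by removed bullets.
    cleaned = "\n".join(kept).strip()
    while "\n\n\n" in cleaned:
        cleaned = cleaned.replace("\n\n\n", "\n\n")
    return cleaned


raw = "Main description\n\n* one line comment\n\n* typo\n\n* typo\n"
result = clean_squash_message(raw)
```

Bullets that do not match the noise list are kept, so fixups from other authors can still be preserved if that is what the project decides.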

If I commit on someone's behalf, I may want to do some adjustment on code styles or commit messages.

With PRs, we don't need to commit on someone's behalf anymore. Anyone with merge access can just merge and the commit still belongs to the original author. The only reason to be a proxy is if the original author doesn't have (and won't create) a Github account, but then, you're the proxy and have full control of what's in the PR.

You shouldn't change a patch that has already been accepted without informing the author. Adjustments on code style can accidentally change semantics (see clang-format discussion).

If I don't have permission force pushing their PR, but push it to the repo directly, I likely need to manually close the PR. The "Closed" state looks worse than the "Merged" state.

I'd strongly oppose people pushing to my PRs without my explicit consent. Most of the time this is benign and welcome, but sometimes it can be unwelcome and (the internet is huge) malicious.

This is particularly problematic on merge commits that don't show who did what, and I appear together with some random person on the whole patch that happens to add malicious code to LLVM. The repercussions on me and my employer (if I'm posting the PR on their behalf) can be disastrous.

Github allows adjustments as comments. You should post those (even after a PR has been approved) and authors are expected to consider and either apply or discard before merging.

jh7370 commented 2 years ago

Squashing and merging from web UI easily leads to a messy commit description like the following where typo|comment lines really should not be included.

This is an editable field, you're supposed to edit before merging. You can (and should) remove the fixup comments then.

Whilst this is true, I know from personal experience that it is very easy to make a mistake with this, and there's no way of reviewing what the end result of your operation is going to be until after it's actually in the official repo, unlike CLI usage (where you can use git log to check what you're pushing before doing git push).

ghost commented 2 years ago

Since there was discussion about it earlier, something I only came across recently: Just like GH, if there are enough comments on a PR, Phabricator will also collapse them with a "There are a very large number of changes, so older changes are hidden." (I'm not talking about the "Changes from before your most recent comment are hidden." thing, this is separate and can be seen here for example: https://reviews.llvm.org/D87940).

No "Active Revisions" view on Phabricator (as vtjnash mentioned)

Also, while I'm here: I'm not sure what people mean by this. From what I can tell, the "Active Revisions" page is no different from the 'Pull Requests' tab, filtered for a specific author (like this: https://github.com/llvm/llvm-iwg/pulls/ChristianKuehnel). While it's not in your face on the landing page, if you're a new user looking for it, I think it's as easy or easier to find than the "Active Revisions" page on Phabricator. Also, I think GH's UI for searches like these is much easier to understand and use than Phabricator's search function.

vtjnash commented 2 years ago

The comparison was that PH collapses from the beginning, while GH hides from the middle. And GH hides comments from the source view also, if they are hidden from the view.

I track more PRs on PH (where there is a group I am part of) than just those that I have authored. I am not against GH since I use it for most of my work day-to-day, but merely comparing features that the committee may want to evaluate against. I suspect there are 3rd party integration tools that would already mitigate or solve these issues for GH, but that would need to be researched.

llvm-beanz commented 2 years ago

The Phabricator "Active Revisions" page is almost identical to https://github.com/pulls when logged in.

That page has tabs to show your Created, Assigned, Mentioned and Review Requested Pulls.

Edit: And it is across all of GitHub... Which is super handy if you contribute to more than one project, or if you get ping'd on a review for a project that uses LLVM.
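
The four tabs correspond to ordinary search qualifiers, so the same views can be reproduced as queries. This Python sketch builds those query strings; the helper `pulls_query` is hypothetical, and the mapping of tab names to the `author`, `assignee`, `mentions`, and `review-requested` qualifiers is my reading of the page rather than something stated in this thread.

```python
def pulls_query(tab: str, user: str = "@me") -> str:
    """Build a GitHub search query for one of the four dashboard tabs."""
    qualifiers = {
        "created": "author",
        "assigned": "assignee",
        "mentioned": "mentions",
        "review requested": "review-requested",
    }
    return f"is:open is:pr {qualifiers[tab.lower()]}:{user}"


q = pulls_query("Review Requested")
```

Passing a different `user` gives the per-author view mentioned earlier in the thread.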

fhahn commented 2 years ago

FWIW I share the concerns that @whisperity and @AaronBallman spelled out really well already on the reviewer side.

Another thing I am wondering is whether there's a way on Github to mark PRs as 'requiring changes' to hide them from the active review queue until the author updates the PR.

rengolin commented 2 years ago

Another thing I am wondering is whether there's a way on Github to mark PRs as 'requiring changes' to hide them from the active review queue until the author updates the PR.

PRs can be marked as "draft" if the author isn't quite finished (or was requested a major change).

Reviewers can also mark "request changes" which is similar to Phab.

I don't think it automatically hides from the review queue, but depending on configuration, it can halt merging the request until it's finally approved.
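
Since the hiding is not automatic, a reviewer would have to filter the queue themselves. A minimal Python sketch of such a filter follows. The `draft` flag and the `CHANGES_REQUESTED` review state mirror what the GitHub API reports, but `latest_review_state` is a hypothetical pre-computed field and the sample data is made up.

```python
def active_review_queue(pull_requests):
    """Return PRs that are neither drafts nor waiting on the author.

    Each PR is a dict with a `draft` flag and `latest_review_state`,
    a hypothetical pre-computed field holding the most recent review
    verdict (or None if nobody has reviewed yet).
    """
    return [
        pr
        for pr in pull_requests
        if not pr["draft"] and pr["latest_review_state"] != "CHANGES_REQUESTED"
    ]


prs = [
    {"number": 1, "draft": False, "latest_review_state": None},
    {"number": 2, "draft": True, "latest_review_state": None},
    {"number": 3, "draft": False, "latest_review_state": "CHANGES_REQUESTED"},
    {"number": 4, "draft": False, "latest_review_state": "APPROVED"},
]
queue = [pr["number"] for pr in active_review_queue(prs)]
```

Note that this sketch does not bring a PR back into the queue when the author pushes an update; that would need a comparison of the last review time against the last push, which is the part Phabricator does for you.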

nickdesaulniers commented 2 years ago

This project looks like exactly what we could use to have Herald-like rules from Phabricator emulated in GitHub pull requests via actions: https://github.com/gagoar/use-herald-action.

hubert-reinterpretcast commented 2 years ago

What features or properties make Github Pull Requests better than Phabricator?

Can retrieve exact "current state" of the proposed change for build/test reliably. With Phabricator, this depends on the patch submitter using arcanist and properly maintaining patch series.
Can apply suggested changes directly from the UI.

What features or properties make Phabricator better than GitHub Pull Requests?

Full history of inline comments remain visible in the code view; this makes it easy to join reviews late and know if a comment was already made for something. Seeing the comments from other reviewers can also help to improve your own understanding, allows you to voice dissent, etc.
Commenting on any view (including older patch states) reliably registers the comment in a way that others can find relatively easily. Comments in certain GitHub views (e.g., commit views) are not reliably attached to the review and can end up "in the void" (this is exacerbated by needing to use more views in GitHub to work around limitations of its main "Conversation" and "Files changed" views; see further below).
"Expand all" functionality is easy to find and use and makes "grepping" using browser search easy and fairly reliable.
Full file context (if the author posted the patch correctly) for each version of the patch is available as the patch evolves.
Easy to diff between versions of the patch. This can be nigh impossible to do with GitHub when force pushes or merge-from-main is involved.
E-mails include patch content; this works well with having mailing list archives as a system of record.

What new workflows or process improvements will be possible with GitHub Pull Requests?

Should be possible to let new developers commit their own approved PRs without needing to find someone to commit on their behalf.
Could adopt more automation being developed with a GitHub PR focus due to "network effects".

Which workflows aren’t possible with GitHub Pull Requests?

Incremental review by looking at only the delta. As mentioned above, when force pushes or merge-from-main is involved, finding the delta between versions of the proposed change is nigh impossible (without using git). Even when such is not involved, the GitHub workflow will involve views that are mostly good only for viewing (and not commenting); see comment above re: comments "in the void". The extra noise and mental overhead of skipping what one thinks they have already seen is likely to lead to some changes (some of which may be problematic) being missed by reviewers.
Efficiency with browser search to locate relevant code/comments: GitHub collapses many things (blocks of events, lines of code, etc.); it is difficult to get it to expand everything (except sometimes a different view works, but the various GitHub views tend to work for one purpose but not another). Difficulty in discovering that some concerns were already raised leads to wasted effort to communicate the same concerns (I have observed this in practice). The increased likelihood of not finding relevant instances of a search pattern in code leads to a higher possibility of less thorough reviews.
Efficiently following the history of a review. The history is available from the "Conversation" view, which provides only limited context for inline comments. It is easy to miss relevant comments when "just browsing" because even fresh comments can be hard to find, e.g., GitHub places new replies to old threads with the old thread (and does not position the thread in a way that respects that it is still active).
Ease of tracking one's own task list of comments to read/understand/request action on. GitHub does not maintain a personal, persistent state of hiding/collapsing comments and it likes to mark things on its own as "out-of-date", etc. The difficulty means that either extra time is spent or the completeness of reviews (including the author's understanding/addressing of the review comments) can suffer.
Directly suggesting code changes (and general ease of leaving comments on) code in the context further away than a few lines from the changes being proposed.
Working on the review with just a single browser tab: As noted above, GitHub has a proliferation of views and limitations of each view forces the need to open another.

Any other information that you think will help the Board of Directors make the best decision.

It will not be intentional; however, the quality of the GitHub review facility requires sufficient overhead to thorough reviews that it is likely to reduce the quality, efficiency, and participation level of reviews. To elaborate: Reasons for why joining a review is costlier with GitHub PRs have been stated above; the increased cost will reduce the number of reviews that contributors will "opportunistically" participate in. This also increases the chance of having a "silo effect" where reviews proceed with limited diversity of input.

The various rule automation, etc. aspects that GitHub PRs bring to the table can be enabled by using a hybrid model where review happens on Phabricator but committing is done via PRs.

TL;DR: For a consumer (as opposed to a maintainer) of the code review platform, GitHub introduces concrete harms and promises benefits that are not as concrete or realized. It is speculated that adopting GitHub PRs will bring new participation into LLVM; however, there is reason to believe that it comes with a cost of lost contributions from existing community members (@jh7370 wrote: "my ability to review contributions will decrease"). As explained in this write-up, this decrease in ability to review contributions is caused by practical issues.

hubert-reinterpretcast commented 2 years ago

If you want to see if/how your comments have been addressed, looking at the current version of the PR is not usually the easiest way of seeing that, so why include them there? Instead, it's usually a lot easier to look at what has changed since you left the comments, and this is most easily done starting at the 'Conversation' tab anyways. GH provides an overview of the commits and/or force-pushes that have happened since the comment was made, and within the overview there are links that will take you to a comparison of before and after (what you probably wanted to see anyways).

I'd be pleasantly surprised if GitHub provides a meaningful comparison across a force push that includes a rebase from main. Phabricator does a great job of diffing patches that have different "unchanged context".

hubert-reinterpretcast commented 2 years ago

Even if we don't update it though, 30 days seems like it should be enough time for any reviewers or interested parties following a PR to take a look at what changed.

We have reviews spanning months and, in GitHub Enterprise, I have seen a tendency for some authors to miss comments. The inability to find what changed is not the only thing lost if the commit disappears: It is also the context of historic comments. Once a change goes through a few refactors, the point of a comment may still be valid, but the specific way it was expressed may seem nonsensical. This failure of maintaining the semantic integrity is rather problematic. I believe that allowing the work of a reviewer to degrade in this way is disrespecting the time and effort they put in.

david-greene-cb commented 2 years ago

This is very error-prone. The root problem with GitHub PRs is that the target branch cannot be changed after the PR is created.

AFAIK you can. At least I can with my company's repositories.

If it could, then we could create PRs to merge into the next branch in the sequence but then merge them into main in order and reset the next PR to try to merge to main (or merge it into the intermediate branch if reviewers are happy with the roll-up).

That is exactly what happens currently with GitHub. The auto-reset-to-main is a nice feature.
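
The stacked workflow being discussed can be modelled as a mapping from each PR's head branch to its base branch. The Python sketch below is a toy model of the retargeting described in this exchange, not a description of GitHub's implementation; the function `merge_and_retarget` and the branch names are hypothetical.

```python
def merge_and_retarget(prs, merged_head, trunk="main"):
    """Mark the PR whose head is `merged_head` as merged into trunk and
    point every PR that was based on that branch at trunk instead.

    `prs` maps a head branch name to its current base branch.
    Returns a new mapping without the merged PR.
    """
    if prs[merged_head] != trunk:
        raise ValueError("only a PR targeting trunk can be merged in this model")
    return {
        head: (trunk if base == merged_head else base)
        for head, base in prs.items()
        if head != merged_head
    }


# A hypothetical stack of three PRs: part1 <- part2 <- part3.
stack = {"part1": "main", "part2": "part1", "part3": "part2"}
after_first = merge_and_retarget(stack, "part1")
```

Merging in order keeps each commit landing separately on main, which preserves the bisectability that was raised earlier as a concern with landing a feature as one big patch.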

h-vetinari commented 2 years ago

Since it came up a few times as a deficiency of the github UI, people might be interested that github has milestoned the ability to comment outside the immediate vicinity of changed lines for Q1 2022: https://github.com/github/roadmap/issues/347

Since the underlying issue (AFAIU) was that github so far attached comments to the actual lines of the diff somehow (and they'll now be attached to something else), this might also alleviate the "disappearing comments" issue somewhat.

PS. The following might also be a nice improvement for larger PRs: https://github.com/github/roadmap/issues/348

phyBrackets commented 2 years ago

I think GitHub would be the best alternative to Phabricator, because Phabricator is not user-friendly at all and there are not many tutorials available for learning it. LLVM may lose some new contributors who want to contribute but get stuck on Phabricator. GitHub is quite user-friendly and there are lots of contributors here. I don't want to push my opinions, but I really want to ask LLVM to move to GitHub, because it's really frustrating for new contributors, including me, to use Phabricator.

whisperity commented 2 years ago

@phyBrackets

I think GitHub would be the best alternative to Phabricator, because Phabricator is not user-friendly at all and there are not many tutorials available for learning it. LLVM may lose some new contributors who want to contribute but get stuck on Phabricator. GitHub is quite user-friendly and there are lots of contributors here. I don't want to push my opinions, but I really want to ask LLVM to move to GitHub, because it's really frustrating for new contributors, including me, to use Phabricator.

You are regurgitating genericisms while answering nothing about the specifics asked in the ticket. There is a detailed direct guide as to how to submit patches to Phabricator, available in the official LLVM documentation: https://llvm.org/docs/Phabricator.html#phabricator-reviews -- it tells you step by step what to click on. As for tutorials and learning, most of the learning comes with practice. You may know the GitHub process better because that is what you have used more, which is an inherent bias of skill that applies to every skill.

(There is also an official documentation from Phabricator's developers at https://secure.phabricator.com/book/phabricator/article/differential/.)

LLVM's guide to using Phabricator is very similar to that of GitHub's documentation about pull requests, available here: https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/creating-a-pull-request -- although this does not explain what a commit is, or how to upload, and how to fork, you have to look these details up in a separate documentation. What GitHub's documentation has going for it, however, is that it has images, while LLVM's only explains things in text... This is something on which we could surely improve our documentation (i.e. add images) but that is a separate concern from governance and the change of the established processes.

pogo59 commented 2 years ago

TL;DR: The argument cannot be "why we should stay with Phabricator" but must be "what can we use instead of Phabricator."

We can argue the merits of using a joystick versus a wheel to control our vehicle, until we are all blue in the face. The simple fact is that if new people who are looking to contribute are used to using a joystick, arguing that the wheel is technically superior and that we have instructions for using it, does not really address the point.

I wrote a lot of LLVM's detailed instructions for using Phabricator out of frustration with its counter-intuitive interface and, critically, its complete lack of built-in help. (There is no link to LLVM's instructions from Phabricator itself.) Some people who have managed to learn to use it in more sophisticated ways have tried to help me do the same; these efforts have all failed. (Someone tried to help me find reviews where I was a reviewer or subscriber; the net result was something that would let me find reviews where he was a reviewer or subscriber, and no clear way to copy and edit the macro or whatever it was. Why isn't something like this simply built-in functionality? The UI is simply bad.)

Phabricator is un-maintained. I think it's written in PHP so maybe the recent log4j issue doesn't directly affect it, but if an equivalent problem arose with PHP then it would be an emergency with no good recourse. We need to move away from it as a matter of good dependency management for the LLVM project as a whole.

People have spoken a lot about features that Phabricator has, that other review tools do not. That's great! We can look for other tools that support these things, or can agitate to get them added to whatever tool we choose. But IMO we simply cannot stay with Phabricator.

rengolin commented 2 years ago

@pogo59 +10 internet points for the wheel/stick comparison. It's spot on!

What we need is a tool that interfaces well with Github, not necessarily Github PRs directly.

The security / maintenance / deprecation side of it is also quite alarming and will only hurt when it happens, but then it will hurt a lot.

joker-eph commented 2 years ago

The simple fact is that if new people who are looking to contribute are used to using a joystick, arguing that the wheel is technically superior and that we have instructions for using it, does not really address the point.

I don't agree: if people who want to contribute are used to Java, should we start using Java in the codebase? There is a learning curve for any tool, and at some point Phab vs GitHub is such a tiny aspect of the project compared to, let's say, proper use of FileCheck... I'd also add that for a long time what limited a better integration of Phab in the LLVM ecosystem has been the reluctance to agree that it is a valid way of contributing: there was a lot of resistance initially to using a web tool in the first place. Many people were very anchored to reviewing through email, even just 5 years ago!

Note that while I don't buy this argument above, I agree with the general sentiment: it is less about "should we move", but "when" and "under which condition". From this point of view the following is spot-on to me:

People have spoken a lot about features that Phabricator has, that other review tools do not. That's great! We can look for other tools that support these things, or can agitate to get them added to whatever tool we choose.

pogo59 commented 2 years ago

The simple fact is that if new people who are looking to contribute are used to using a joystick, arguing that the wheel is technically superior and that we have instructions for using it, does not really address the point.

I don't agree: if people who want to contribute are used to Java, should we start using Java in the codebase?

I disagree with the validity of the counter-example. We've changed lots about how the project operates, and I believe in all cases one of the strong motivations has been to lower the barrier to newcomers:

We've changed our IRC to Discord
We've changed our SCM tool to git
We've changed our SCM host to github.com
We've changed our bug management to github issues
We're talking about changing the main communication method from mailing lists to a forum

The middle 3 of those are all about becoming part of the github ecosystem, and moving to github PRs is very consistent with that. I regret I don't have actual data about how common it is for a github-based project to use PRs; if someone has data on what the more popular review tools are (PRs? Gerrit? Something else?), it would be very helpful, I think.

Note that while I don't buy this argument above, I agree with the general sentiment: it is less about "should we move", but "when" and "under which condition". From this point of view the following is spot-on to me:

People have spoken a lot about features that Phabricator has, that other review tools do not. That's great! We can look for other tools that support these things, or can agitate to get them added to whatever tool we choose.

Right. I understand that the original proposal was asking only about PRs, but I think evaluating a broader set of tools would be more appropriate, looking at both feature sets (what do we gain/lose in the switch?) and how widely used they are (lowering the barrier to entry).

LebedevRI commented 2 years ago

We've changed our IRC to Discord

That's factually incorrect, and i don't see that happening. (is there an IRC bridge yet? matrix bridge? can i use 3rd party client?)

AaronBallman commented 2 years ago

One point that I've not seen raised (sorry if I missed it) is that switching away from Phabricator means we lose historical information about the code reviews already done in Phabricator.

I've recently been working with folks on some of the relicensing problems and have needed to go into ancient Phab reviews numerous times to give lawyers information related to what's being relicensed. If I had no access to those reviews, I'd have to rely on what made it to the mailing lists. But we know that despite efforts to the contrary, not every review has added cfe-commits or llvm-commits to the subscriber's list. So Phabricator, at least in some instances, has the only review information available for a change.

Whatever service we switch to, I think we still need to consider how we retain this historical information because it is important even if it's not important to everyone in the community. We have some of this information in the form of the blame list for a change, but it's the information contained within the reviews that's often the most critical for people doing this archeology.

(I don't have strong opinions on how to solve this, but I think there are plenty of options to consider. We could migrate old reviews to the new service as best we can, like we did with issues. We could keep a Phab instance running but disable the ability to use it for new reviews. We could dump the information from Phab into some other database. Other options likely exist. Mostly, I think what we should not do is close down the Phab instance as if this historical information wasn't of value to the community.)

pogo59 commented 2 years ago

We've changed our IRC to Discord

That's factually incorrect, and i don't see that happening. (is there an IRC bridge yet? matrix bridge? can i use 3rd party client?)

Ah, sorry. We've added Discord, and according to the invitation page there are over 5000 users who've joined our instance (I'm not one of them, FTR), which naively seems reasonably popular.

joker-eph commented 2 years ago

I disagree with the validity of the counter-example.

How so? (Your follow-up does not address my counter-example.)

We've changed lots about how the project operates, and I believe in all cases one of the strong motivations has been to lower the barrier to newcomers:

I don't think this was necessarily the main motivation for most of your examples; at least, I've been motivating and supporting all these moves (except GitHub issues) independently of the newcomer aspect. I'm in the camp that this should not be the first priority: existing contributors are more important than newcomers to me. A newcomer does not stay new for long; they get up to speed in O(weeks), and then what matters is how easy/convenient/productive it is to contribute afterward.

What we may agree on is that it would be a hard sell for anything to be proposed that would make it harder for newcomers. Reciprocally I would have strong concerns in a move that would degrade the workflow and the productivity of existing contributors because it "may be more convenient for new contributors".

joker-eph commented 2 years ago

One point that I've not seen raised (sorry if I missed it) is that switching away from Phabricator means we lose historical information about the code reviews already done in Phabricator.

This was touched on before: it should be fairly easy to create a static HTML mirror of the pages on Phabricator and preserve the exact state, including the existing URLs in the commit messages.
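As a rough illustration of the URL-preservation part, assuming commit messages carry the usual `Differential Revision: https://reviews.llvm.org/D<number>` trailer, a mirror could map each referenced revision to a static file. The helper names and the `archive/D<number>/index.html` layout below are hypothetical, made up for this sketch, not an existing tool:

```python
import re
from pathlib import PurePosixPath
from typing import List

# Matches review links as they appear in commit messages.
# The host and the "D<number>" form come from reviews.llvm.org URLs.
REVISION_URL = re.compile(r"https://reviews\.llvm\.org/(D\d+)")


def find_revisions(commit_message: str) -> List[str]:
    """Return the Differential revision IDs referenced in a commit message."""
    return REVISION_URL.findall(commit_message)


def archive_path(revision_id: str, root: str = "archive") -> PurePosixPath:
    """Map a revision ID such as 'D12345' to a static file location.

    The directory layout is an assumption made for this sketch.
    """
    return PurePosixPath(root) / revision_id / "index.html"
```

Serving such files under the original host name would keep the links in old commit messages working; fetching and freezing the pages themselves is a separate step not shown here.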

whisperity commented 2 years ago

* We've changed our bug management to github issues

Which was done in a way that leaves me still unable to find any appropriate means of subscribing to individual ticket labels or search expressions, something that Bugzilla supported. GitHub offers the API, sure, but the problem with the API is that I would need to have someone at the company host a server that is able to run the API script, or pay for it myself, both of which are very unlikely to happen. With Bugzilla, there was a simple RSS query for which you could set up notifications through a standardised(?) format, which has now been taken away. Of course, we could use maybe-shady API-to-RSS mirrors that may or may not work. If LLVM is an important enough project that we

can agitate to get them added to whatever tool we choose

then we should press towards that, instead of side-stepping the solution, both wrt. Herald, and subscription to issues, filtered RSSes, etc.
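For reference, the kind of API-to-RSS glue mentioned above could look roughly like the sketch below. It assumes issue records shaped like GitHub REST API issue objects (`title`, `html_url`, and `labels` with a `name` each); the `issues_to_rss` function and its parameters are hypothetical, and it still needs somewhere to run, which is exactly the hosting problem described:

```python
import xml.etree.ElementTree as ET


def issues_to_rss(issues, label,
                  feed_title="Filtered issues",
                  feed_link="https://github.com/llvm/llvm-project/issues"):
    """Build a small RSS 2.0 document from the issues that carry `label`.

    `issues` is a list of dicts with "title", "html_url", and "labels"
    (a list of {"name": ...} entries). Fetching them is not shown.
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = feed_title
    ET.SubElement(channel, "link").text = feed_link
    ET.SubElement(channel, "description").text = "Issues labelled " + label
    for issue in issues:
        names = {entry["name"] for entry in issue.get("labels", [])}
        if label not in names:
            continue  # skip issues without the requested label
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = issue["title"]
        ET.SubElement(item, "link").text = issue["html_url"]
    return ET.tostring(rss, encoding="unicode")
```

This only covers label filtering; arbitrary search expressions, as Bugzilla allowed, would need more than a label match.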

(Also, thanks for the heads-up that the migration is through, it's time for me to delete the channels for my team with regards to the CSA and Tidy issues, because we are no longer getting an up-to-date feed anyway.)

At least with regards to Phabricator, the issue with subscriptions was mentioned early on and as far as I can tell, people are looking into means of having Herald-like functionality. But the switch to GitHub Issues just goes to show that such decisions can go badly.

@AaronBallman said:

Whatever service we switch to, I think we still need to consider how we retain this historical information because it is important even if it's not important to everyone in the community. We have some of this information in the form of the blame list for a change, but it's the information contained within the reviews that's often the most critical for people doing this archeology.

While the current issue and discussion is about going away from Phab and potentially to GitHub, this raises an interesting point. As this discussion has not yet reached (IIRC) its conclusion, maybe it would be time to do the complete package, and already create some reasonable exit plan or escape plan, should the need arise. Otherwise we will be in the exact same situation 2 or 5 or 8 years later when the next move happens...

@LebedevRI said:

That's factually incorrect, and i don't see that happening. (is there an IRC bridge yet? matrix bridge? can i use 3rd party client?)

From Discord's PoV, nope, they consider it a complete violation of their TOS... which is really sad, because there were plenty of projects which gave nice 3rd-party, even TUI interfaces for Discord, but it is a constant danger to use those.

@pogo59 said:

Ah, sorry. We've added Discord, and according to the invitation page there are over 5000 users who've joined our instance (I'm not one of them, FTR), which naively seems reasonably popular.

I believe this is indeed naive, and not a good metric. Personally, I am on the Discord server. Discord only recently introduced threads in channels (something that was in Slack years ago), but people aren't using them, which makes navigating the server cumbersome. The Discord server also isn't in any way partnered or official in Discord's registry, which might just be nitpicking. I can't check if it is registered in the "Server discovery" thing, but I would wager a negative answer on that too. However, even though I am in the Discord server, and it is in a prominent location in my server list (the 5th, actually), the last time I said anything on the server was in August 2020.

pogo59 commented 2 years ago

I disagree with the validity of the counter-example.

How so? (Your follow-up does not address my counter-example.)

Oops, I left out that part of the follow-up: However long each of those itemized changes took, I believe the resistance to converting our multi-million-LOC code base to a new language would be orders of magnitude higher. Converting to a new review tool is comparable to the other changes listed; converting the implementation language is not.

We've changed lots about how the project operates, and I believe in all cases one of the strong motivations has been to lower the barrier to newcomers:

I don't think this was necessarily the main motivation for most of your examples; at least, I've been motivating and supporting all these moves (except GitHub issues) independently of the newcomer aspect. I'm in the camp that this should not be the first priority: existing contributors are more important than newcomers to me. A newcomer does not stay new for long; they get up to speed in O(weeks), and then what matters is how easy/convenient/productive it is to contribute afterward.

What we may agree on is that it would be a hard sell for anything to be proposed that would make it harder for newcomers. Reciprocally I would have strong concerns in a move that would degrade the workflow and the productivity of existing contributors because it "may be more convenient for new contributors".

I agree that raising the barrier to newcomers is a hard blocker and quite possibly a more widely held view than encouraging newcomers. But I personally have been subjected to enough different review processes and tools over the decades that most of the differences people are bringing up are in the noise to me. Even stacking, the only Phab feature that seems at all innovative, I've never done; anytime I've tried to do anything that involved, it has been as a series of individual reviews, and it's worked fine, therefore I'm not persuaded that it's a show-stopper (although definitely something worth agitating for in whatever tool we convert to).
