opensearch-project / OpenSearch

🔎 Open source distributed and RESTful search engine.
https://opensearch.org/docs/latest/opensearch/index/
Apache License 2.0
9.81k stars 1.82k forks source link

[Process Improvement] Feature branch support for more efficient development #15118

Open jainankitk opened 3 months ago

jainankitk commented 3 months ago

Is your feature request related to a problem? Please describe

When multiple people are working on a feature, there can be dependency between multiple PRs and can slow down the development. Some of the options considered:

  1. Having common private fork and raise draft PRs against that. We tend to lose the comments/feedback when new PR created against the main repo. Also, difficult to get maintainer feedback on private fork PR.
  2. Keep merging the changes in private fork and raise big PR on main repo. BIG NO

Describe the solution you'd like

I believe something like below should address this:

  1. Allow creation of feature branch on main repo tracking the main branch. Developer can create github issue and one of the maintainers can evaluate and help with that. There can be few months expiration for each branch to prevent proliferation
  2. Developers can create the dependent PRs on main repo using feature branch as base branch.
  3. Once the dependent-on PR gets merged, base branch in above PR can be updated from feature branch to main branch
  4. Delete the feature branch once dependent PRs are merged

Related component

Other

Describe alternatives you've considered

No response

Additional context

No response

jed326 commented 3 months ago

Thanks for the suggestion @jainankitk! I had 2 initial thoughts:

  1. I think you're proposing allowing the creation of feature branches that non-maintainers can push to directly, is that right? Otherwise I don't think there's anything stopping from maintainers from doing this today. I think we will need to check with @opensearch-project/admin if this is even feasible.
  2. It seems like the main problem we want to solve is "if PR B depends on PR A, how can we get started on reviewing PR B before PR A is merged". One way that I've worked around this in the past is cherry-picking the commit(s) for PR A from the fork with the dependency I am waiting on and creating PR B to the main repo including the cherry-picked change. Afterwards, you can ask reviewers to filter the PR by the commits you would like them to review, excluding the cherry-picked dependency. For example, you can view PRs like this: https://github.com/opensearch-project/OpenSearch/pull/15077/files/9b494b6135f00582b437fc2340392b70a7e07080..38b3b950d4da2bde105e355af6f6a5f75c0b3112
dblock commented 3 months ago

We already have feature branches, and there are many: https://github.com/opensearch-project/OpenSearch/branches/all?query=feature, likely abandoned and of need of cleanup. Feel free to codify this in DEVELOPER_GUIDE and/or request a feature branch from one of the maintainers.

That said, there's no way to give r/w access to a feature branch to someone without compromising some security. So a maintainer will need to continue merging changes into the feature branch, not sure that's acceptable.

jainankitk commented 3 months ago

That said, there's no way to give r/w access to a feature branch to someone without compromising some security. So a maintainer will need to continue merging changes into the feature branch, not sure that's acceptable.

I was not aware of the security concerns around write access to non-main branch. Getting maintainers to merge into the feature branch defeats the purpose IMO.

jainankitk commented 3 months ago

I think you're proposing allowing the creation of feature branches that non-maintainers can push to directly, is that right?

@jed326 - You're spot on here!

It seems like the main problem we want to solve is "if PR B depends on PR A, how can we get started on reviewing PR B before PR A is merged". One way that I've worked around this in the past is cherry-picking the commit(s) for PR A from the fork with the dependency I am waiting on and creating PR B to the main repo including the cherry-picked change. Afterwards, you can ask reviewers to filter the PR by the commits you would like them to review, excluding the cherry-picked dependency. For example, you can view PRs like this: https://github.com/opensearch-project/OpenSearch/pull/15077/files/9b494b6135f00582b437fc2340392b70a7e07080..38b3b950d4da2bde105e355af6f6a5f75c0b3112

Thanks for this suggestion, looks interesting. Will try to use this in the short term for some of the PRs, but I believe this starts getting much complex for bigger PRs which are iterating in parallel. Also, the maintainer cannot technically approve the PR as it is still pointing to main as the base branch in case some other maintainer accidentally merges the change.

dblock commented 3 months ago

That said, there's no way to give r/w access to a feature branch to someone without compromising some security. So a maintainer will need to continue merging changes into the feature branch, not sure that's acceptable.

I was not aware of the security concerns around write access to non-main branch. Getting maintainers to merge into the feature branch defeats the purpose IMO.

Exactly. The issue is that you can change CI/CD that runs in target context with access to repo secrets :(

andrross commented 3 months ago

There was a lot of discussion around feature branches on slack a few months ago: https://opensearch.slack.com/archives/C05L60S4UBT/p1715611794391279

My 2 cents: feature branches run counter to the principles of continuous integration and we should be careful to avoid long running development that diverges from main. In my opinion, in most cases collaborative work and code sharing can happen effectively on user forks without needing a feature branch on the main repo.

As to the problem of "if PR B depends on PR A, how can we get started on reviewing PR B before PR A is merged", if they are truly dependent on one another then I'm not sure you can effectively review PR B until decisions in PR A are finalized. If it's possible to review B while A is still in flux then it seems like they are mostly independent you it should be possible to raise PRs independently.

jainankitk commented 3 months ago

My 2 cents: feature branches run counter to the principles of continuous integration and we should be careful to avoid long running development that diverges from main. In my opinion, in most cases collaborative work and code sharing can happen effectively on user forks without needing a feature branch on the main repo.

@andrross - My proposal is not advocating for long running development divergent from main. The feature branch is to allow concurrent development on overlapping pieces without the slow down due to dependencies. We still continue merging to main as and when components are ready.

As to the problem of "if PR B depends on PR A, how can we get started on reviewing PR B before PR A is merged", if they are truly dependent on one another then I'm not sure you can effectively review PR B until decisions in PR A are finalized. If it's possible to review B while A is still in flux then it seems like they are mostly independent you it should be possible to raise PRs independently.

Not necessarily. The dependencies might be loosely coupled allowing the PR review for A and B go independently, but A has to be merged before B can be merged. For example - if I am adding a snapshot service, and someone else is adding createSnapshot API, the createSnapshot API just needs to delegate the creation to SnapshotService. The createSnapshot API logic can be completely independent of how the SnapshotService is implemented.

jainankitk commented 3 months ago

Exactly. The issue is that you can change CI/CD that runs in target context with access to repo secrets :(

@dblock - Ah!, this is constraining. I am wondering if there is a way to restrict the modification access to CI/CD for specific branches?

msfroh commented 3 months ago

Since we wouldn't want to merge everything from a feature branch into main in one shot anyway, I'm not entirely sure what development on a feature branch on the target repo gives you that you couldn't get from collaborating on a user fork.

If you have something big/ambiguous enough that multiple people need to work on different pieces and figure out how they fit together, a user fork lets you get through those "messy" parts. Then you can start opening smaller PRs against main that work toward a solution that you know works. (You can even point to the user fork as an example of what the end product is expected to look like.)

dblock commented 3 months ago

@dblock - Ah!, this is constraining. I am wondering if there is a way to restrict the modification access to CI/CD for specific branches?

AFAIK no. We have a lot of infrastructure that relies on the fact that the code merged is "trusted" to some extent at least. Unfortunately I think this renders the whole discussion moot. I propose we close this.

jainankitk commented 3 months ago

Okay, maybe Feature branch is the wrong term to begin with. Maybe should have called this as PR branch. Take these 2 PRs for example:

Lot of the diff in GET PR is overlapping with the CREATE PR, but cannot be accounted for until CREATE PR is merged in main. Would be much easier to review GET PR if there were branch with changes for CREATE API, which could be used as baseline for GET PR until other CREATE PR is merged.

We don't even need to provide write permissions to non-maintainers for this PR branch, so should not have any security concerns as well. Thoughts @jed326 @dblock @msfroh?

jed326 commented 3 months ago

Without looking too deeply into the logic of the 2 PRs (for purposes of trying to keep the conversation generic), why can't the "overlapping" diff part of the 2 PRs be it's own independent PR first? That should speed up maintainer reviews as well since it would be a smaller change. For example, it looks like the WorkloadManagementPlugin plugin creation related changes could be a separate smaller PR.

andrross commented 3 months ago

I agree with @jed326's comment above.

Also, PRs can be created where the "base ref" is basically anything. If you wanted something like this "PR branch" then you can do it from a user fork without needing to create a new branch on the base repository.

jainankitk commented 3 months ago

If you wanted something like this "PR branch" then you can do it from a user fork without needing to create a new branch on the base repository.

@andrross - Please correct me if I am wrong. We cannot create PR on the base repo using a branch on fork, right? And if we create PR against the fork first, we lose all the feedback on that PR when created against base repo.

jainankitk commented 3 months ago

Without looking too deeply into the logic of the 2 PRs (for purposes of trying to keep the conversation generic), why can't the "overlapping" diff part of the 2 PRs be it's own independent PR first? That should speed up maintainer reviews as well since it would be a smaller change.

That's a fair point. And, one of the reason we broke the 57 file large PR into individual APIs. But at the same time, even small PRs can take a while at times. I am sure life would be much easier if gradle check were more reliable, but in absence of that, it can take days at times to get the PRs merged in. And I would love to start getting feedback from maintainers on other changes meanwhile!

sandeshkr419 commented 3 months ago

And if we create PR against the fork first, we lose all the feedback on that PR when created against base repo.

@jainankitk - Lets say you have a PR on a fork branch and now you want to create the PR against the main branch - we can always tag the old PR with the discussion and mention in key discussion threads on the new PR to be merged into main.

Lot of the diff in GET PR is overlapping with the CREATE PR, but cannot be accounted for until CREATE PR is merged in main. Would be much easier to review GET PR if there were branch with changes for CREATE API, which could be used as baseline for GET PR until other CREATE PR is merged.

Having a feature/pr branch would again lead you to the same blocker - if your changes are blocked on a different PR - you will still have to wait for the dependent PR to merge - you can't assume code will stay same on a dependent PR by the time it gets iterated and merged.

Having create multiple draft PRs till the final dependent changes are merged looks noise sometimes, but I don't see a way in which feature/pr branches will solve the problems at hand. Plus, if having a feature branch is still desirable, its better to host it on a private fork to avoid clutter on main fork.

rishabhmaurya commented 3 months ago

1) fork FeatureBranch <- PR A <- PR B vs 2) main FeatureBranch <- PR A <- PR B

downside of 1 is we lose comments made on PR A and PR B on merge to main. Is it possible that we add links of PR A and PR B (to preserve those comments) in commit message when fork FeatureBranch is merged to main? We can still do it manually, but can we automate it? In this approach PR links might get invalidated if fork FB is deleted.

jed326 commented 3 months ago

That's a fair point. And, one of the reason we broke the 57 file large PR into individual APIs. But at the same time, even small PRs can take a while at times. I am sure life would be much easier if gradle check were more reliable, but in absence of that, it can take days at times to get the PRs merged in. And I would love to start getting feedback from maintainers on other changes meanwhile!

I do think this is a bit of a trap though, in that trying to reduce the number of PRs (and therefore number of PR reviews), most likely won't end up speeding up your development because it's going to take more time to review each PR and, at least anecdotally, I think reviewing a 20 file PR takes more than twice as long as a 10 file PR. This seems like a situation where investing more time up front thinking about how to split up the changes into smaller pieces would ultimately speed make development faster in aggregate.

From a purely numbers perspective, there are probably more maintainers who are both comfortable and available to review a small transport layer change vs. maintainers who are comfortable and available to review a PR that includes plugin bootstrapping and a transport layer change and a service layer change.

jainankitk commented 3 months ago

I do think this is a bit of a trap though, in that trying to reduce the number of PRs (and therefore number of PR reviews), most likely won't end up speeding up your development because it's going to take more time to review each PR and, at least anecdotally, I think reviewing a 20 file PR takes more than twice as long as a 10 file PR. This seems like a situation where investing more time up front thinking about how to split up the changes into smaller pieces would ultimately speed make development faster in aggregate.

And I have been advocating for the same, one of the primary reasons I insisted on breaking up large PR into smaller ones. My suggestion is not to reduce the number of PRs for overcoming delays, but to provide workaround using PR branch for unblocking developers during longer review/merge cycles.

From a purely numbers perspective, there are probably more maintainers who are both comfortable and available to review a small transport layer change vs. maintainers who are comfortable and available to review a PR that includes plugin bootstrapping and a transport layer change and a service layer change.

Completely agree, and is true in general. More components PR is touching, less people can review it (intersection set of people with knowledge of those components)

jainankitk commented 3 months ago

All being said, I believe the problem should mostly be alleviated by growing the maintainer community, and fixing flaky tests allowing us to iterate much faster. But until that happens, it is bit of a problem IMO.

andrross commented 3 months ago

Please correct me if I am wrong. We cannot create PR on the base repo using a branch on fork, right? And if we create PR against the fork first, we lose all the feedback on that PR when created against base repo.

@jainankitk I mean, you don't lose the feedback, but you do have to create a new PR. Wouldn't you have this same problem with a hypothetical "PR branch"? Can you change the base branch that a PR is trying to merge to without closing and opening a new PR?

jainankitk commented 3 months ago

@jainankitk I mean, you don't lose the feedback, but you do have to create a new PR. Wouldn't you have this same problem with a hypothetical "PR branch"? Can you change the base branch that a PR is trying to merge to without closing and opening a new PR?

I was wondering about the same earlier and indeed it can be done - https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/changing-the-base-branch-of-a-pull-request. Although, not sure if you need to specifically enable this functionality for PRs on a repo.

dblock commented 2 months ago

[Catch All Triage - 1, 2, 3, 4, 5]