spacetelescope / style-guides

An opinionated guide on how we work.
Creative Commons Attribution 4.0 International
55 stars 33 forks source link

Git flow docs #32

Closed arfon closed 4 years ago

arfon commented 6 years ago

It's great that we've got a GitHub guide in here (✨ @bourque), however, I think the fork/PR workflow we're recommending is the wrong one for repositories under the spacetelescope organization.

Reasons I don't like forks:

Instead of what we currently have in the GitHub workflow document, we propose the following:

When STScI staff are working on an STScI-owned project, staff should push feature branches to a shared repository rather than working on personal forks

Reasons I often hear against a feature-branch-on-a-shared-repository proposal:

/ cc @eteq @bourque @stscicrawford (i.e. folks I've already discussed this with at least in part)

JeffValenti commented 6 years ago

I'm out of my expertise here, but I like that folks in INS have to make a pull request, rather than simply pushing code changes. The reason is that having your code reviewed and being asked to review other people's code has pedagogical value, encourages more uniform coding idioms, and facilitates succession planning if we lose people. Perhaps these are not issues in OED, but they are in INS now. Is there some technical middle ground that addresses your concerns and the issue I raised?

arfon commented 6 years ago

I'm out of my expertise here, but I like that folks in INS have to make a pull request, rather than simply pushing code changes. The reason is that having your code reviewed and being asked to review other people's code has pedagogical value, encourages more uniform coding idioms, and facilitates succession planning if we lose people.

Absolutely, apologies if this isn't clear but I'm definitely not proposing that staff don't open pull requests here. They definitely should! What I'm suggesting is that pull requests on spacetelescope owned repositories by staff, should be made between branches on that repository rather than forks (for the usability/collaboration challenges I mention above). It's the difference between:

spacetelescope:master  <--  arfon:my-feature-branch

and

spacetelescope:master  <--  spacetelescope:my-feature-branch

GitHub's documentation about pull requests shows how to achieve this.

To be clear, it's entirely possible that a team might want to use forks even if they have the option of pushing branches to the same repository. My problem with the Git guide as it currently stands is that it suggests fork & pull is the preferred workflow whereas I believe branch & pull should be for Institute-owned projects.

JeffValenti commented 6 years ago

Thanks for the explanation! This sounds fine to me. I didn't fully appreciate the branch/fork distinction. I was confused by your original bolded statement, "staff should push", but you have now clearly stated that the nominal workflow will involve pull requests.

bourque commented 6 years ago

I agree with @arfon that the GitHub guide should not be advocating for a forking workflow as a default, as that is definitely overkill in many situations. Perhaps the best solution is to have two sections in this guide: one that describes a branching workflow, and one that describes the forking/branching workflow? Each section could outline some of the situations in which a team may want to use the particular workflow.

On a personal note -- for the jwql project, we have adopted the forking workflow, mainly because of this:

I don't like cluttering up the main repository with lots of branches...

I wonder at which point this starts becoming a problem. Currently the JWQL team has 9 developlers, and at any given time we have ~5-10 feature branches in development. I think that would be straightforward to manage with branches in the main repo, but at some point we will be welcoming contributions from what could potentially be the entirety of the JWST instrument teams. The thought of having dozens of active developers and feature branches scared me into using forks, but perhaps I am overthinking the situation. I would like to hear anyone's opinion on this if they have one.

pllim commented 6 years ago

Forking workflow is much safer. If y'all work on different branches in the main repository without forks, it only takes one mistake to accidentally delete or overwrite others' branches. While deleted branches could possibly be recovered via local clones (or if you memorized its hash), it is no fun.

Forking workflow also promotes open development, in the sense that someone interested in your project but might not have write access is able to contribute.

mperrin commented 6 years ago

I agree with @bourque's comment above. I think this is a case (like Travis vs. Jenkins?) where we will be better off saying "here are two good options, and here are the pros and cons of each, and different projects can reasonably choose either one."

In particular for projects with substantial numbers of external collaborators, I think it makes good sense to use forks because it gives a consistent paradigm for everyone working on that repo. There can also be a lower barrier to entry since many developers may already be familiar with this workflow from other projects, for instance astropy is developed via fork+PR. But conversely I've also seen git-flow work very effectively in cases where it's more of an internal development team. Two relevant personal examples would be webbpsf (fork+PR) and all the optics lab repos (git flow); I wouldn't want to change either of those to the other approach.

arfon commented 6 years ago

The thought of having dozens of active developers and feature branches scared me into using forks, but perhaps I am overthinking the situation. I would like to hear anyone's opinion on this if they have one.

The http://github.com/github/github repository (i.e. the application we're using right now) had > 1000 branches. The only challenge it created was that occasionally someone had to go and delete 'stale' branches. Note, this is one of the main reasons why GitHub gives you the option to delete a branch after merging a pull request.

Forking workflow is much safer. If y'all work on different branches in the main repository without forks, it only takes one mistake to accidentally delete or overwrite others' branches. While deleted branches could possibly be recovered via local clones (or if you memorized its hash), it is no fun.

I agree it's 'safer' but I stand by my comments above that it's harder.

eteq commented 6 years ago

I firmly disagree with @arfon here (as he probably knows). @pllim said a lot of my reasons in https://github.com/spacetelescope/style-guides/issues/32#issuecomment-415473988 (as did @arfon in the OP). An expansion on this is outlined below.

Two context points first:

  1. At STScI a lot of my "external/internal" arguments apply even across-divisions: I think e.g. INS folks end up behaving as "external contributors" when contributing to or just commenting on DMD-managed repos (and vice versa). So thinking of this as an "inner source" problem is warranted.
  2. Any problem that is corrected by more fine-grained rules as opposed to simple default-able firewalls is dangerous, because it depends on the maintainer to correctly administer them, including understanding the motivation. And we currently have a dearth of experts on this (although of course it'd be great to change that!)

My pro-fork arguments:

Counter-arguments to @arfon's:

Forks are harder to back up

This is true but with reasonably-size PRs it's not much worse than the problem that sometimes we work on code while off-site and haven't plugged into a backup drive. As long as everyone knows that code is only truly "ST" when its merged, that's a good thing.

Additionally, it's a fairly easy task to write a script using the github API that backs up any branch that's part of a PR even if its outside the spacetelescope repo. If that's truly a show-stopper I even volunteer to write said script 🤑.

Also, arguably this is a feature for external contributors: we should not own their work until its merged (although that doesn't apply for fully-internal STScI projects, I admit).

CI isn't enabled by default on forks

I think this is a feature not a bug: that gives the forker the freedom to experiment with altering the CI settings without messing up the main repository. I use this quite frequently, in fact, even for projects where I have total administrative control over the main repo.

It's also not that big of a deal to make a PR with a "WIP" tag and use that to run the CI. In fact that's good as a way for people to have an early chance to comment on complex work. Again this ends up allowing more fine-grained control of exactly when the CI runs.

Forks are a usability nightmare

I see the points here, those are valid. But its a lesser-of-two-evils thing for me.

I do have solutions for the first two though - in fact I argue we should not try to keep forks up to date with upstream. The first thing to do when you fork is replace master with something like "staging" which you use only for e.g. local CI. Also, you should never have collaborators on your fork - instead they should fork. All of this can be automated with the API or local git interactions - see e.g. eteq/sanity-preserving-gh-workflow,

I have no real answer to the notification one. Two cheeky answers: 1. Eh, you'll be in notification hell anyway, that's just life 2. never use private repositories 😝

eteq commented 6 years ago

Re: @bourque's suggestion of "two options": in general I agree with this attitude on these style guides. I could live with this on that basis.

However in this case I think this is dangerous for the institute culture: these are very different contributing workflows, and if both co-exist I don't have confidence that we won't just end up with two conflicting workflows which creates new barriers instead of knocking them down (which I see as the whole point of either github-style PR workflow). What happens when you have the "one repo" workflow and I don't know it and instead fork? Are you going to close my PR and say "you have to join the team or we refuse"? And if you're using that in your team, you probably won't have learned all the things @arfon mentioned above about fork usability problems so I'll have to spend a bunch of time educating you on that before I can get your one-line patch (which is what I actually wanted).

bourque commented 5 years ago

I'm not sure we ever came to a concrete decision about how to modify the git Workflow style guide to incorporate other workflows (or if we even want to do this).

@dpnemergut has some great slides on a release-driven git workflow (i.e. gitflow) from the Advanced git Training Workshop (see attached), and it seems that this workflow is used/would be useful for several groups at ST. Perhaps we should describe this workflow (or at least link to it) in this style guide as a release-driven alternative to the already-described forking workflow.

Advanced Git.pdf

dpnemergut commented 5 years ago

Details about the gitflow model can be found in this post by Vincent Driessen. It can be used together with forking by setting the 'develop' branch to be the default in the main repo. People's forks can be used to create feature branches which are then pull requested to be merged back into the 'develop' branch of the main repo.

If you decided to forgo forking then all branches other than feature branches can be locked down to admins and pull requests must be created to merge code into them.

arfon commented 5 years ago

On reflection branches vs forks is not a hill I'm willing to die on so I'm happy to close this as a wont-fix. That said, if anyone is suitably motivated at some point, I think it might be worth noting somewhere in this guide that alternative workflows exist.

pllim commented 5 years ago

On reflection branches vs forks is not a hill I'm willing to die on

Haha! Wise choice.

bourque commented 4 years ago

With fear of opening up a heated argument again, I would like to (re)request that we modify the git workflow docs to offer two options, a branching workflow and a forking workflow, with appropriate descriptions and pros/cons of each model. I am happy to make those changes.

The reason that I would like to do this (and do it soon) is that @shanosborne and myself are about to demand that INS JWST ancillary tool teams provide (or reference) a documented git workflow as part of our Tier 2 standards (more information about that effort here: https://github.com/spacetelescope/ins-jwst-community-software). It would be great if INS teams could simply point to the git workflow docs here and simply say "we are using the branching model" or "we are using the forking model", instead of creating redundant documentation.

Unless anyone has strong objections to this, I intend to open a PR to this repo within the next few weeks that adds a section for the branching model.