Choosing a Workflow

hallister commented 11 years ago

NOTE
This is a decision that needs to be made AFTER the Git tutorial. 
Don't bother with this  until you are fairly comfortable with Git and 
Github.

This section is likely to appear overwhelming. However, a single workflow is extremely important for a project with this team size. All workflows have their pros and cons, and is largely going to be choosing the one that "makes the most sense" to you.

Anyone interested in a particular work-flow that is willing to do a
write up on it,  please add a comment with your write-up below.

hallister commented 11 years ago

Integration Manager Workflow

The idea behind this workflow is that the primary repository should be blessed: only code that has passed testing and peer-review should make it in.

Below I'll explain a specific type of IMW using existing tools like Github and TravisCI. There are two examples listed below. The "Developer" example is what you as a developer will be doing as a part of this workflow. The "Merge Master" example is what a code reviewer will do. If you aren't interested in the "behind the scenes" you can skip over the Example (Merge Master) section.

Initial Steps

IMW has a couple of inital steps:

Fork the base repository on Github.
Clone your Github fork on your local machine.
Add the base repository as a remote.

$ git clone your_github_fork_url directory_name $ cd directory_name $ git remote add upstream your_github_fork_url

The above steps are required once per machine you plan to develop on.

Example (Developer)

Let's assume you've made some changes to the code, you've written the tests for the code and are ready to submit the code to the base repository. Let's assume your current branch is "widgets". Here are the steps:

$ git add .
$ git commit -m "feature: made the widget move in 9 dimensions"
$ git push -u origin widgets
Open a Pull Request

Example (Merge Master)

Once your Pull Request is complete a few magical things will happen in the background:

TravisCI a continuing integration server will pull the code from your fork and run a variety of tests on it, including:

Code Quality Verification - Did you play by the rules of PEP8?
Project Layout Verification - Is stuff where it is supposed to be?
Unit Test Runthrough - Run all of the tests to ensure they pass successfully.

Once this process is complete, TravisCI updates your Pull Request with the status (Pass/Fail). Once this is done a Merge Master (This could be the entire team, or a few permanent people, or a rotating position etc), will review the code for the following:

Were unit tests written?
- Are additional tests needed?
- Are all tests required (ie: no tests were created that will always pass)?
- Did the tests pass?
Did the submission pass PEP8 tests?
Is the code well documented?

If the Merge Master runs into problems they can post them to the Pull Request. If not, the Merge Master merges (or cherry-picks if neccesary) the changes into the base repository branch and the merge is complete.

What happens if it passes all the tests and still breaks something?

If a breaking change is introduced, and that change is introduced before anyone else commits code, we can simply reset the master to the commit prior to the breaking one. If it isn't noticed, it is fixed as any bug would be.

Pros and Cons

All workflows have some pros and cons. and Integration Manager Workflow is not different.

Pros

Simple: It's one of the "easiest" workflows for developers:
- Notifications are automated (Github).
- Shifts Git intricacies to those familiar with Git (Merge Masters).
Central: Because all team members have a public repository, anybody can pull from anyone else.
Review First: Nothing makes it into the codebase until it has been reviewed and tested.

Cons

Multiple Tools: Pull requests can't be done from the command-line, so developers will have to go to Github as a part of the workflow. Done incorrectly it can be "slow".
Added Responsibilities: Merge Masters will have the extra responsibility of reviewing code.
Fetching: In order to keep your forked Github branch in sync with the main branch, you'll need to fetch the upstream master and sync it. I have a git alias that makes this a simple git resync upstream/master.

andyleejordan commented 11 years ago

I'm going to respond line-by-line from the top here. Justin and I are advocating a very similar model, mostly because we're both basing off A successful Git branching model.

The idea behind this workflow is that the primary repository should be blessed: only code that has passed testing and peer-review should make it in.

This is a good idea, but is very constrained. The primary branch i.e. master should be the master code: code that is passing all unit tests and has been reviewed. Thanks to git branches, this does not require and should not lead to a master repo, as non-integrated code will exist on user and feature branches.

Initial Steps

When you clone a repo using git, you are making a fork. This does not, and should not, be a Github-style fork (that is, hosted on your account without write-access for anyone else).

Clone the group's repository on your local machine: this is your local fork. Checkout a user branch or feature branch to work on. Do your work, push up your branches. You haven't touched master (and even if you accidentally do, this is git, so it's versioned and always fixable), yet your code is available immediately for all other team members to look at and work on.

If we end up having to track individuals' or groups' repositories, merging becomes a nightmare, and it becomes O(n) more difficult to track the additional remote repositories. This problem is easily solved by not giving in to Github's "pull request" 'feature', which itself is not intrinsic to git the DVCS.

Example (Developer)

git branch me
git checkout me
(make some changes)
git add .
git commit -m "Adding some changes on my or a feature branch"
git push -u origin me

No need for a pull request! All other members have access to your code immediately by checking out your branch.

Also, every time you push (to whichever branches we choose, it's highly configurable, and I own the server already), unit tests, syntax checking, PEP8 validation, package building will automagically be run, and notifications sent out to the appropriate users.

Example (Merge Master) Since we didn't need to do any sort of pull request, when you have a feature working and think it should be reviewed and integrated, tag it!

git tag -a "Hey, I think this feature branch is good at this point"

Now make a Git issue (that'd be just like a pull request, without having to use it), reference your tag commit, and the merge master can now pull the repo, on his/her local machine:

git checkout integration
git merge master
git merge 'tagged commit'
nosetests

If everything passes and has been reviewed and looks good:

git checkout master
git merge integration
git tag -a "increase version [semver](http://semver.org/) number for new master release"
git push origin master

Also remember, the testing really gets done automatically, so there's even less for the merge master to do.

Pros and Cons

So the workflow Justin outlined is essentially what we want, but with a very important difference: no dependence and slow-down and hassle of many multiple remotes to track and pull requests to deal with. Don't get me wrong, Github's pull requests are fantastic for OSS community development, where only one person owns the upstream project. But we all own this project, and are working collaboratively, hence collaborative access to our codebase.

Why Full Push Access

Pros

All code written be developers is immediately available to all other developers by a simple git pull.
Pull requests don't have to be touched.
Only one remote needed.
Accidental work on master can be reverted, this is git after all.

Cons

Users still not familiar to git may accidentally push changes to master, but this is git, that's easily fixable
It requires users to understand a branch model; however, this is going to be necessary regardless.

hallister commented 11 years ago

I think there have may been some confusion here. The model I mentioned does not mean that some do or do not have merge access. Merge masters can be everyone, can be limited to certain people in each sub-team, or even a single person (bad idea for this particular project). It's really up to the team... more than likely everyone will have Push access, but we can still use the fork model.

There's advantages and disadvantages to both models:

Fork

:thumbsup: Nothing makes it into the main organizational branch until it has been peer-reviewed.
:thumbsup: Testing (via TravisCI) happens automatically when a PR is opened.
:thumbsdown: Access to the branches of the other teams requires adding forks as remotes (Avoidable by merging frequently).
:thumbsdown: It adds the extra steps of opening a PR and merging it from Github.

Branch

:thumbsup: Everyone on the team has immediate access to everyone else's code.
:thumbsup: Entire workflow can be done via command line.
:thumbsdown: Requires very specific branch organization or becomes chaotic and messy.
:thumbsdown: Testing requires a separate server which runs after the commit.

To be clear, both models are used by huge companies. Google, for example, uses both models depending on which internal team your on. So it comes down to preference. Frankly, this is more than likely just an "experience" issue... I've worked far more on the Fork model, while Andrew works on the Branch model (at SEL).

Regardless I think the issues I brought up above nails the biggest pros/cons to both.

andyleejordan commented 11 years ago

I just want to note that both models include forks and branches, it's just where those forks exists and if everyone can push to the origin repo.

As for nothing making it into master until it's been peer-reviewed, this is true in both cases. Testing automatically via Jenkins (or TravisCI) works automatically for both cases. Branches are going to need to be organized regardless, and it's easier to just have one repo with all the branches. For both cases testing requires a separate server which runs after the push (not the commit) regardless of TravisCI or Jenkins or PR or branch.

hallister commented 11 years ago

Alright folks, we need to hear from more than just the two of us. @ui-cs383 what do you think?

andyleejordan commented 11 years ago

The only real advantage I see with PR is that it becomes almost impossible to mess up the master branch, but I think that problem is already mitigated with the explanation earlier of don't work on master, and that's it git, so it's easily fixed. The major disadvantage to it I see is the lack of immediate sharing of code, which I've seen become a large hindrance.

So especially considering many members' inexperience with Python, it would be beneficial where they have a problem, commit to their user branch, make an issue asking for help referencing that commit, and any of us can pull and help out without any hassle.

Just my summary of advocating push rights.

hallister commented 11 years ago

@andschwa I realize you are an advocate of the branching model, but let's not make this a bigger deal than it is:

git add remote andschwa ssh://git@github.com:ui-cs383/fork-name.git

Now I can pull from your fork for the rest of the semester.

That said, if this is really a huge deal, I'd be happy to make you or anyone that wants it a git alias with every members fork in it, which names their remote their vandal user id. A simple:

git ui-383-remotes

Would add everyone's fork as a remote.

You've mentioned several times that using the PR method ruins collaboration, it's a large hindrance, etc. I think we need to take care not to exaggerate the difficulties or the benefits of any one thing and let people choose on their merits.

Regardless I REALLY think we need to hear from the group on this instead of making it a point, counter-point between the two of us.

andyleejordan commented 11 years ago

Yeah we really do need to hear from others.

But as far as counterpoints go, that now means tracking every other person's fork (so 16 of them) if you want to be able to see the latest work being done. With that also means many, many more branches/refs to track, as everyone will have their own version of master/integration/whatever other branches are upstream, plus their own. I've done it before, but it's a headache that doesn't need to exist.

Cheers,

Andrew Schwartzmeyer

On Sep 13, 2013, at 7:28 AM, Justin Hall notifications@github.com wrote:

@andschwa I realize you are an advocate of the branching model, but let's not make this a bigger deal than it is:

git add remote andschwa ssh://git@github.com:ui-cs383/fork-name.git

Now I can pull from your fork for the rest of the semester.

That said, if this is really a huge deal, I'd be happy to make you or anyone that wants it a git alias with every members fork in it, which names their remote their vandal user id. A simple:

git ui-383-remotes

Would add everyone's fork as a remote.

You've mentioned several times that using the PR method ruins collaboration, it's a large hindrance, etc. I think we need to take care not to exaggerate the difficulties or the benefits of any one thing and let people choose on their merits.

Regardless I REALLY think we need to hear from the group on this instead of making it a point, counter-point between the two of us.

— Reply to this email directly or view it on GitHub.

harr6176 commented 11 years ago

The Integration Manager Workflow sounds like a wise decision; inexperienced git users don't have to worry about potentially causing harm to the code base, and there's less of a learning curve.

bcumber commented 11 years ago

I'm also leaning toward the Integration Manager Work flow. Mainly because I am the inexperienced git user that could potentially mess up the project. It seems to me that we could subset the full push access work flow within the integration manager work flow(if that makes any sense), so everyone wins. Also since I believe we decided to go with a hierarchical structure for managing this project this work flow seems to suit that setup. +1 for Integration Manager work flow.

andyleejordan commented 11 years ago

As I pointed out above, since the code-based is versioned by git, if it were to get 'messed up', it's easily fixable. Avoiding messing it up is as easy as simply not committing on the master branch.

However, if we do get the project organized right, a few shared forks using Pull Requests may work okay.

cawaltrip commented 11 years ago

I'm going to also have to say that I feel like the Integration Manager/Pull Request model is the way to go with this. Though I like both models, I feel like the less we have to complicate this for people who aren't comfortable with Git the better. We have very little time for this project as is and so the simpler the better.

That said, I think that the sub-teams do things like code reviews internally as well on top of the code reviews that occur when a merge master handles a Pull Request, but that's not part of the issue at hand, so if we're voting, I'm voting on the Pull Request/TravisCI model.

andyleejordan commented 11 years ago

@cawaltrip That's actually exactly why I'd advocate against the PR model, simply because we have so little time to do this. PRs are a system on top of git, made by Github, and add complication to the process.

Look if you guys want to deal with Pull Requests, be my guests, but I'm telling you form experience that it's slow and frustrating with our type of collaborative project.

TravisCI/Jenkins is a separate issue, both will work with both workflows.

hallister commented 11 years ago

@andschwa I realize this isn't the fastest way to get code out, there's no doubt about that. But if this model were honestly THAT bad, nobody would be using it. But some of the largest organizations on Github use this very method.

Is it the fastest workflow around? Nope. Is it slow and frustrating? Nope. It's a good balance between speed and avoiding bugs. That's exactly what we need.

andyleejordan commented 11 years ago

The largest organizations in the world use it for development of open source projects that aren't limited to 16 (now 15) weeks of development. Additionally, those projects were generally developed at least into their first iteration internally, before being open-sourced. They're also managed in a far different matter.

We can try using Pull Requests, but I honestly expect to lead to many frustrations. I also really don't see how it 'avoids bugs' at all.

hallister commented 11 years ago

@andschwa Well, I don't agree with you but fair enough. And how it "avoids bugs" is by ensuring that nothing can enter the repository (not just branch) without passing both tests and a code review. Your way, of tagging releases and merging from branch to master requires that code reviewers have a pretty good handle on git, since the entire code-review process would be via a third party tool or the command line. Expecting people to do more advanced things with tools they don't know is inviting more bugs into the software.

thom5468 commented 11 years ago

I'm going to err on the side of less "power to the people" and say the integration workflow manager is my vote.

andyleejordan commented 11 years ago

@hall5714 Just beware that with that comes the steep price of not having just one repository to work with. If you're trying to avoid bugs, a fork setup as is being advocated here is far more likely to cause merge issues, since each group's repository is going to need to managed individually and kept in sync with each other.

I do expect people to get themselves familiar with the standard development tools for the software engineering process, which is what this class/project is entirely about. These are not advanced things, they are what will need to be learned.

The code-review process now is still going to be a third party tool: Github. Which I'm advocating using regardless of the IWM or One Repository setup. For either workflow, I'm envisioning the Github code review process to be as follows:

create Github issue for needed code review with tag to branch/commit of the code needing to be reviewed
reviewer checks out branch, goes through it, making corrections as needed or at least pointing out what needs to be fixed, commit this
comment on issue with summary and reference the commit(s)

This amounts to the same thing as the PR process. Is there a simpler way that I'm missing?

hallister commented 11 years ago

With the IMW fork model, here's the process:

Developer opens PR request, the tests run and show as pass fail.
The reviewer looks at the code on Github to ensure that tests were written, and pointing them out if not.
Merge and reference commits.

The important part of this is that the Merge Master never had to leave the screen he/she was on. Making corrections isn't the responsibility of the Merge Master. Their responsibility is to ensure tests are written, tests pass.

andyleejordan commented 11 years ago

That's not much of a code review. Tests should have been automatically run before a code review is even started, and then the reviewer inspects the code to catch bugs, ensure PEP8 is being followed (correcting minor mistakes as need be). They can do this from Github or their editor (personally I prefer reading my code in my editor than the browser, hence my dislike for CodeCollaborator). Is the 'merge master' separate from the 'code reviewer'?

hallister commented 11 years ago

The tests will handle PEP8 compliance. Merge Masters are a peer-review, but they are not an in-depth code review. How each sub-team decides to do code reviews is up to them.

andyleejordan commented 11 years ago

Regardless of how in-depth the code reviews are, I have a workflow solution that's a compromise and solves all our problems.

I just discovered this myself, doing some research, but there's such a thing as the Shared Repository Model for Github.

The Shared Repository Model is more prevalent with small teams and organizations collaborating on private projects. Everyone is granted push access to a single shared repository and topic branches are used to isolate changes.

Pull requests are especially useful in the Fork & Pull Model because they provide a way to notify project maintainers about changes in your fork. However, they're also useful in the Shared Repository Model where they're used to initiate code review and general discussion about a set of changes before being merged into a mainline branch.

We can have our Pull Requests and eat it too (i.e. share one codebase, not fracture it) :D

hallister commented 11 years ago

@andschwa Well that still has the same problems that several people have mentioned they don't like: everyone can overwrite everything at the base repository (and I know this doesn't change their local repository, but it does for the central). And I feel like we just need to get people in here to make a decision rather than introduce another model and rehash the conversation again.

That's just me though.

andyleejordan commented 11 years ago

@hall5714 "Everyone can overwrite everything at the base repository" is an exaggeration, git doesn't allow weird pushes by default (for that reason). The worst that could happen is an accidental commit (or few) to master (or any other branch), which is fixed with a revert, and isn't a big deal. It can seem like a bigger mistake than it is. Overwriting history just plain doesn't happen very easily.

This model is a hybrid that gives us the best of both worlds. I think a compromise should definitely be open for discussion.

hallister commented 11 years ago

Per last meeting we are going with the PR model and sub-teams will use something akin to a branch model (unless they decide against it). Closing.

ui-cs383 / Freedom-Galaxy