git version 2.31.1 - results in needing to define mergine/rebase on git pull

Charlie-George commented 3 years ago

git pull using version 2.31.1 results in needed to define the behavoir you want

See https://stackoverflow.com/questions/62653114/how-to-deal-with-this-git-warning-pulling-without-specifying-how-to-reconcile

Does setting git config pull.rebase false to maintain the default behaviour seem the most appropriate option @kevinrue and does this get mentioned/congifured in OBDS course ? Thanks

kevinrue commented 3 years ago

divergent branches is what bugs me more here.

that means that there are two conflicting histories for this branch, which is not a good state to be in in the first place, and thus you should work in a way to avoid this warning.

It seems like you've made some commits locally and some commits remotely on the same branch, which is the most common way to end up in this situation, since the local and remote branches have now different histories since the last time they were synchronised (i.e. pulled/pushed).

If it's a repository where you work alone, then the merge (default strategy) will be fine; it will create a new commit that merges the two histories (you might need to fix conflicts before it lets you commit the merge).

Again, ideally, you really want to avoid this kind of situation. On each computer where you work in a given repository (consider GitHub another computer), you should really make sure that you pull the latest state of the branch before you make any new commit. Even if you work on a separate branch, you never know if someone (including you) pushed a new commit to that branch somewhere else.

kevinrue commented 3 years ago

Oh and no, I won't cover this setting in the course, unless we somehow run into the issue.

I've planned the day to take them through best practices, which should avoid this particular issue.

Git can be confusing enough to new users that there is no need to throw them curve balls, forcing them through issues before they're even comfortable dealing with the ideal scenario.

I feel that we've been doing that mistake with past cohorts. We knew we were walking them into conflicts because of the way we planned the exercise, and it just creates so much confusion for them (thinking it their fault that things went sideways) and us (reassuring everyone and convincing them that we intended the issue to arise).

This time, I've planned a simpler path through best practices, to see what issues still arise that way, but I do have spare slides at the end of the presentation to discuss more advanced points if we finish early and if we all have any brainpower left by then :)

Charlie-George commented 3 years ago

Thanks Kevin. Obvs I didn’t mean for my branches to diverge - but sometimes mistakes happen and I imagine it might be something easily done by a trainee and that we are probably likely to run into on the course. I just thought it might be worth us thinking about now what their config should be set to for the shared repo in that occurrence so we are prepared and consistent? Especially as you won’t be in the sessions when they are really running into errors with in over the first couple of weeks. I don’t want to tell them anything you’d disagree with.

Thanks for the comments on pulling then committing - but I’m a bit confused are you saying always pull before committing?

The official git docs say to always commit first `running git pull with uncommitted changes is discouraged: while possible, it leaves you in a state that may be hard to back out of in the case of a conflict.

If any of the remote changes overlap with local uncommitted changes, the merge will be automatically canceled and the work tree untouched. It is generally best to get any local changes in working order before pulling or stash them away with git-stash[1]https://git-scm.com/docs/git-stash.

Is this outdated thinking and only apply to older versions?

kevinrue commented 3 years ago

You got me there: I mixed up commit and push.

I meant that I always pull before I push (not before I commit). Indeed pulling before you commit can bring new code that is not meant to go in your own commit. Just like you reported, usually Git will actually block you from pulling until you've commited or stashed your own changes. Then you pull, and then you deal with the potential conflicts and fix them in a new commit.

That said, I have pushed my lesson plan / solution / walkthrough for the git lesson in the syllabus (https://github.com/OBDS-Training/OBDS_Syllabus/blob/main/programme/week2/version_control_using_git_github/solution.md), as it should be done for all other days. The whole point of doing that is to give a heads up to all other instructors what the plan is, basically, I will encourage participants to follow this cycle every day:

Update their clone with the latest changes of the main branch (especially as the instructor is supposed tho create and push a new subdirectory for the lesson of the day):
- git checkout master
- git pull master
Initialise a new branch each
- git checkout -b kevin
Store their code for the exercises in the shared repository
Commit their files (on their branch)
- git status (to check they're on their branch and whether there are any large files to .gitignore)
- git add .
- git commit -m "..."
Push their branch to github
- git push -u origin kevin
Open a pull request
Have their pull request reviewed and approved by another participants (get them used to read other people's code, not in details, just briefly to check there's nothing obviously wrong with the files)
Merge the PR
Delete the branch on GitHub
Update their clone
- git checkout master
- git pull
- git fetch --prune (to delete local pointers to the remote branches)
- git branch -D kevin (to delete the local copy of the branch that was just merged)

This is already the detailed procedure for an ideal case scenario where they learn to follow best practice. It won't sink in until they go through that process a few times over a few days. I have tested this process using the latest Git on the CCB cluster and it doesn't need the configuration setting that you mention.

I would think twice before confusing them with configuration settings that they don't immediately need, as those settings are completely abstract to them until they actually run into the issue. Think about the Carpentries lessons when topics are only introduced when the time is right, to avoid overloading participants with unnecessary information so that they can focus on the point that you're actually trying to make at that particular moment.

Also, I would also think twice about configuring settings like those, which are actually just a shorthands for options that you can specify on the command line when you call git pull. The configuration will change the default of git pull silently, while you could reach the same behaviour by explicitly passing a flag to git pull, possibly a different flag depending on the situation that you're dealing with. That sounds safer to me.

Anyway, while fixing their Git rookie mistakes is always part of the course, merging divergent histories is only one of the solutions, and by far not my favorite because of the messy git history that ensues. I would rather prefer to fix things in a way that brings them back into the best practices flow above, which depends on the situation but is often:

"You forgot to checkout a new branch before commiting? It's alright, use git reset --soft COMMIT to undo your latest commits until the last point in common with the github repo while keeping your files as they are. Then git checkout -b your_branch, then git commit -m "new commit message" then git push -u origin your_branch and so on.

I don't mind helping out to fix things when I can and writing an FAQ based on real uses cases as well. I just need enough information to describe the situation to do that.

kevinrue commented 3 years ago

One more thought about this particular configuration setting: it basically 'hides' the problem and kicks the can down the road rather than train people to do avoid it in the first place.

Typically that happens when they forget to checkout a new branch, and multiple participants try to push their recent work to the master branch. The first one will go through, while the following ones with run into the issue because the GitHub master has now branch diverged from theirs. Again, this can easily be mitigated by taking the habit of making them update and create a new branch first thing in the morning, before they even start writing any code for the exercises. Alternatively, git reset --soft COMMIT is the second choice to fix things later in the day.

Point is, no matter how much information we give them during the Git lesson, and how much they think they understood it, some of them will always run into issues during subsequent days (hell, I still run into issues myself sometimes when I don't pay attention).

If we configure this setting, it will just make it easier for them to ditch best practices, and keep creating and merging conflict on the master branch. It "works" in the meaning that the new code from each participant gets combined, but it makes the Git history challenging to read and fix down the road, e.g. when last time a participant accidentally wiped out all the Python scripts when they merged one of their R scripts.

Pull requests are essential here, as it gives one more chance to review what's about to happen and pull the brakes before all hell breaks lose.

Charlie-George commented 3 years ago

Thanks

sims-lab / ccb-wiki

git version 2.31.1 - results in needing to define mergine/rebase on git pull #3