Should we teach the underlying graph model explicitly?

gvwilson commented 8 years ago

From this thread on the 'discuss' list: should we teach the underlying graph model of Git?

I've tried teaching the underlying model of Git to novices, and had it fail miserably - for people who've never used version control, and aren't yet sure if they want to, directed acyclic graphs of hashes of files is a looooong way removed from anything they think they actually want to do. But that could have been the way I did it...

jgosmann commented 8 years ago

For me it was essential to understand the underlying graph model to become proficient with git and really understand what the different commands are doing and can be used for. I feel once someone has this basis it is much easier to explain and understand different commands.

BUT: This would require quite some theory beforehand, which does not fit well into the SC style of teaching things that can immediately be used to solve a specific problem. Personally I am a person who goes through that initial theory if I can see that it will make things easier to understand for me later. But it seems that is not necessarily true for most people.

matthew-brett commented 8 years ago

I must say I never got to directed acyclic graphs as a technical term, but that's probably because I don't think 'graph theory' when I think about git. The way I teach git is very similar to the git parable description - set up the problem, think how to solve it, then introduce git and show how git solves it. I save time by not covering stuff like github, remotes and so on, on the basis that they can pick that up fairly easily later, and it's difficult to explain properly without the model.

The other method is on the lines of "get people going with basic commands, hope they'll learn the ideas later" or even "the basic commands are fine, the mental model is just detail". I'm pretty confident that you can't enjoy using git for your daily work, if you don't know the model, but I suppose it's possible that people always pick that up later. Certainly I didn't pick it up by osmosis - I had to read someone explaining it.

matthew-brett commented 8 years ago

Of course this is just my opinion. I think what we need is a good controlled trial comparing:

Teach commands, no or not much model;
Teach more model and fewer commands;

and see what the level of git usage is after 3 and 6 months, how likely the person is to be the "person you ask about git" and so on.

chendaniely commented 8 years ago

I just finished my Git series for Data Intensive Biology run by @ctb.

I taught git with a model. But the model is really just a visualization for me to show where the commands operate. I don't put emphasis on what a DAG is, I probably mentioned it in passing, or those who know what a DAG is will be able to make that connection.

I learned git by just working on my own. No branches, just a linear master branch and push/pull to GitHub as a means of backup. I try to stress this point for new learners. This is the core of git. add, commit, push, pull.

The general order of topics for the DIB workshop was:

branching (locally)
merging branches (locally)
merge a branch (github) -- this is a less violent way of introducing what a pull request is
collaboration by adding users as collaborators
collaboration using forks

I think the way I presented Git worked. What I should do is write a blog post about it, link to the topics and the corresponding time points in the videos, and ask people for feedback...

For all I know, the folks at UC Davis practically ended up teaching their own Git workshop because everything I was saying made zero sense :)

Different students learn things differently, but I do agree with the original post that started this discussion: just blindly learning git commands and the order to type them is not the way to go. If anything, don't learn the order of commands, but learn the overall picture with labeled edges.

git diagram I end up drawing by the time I'm done teaching (from a Harvard workshop):

[Image: 01-git]

At the very least, I hope students know what the top left panel represents when they leave.

daisieh commented 8 years ago

I think that, for me, the graph theory always makes things less clear rather than more clear. I am not sure how it helps if we're not even teaching branching.

gonuke commented 8 years ago

I am concerned that even if knowing the graph theory/mental model/underlying model helps make git easier to understand, it would seem abruptly theoretical in the flow of a SWC workshop. Everything else is so hands on, that to take a tangent and discuss directed acyclic graphs (regardless of what words you used) would seem odd.

Without a doubt, I always draw pictures of DAGs when talking about git, however.

gonuke commented 8 years ago

Perhaps there is merit in coming up with a very good canonical set of graphics that instructors can use to tie into their lessons, jumping between the graphics and the command-line interface. Arranging a lesson that is still quite hands on, but uses a series of graphics that are carefully thought out from the learner's point of view might be a good outcome.

wking commented 8 years ago

On Wed, Apr 13, 2016 at 02:01:43PM -0700, Daniel Chen wrote:

git diagram I end up drawing by the time I'm done teaching (from a Harvard workshop):

I think there are a few issues here:

1. The objects Git is storing in the graph (trees, blobs, commits, and tags [1]).
2. References to nodes in that graph (branches and lightweight tags [2]).
3. The porcelain that you use to construct and manipulate the graph and references, including pushing objects around to other places (remotes, working directories, …).

That diagram is focusing on item 3, which is what you need to do anything with Git. But the original post's “simple and powerful underlying model” is items 1 and 2; Mark referred to item 3 as “an immense trashheap” [3].

I agree that a good grasp on items 1 and 2 is important, but hard to cover without sacrificing some porcelain time [4]. If you have a second projector, a low-stakes way to transmit items 1 and 2 might be running gitg (or similar) on the second projector [5,6,7]. That should automatically update as you adjust the graph and references via porcelain commands, which helps with commit, pull, push, etc. If you don't have a separate projector, running:

    $ git log --graph --oneline --decorate

is pretty close (and you can set up an alias for that [8,9]).

Keeping track of what goes on between the working directory and staging area in the run-up to a commit would use this view [10]. If you don't have a separate projector, running ‘git status’ and ‘git diff …’ frequently will help (and the existing lesson does this).

Having a graphic that covers the state relevant to your task and automatically updates as you step through the task is great. We've mostly avoided teaching graphical interfaces because the command line is more flexible, easier to find support for, and generally easier to install. But if the GUI is just for reading state (and not for making changes), I think it's fine for the instructor to use it as a teaching aide without having students install and learn the GUI. The command-line log and status have pretty much the same information, they just don't automatically update while running in a separate window.

wking commented 8 years ago

On Wed, Apr 13, 2016 at 03:45:22PM -0700, Paul Wilson wrote:

Perhaps there is merit in coming up with a very good canonical set of graphics that instructors can use to tie into their lessons, jumping between the graphics and the command-line interface.

This is pretty much what I think, except I'd use gitg or similar to display state on the fly instead of committing images to the repository. The existing images [1] are nice because they overlay the commands required to make a change. But when you're live-coding you have those commands in your terminal, so it's less of a problem if the GUI-generated view doesn't show them too.

There are also some very pretty graphics in the Git Book (e.g. [2]), which are available under the CC BY-NC-SA 3.0 Unported license [3].

matthew-brett commented 8 years ago

"Graph theory" is a distraction here, I don't think anyone thinks we should be teaching that.

However, the fact that git stores the history by pointing backwards from newer commits to their parents, is not very difficult to explain, and is about all you need to know about the "graph" aspect. But you do need to have the idea of commits as snapshots of the working tree, and the branch as a label that points to a commit.
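As a rough illustration of how small that idea is, here is a toy Python sketch. It is not how Git is implemented; the `Commit` class, the `history` helper, and the `mars.txt` file are all hypothetical names made up for this example. It only shows commits as snapshots that point back to a parent, and a branch as a label that points to one commit.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Commit:
    """A toy commit: a full snapshot of the files plus a pointer to its parent."""
    message: str
    snapshot: Dict[str, str]            # filename -> contents at this commit
    parent: Optional["Commit"] = None   # newer commits point back to older ones


def history(tip: Commit) -> List[str]:
    """Walk backwards from a commit to the root, collecting messages."""
    messages = []
    node: Optional[Commit] = tip
    while node is not None:
        messages.append(node.message)
        node = node.parent
    return messages


# Hypothetical example data, invented for this sketch.
first = Commit("start notes", {"mars.txt": "Cold and dry\n"})
second = Commit("add moons", {"mars.txt": "Cold and dry\nTwo moons\n"}, parent=first)

# A branch is just a label that points at one commit.
branches = {"master": second}

# Making a new commit on the branch only moves the label forward.
third = Commit(
    "add climate note",
    {"mars.txt": "Cold and dry\nTwo moons\nThin air\n"},
    parent=branches["master"],
)
branches["master"] = third

print(history(branches["master"]))
```

Walking from the branch label back through the parents is, in this simplified picture, all that a log of the history needs to do.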

It's true that this kind of theory would be a break from the current way the lesson is taught, but - this is my opinion - the students will be in a much better position to keep learning about git after the class.

Porcelain is terribly fragile, particularly git porcelain.

justbennet commented 8 years ago

Git is not file oriented, it's repository oriented. That, in itself, presents a significant cognitive challenge to someone who is new to git, possibly especially if they have some background with something like cvs or subversion that are file oriented.

If you look at http://swcarpentry.github.io/git-novice/01-basics.html what do we start with? Changes to a file. Are we setting ourselves up for cognitive dissonance if we start with file-level changes then try to make a transition to a repository oriented view?

Can anyone think of a good way to start on the right foot, with the good theory of what git really is, and get back to the point where people could competently use git commands to track a single file, create a branch, merge a branch, and push to github?

Would it be better to teach them clone before init? They get a copy of the repo, then we have them change a file or two, then we explain how we have changed a file, but that has also (and maybe more importantly) changed the repository. So, commit the change to the repo, push it back. Once we show people how to modify a repository, we can show them de novo creation.

That's entirely off-the-cuff and probably not the right track, but maybe there's something someone could use in it? Anyone tried a strategy that does not start with git init?

kevin-vilbig commented 8 years ago

Would it be worth discussing the mostly omnipresent UNIX botanical metaphors rather than trying to explain these things via the purely abstract models? Kernel, shell, root, branch, tree? In many ways, git is intended to be an addition to the GNU/Linux CLI IDE, or a drop-in replacement for other version control systems, which does use quite a few planty metaphors all the way down.

iglpdc commented 8 years ago

I think that part of the problem is forgetting about the conditions under which we are teaching. As I understand it, the lesson is not so much a "Git lesson" as a lesson on why and how a scientist can automate version control for their files.

The starting point of the lesson (the comic) is a truthful representation of our audience and the challenges they find. I don't really think that understanding the underlying model of Git is relevant at all for them, same as understanding hash tables is not relevant to learn how to use Python dictionaries or, for that matter, get people into using Python to automate their scientific analysis.

I think a common feeling from this thread is "I only understood Git once I understood the underlying model, even though I had been using it for some time". I think the goal of the lesson is to get people to start using it (or another version control system), realize that it is of great help, and maybe then go into the excellent references some mentioned to really understand how it works.

Despite all this, I agree that explaining some of the concepts that inform the underlying design is really helpful when teaching. The best example may be the idea that Git stores snapshots, not diffs or anything else. A common question after that is what happens with files that don't change often, so explaining that Git stores only one copy and introducing hashes as fingerprints for files could also be a useful analogy. Similarly, when teaching branching, references and parent commits would be helpful too.
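The "one copy, found by its fingerprint" idea can be sketched in a few lines of Python. This is a toy model under stated assumptions: the `objects` dictionary, the `store` and `snapshot` helpers, and the file names are hypothetical, and it hashes the raw bytes, whereas Git itself prepends a small header before hashing.

```python
import hashlib
from typing import Dict

objects: Dict[str, bytes] = {}  # fingerprint -> file contents, stored once


def store(contents: bytes) -> str:
    """Save contents under their SHA-1 fingerprint and return the fingerprint."""
    fingerprint = hashlib.sha1(contents).hexdigest()
    objects[fingerprint] = contents  # identical contents land on the same key
    return fingerprint


def snapshot(files: Dict[str, bytes]) -> Dict[str, str]:
    """Record a snapshot as a mapping of filename -> fingerprint."""
    return {name: store(data) for name, data in files.items()}


# Hypothetical files: only notes.txt changes between the two snapshots.
v1 = snapshot({"data.csv": b"1,2,3\n", "notes.txt": b"draft\n"})
v2 = snapshot({"data.csv": b"1,2,3\n", "notes.txt": b"draft, revised\n"})

print(v1["data.csv"] == v2["data.csv"])  # the unchanged file shares one fingerprint
print(len(objects))                      # distinct contents actually stored
```

Each snapshot lists every file, yet the unchanged file costs nothing extra because both snapshots refer to the same stored object.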

I think we should work on incorporating all these analogies when they are useful for our goal, but we should also take care not to turn the lesson into a version of the Git book or some other reference.

So my general feeling is that understanding the underlying model would be what defines a learner as intermediate, while just having these few ideas of how to use it but not very well grounded or not very well connected is what defines the learner as novice.

iglpdc commented 8 years ago

Would it be better to teach them clone before init? They get a copy of the repo, then we have them change a file or two, then we explain how we have changed a file, but that has also (and maybe more importantly) changed the repository. So, commit the change to the repo, push it back. Once we show people how to modify a repository, we can show them de novo creation.

I think this does not connect with the problems that our learners have. They already have some code or prose on their laptops which they should put under version control. When I teach the lesson I always create some files under planets and tell people to pretend that these represent one of their existing projects. Then I do git init to turn the existing project into a repo.

The problem with the clone-first approach is "what am I cloning and why" (novice learners don't typically work in some project using Git already). Also you will have to explain what remotes are, which people have problems understanding. It may be a better approach though in a group which for some reason has to collaborate on an on-going project (for example, if you are teaching Git to people wanting to join a software sprint).

ctb commented 8 years ago

+1 Ivan

jttkim commented 8 years ago

I agree with Ivan too. One thing that is neat about the init first approach is that this way, learners know where stuff comes from, as they create files, commits and finally conflicts all themselves. If there's an existing repo to clone and then work with, we'll have to explain the background scenario and that will always be subject to variation, depending how close that ends up being to the contexts and purposes that participants have.

matthew-brett commented 8 years ago

I think this brings us back to things that Greg has said many times in the past about the difficulty of teaching git compared to something like subversion.

If git were subversion, it would be perfectly sensible to proceed in a linear way through the commands and expect the students to get good use from the lesson.

But I doubt any of you would disagree with me that, compared to subversion, the student is far more likely to leave a simple practical lesson on git and soon hit a very confusing error. More controversially, I think the students are more likely to survive these errors if they understand how git works, to an extent teachable in a morning's lesson.

Assuming these statements are at least plausible, this leaves the following trade-off:

Teach git by command line recipe, so that the students can see what git can do, and with that motivation, expect them to push through the pain when they hit problems after the lesson;
Teach git by ideas and less command line recipe, convey less practical vision of the range of things git can do, but expect students to do better at working out errors.

We can speculate as to which is correct, but we've got a lot of combined experience here, teaching git, and the answer isn't clear-cut to us as a group.

So, can I suggest that we need some data on this?

On that line, I would be very happy to volunteer to draft a novice lesson with less recipe and more ideas, and video it, to at least show proof of concept. I'd love any help y'all have to offer. Then we can maybe think how to do some controlled trial of later retention.

twhitehead commented 8 years ago

Thanks everyone for the great discussion. Figured I should throw in my two cents as well since I started the conversation. The last time I taught git (the start of this month), I followed the section orders in the prepared script but deviated quite a bit in how I taught the material.

In particular, I made quite a bit of use of the blackboard as I've come to feel, while assisting with the git session, that the students really need some simple pictures to try and cement all the bits together. Otherwise it just feels like a huge pile of command names to remember.

I also never mentioned the word "DAG". Instead I

showed the PhD comic and laughed with them about having done that,
put a bunch of file names up on the board in the style of the PhD comic,
asked what we are trying to encode in these file names and waited a bit,
answered my question with the history: where each version came from,
drew arrows between the names indicating where they came from (this naturally introduced the "DAG" including branches and merges [thanks to the supervisor's comments]),
said people realized pretty quick computers should really keep track of this information for us,
introduced git as software that was created to help us do this.

My other tool was constructing a variant of this diagram

[Image: git2]

as I went through the session material. In particular, I started with a brief description of the three parts

the object store (where our history is stored),
the index/staging area (where we put together our "next version"), and
the directory/working tree (our folder of files)

and then filled in the links between them as I introduced the various commands going through the session material. I felt this really helped them see and focus on the big picture instead of struggling to remember the names of all the commands. I also slowly built up our "DAG" in the "Object Store" as we did the various commits.
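For readers who think better in code than in diagrams, the three parts and the links between them can be sketched as a toy Python class. This is only an illustration under stated assumptions: `ToyRepo`, its method names, and the file names are hypothetical, the "object store" is just a list, and `status` covers only a simplified slice of what `git status` reports.

```python
from typing import Dict, List, Optional


class ToyRepo:
    """Three areas: working tree, index (staging area), and object store."""

    def __init__(self) -> None:
        self.working_tree: Dict[str, str] = {}  # our folder of files
        self.index: Dict[str, str] = {}         # the "next version" being assembled
        self.object_store: List[dict] = []      # the recorded history

    def add(self, name: str) -> None:
        """Like `git add`: copy one file from the working tree into the index."""
        self.index[name] = self.working_tree[name]

    def commit(self, message: str) -> int:
        """Like `git commit`: freeze the index as a new entry in the history."""
        parent: Optional[int] = (
            len(self.object_store) - 1 if self.object_store else None
        )
        self.object_store.append(
            {"message": message, "snapshot": dict(self.index), "parent": parent}
        )
        return len(self.object_store) - 1

    def status(self) -> List[str]:
        """Simplified `git status`: files whose working copy differs from the index."""
        return sorted(
            name
            for name, text in self.working_tree.items()
            if self.index.get(name) != text
        )


# Hypothetical session, invented for this sketch.
repo = ToyRepo()
repo.working_tree["thesis.txt"] = "chapter 1\n"
repo.working_tree["scratch.txt"] = "ignore me\n"
repo.add("thesis.txt")
first = repo.commit("first draft")
repo.working_tree["thesis.txt"] = "chapter 1\nchapter 2\n"

print(repo.status())
```

The commit records only what was staged, so `scratch.txt` never enters the history, and the later edit to `thesis.txt` shows up as a difference until it is added again.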

I got quite a bit of positive feedback at the end about it, especially from students who had had to use git/github before (e.g., for class projects), but never had it explained beyond "just do A B and then C and don't think about it". Students also came up afterwards and took a picture of the board with their camera for future reference, which I felt confirmed its usefulness.

I feel that git differs from the other lessons in that you really need that blackboard time too. In my case the screen entirely obscured that board, but, on the suggestion of the host, I rolled it up 1/2 to 2/3 of the way, put my desktop to black and my terminal at the top. Then I seamlessly switched between the live coding and the blackboard. It was a great idea that really worked well.

Cheers! -Tyson

gvwilson commented 8 years ago

Pull requests with diagrams would be greatly appreciated - our current lessons don't have nearly enough pictures.

twhitehead commented 8 years ago

I'm part way through trying to do the above up in inkscape from my last sessions. Turns out I suck at vector graphics though and had to abandon it after it ate most of my day in favour of reviewing the material before teaching it. Good chance I'll manage to finish it when I have to teach it next at the end of May. I'll make it available if that occurs (still not near as nice as those who have developed the skill).

matthew-brett commented 8 years ago

@twhitehead - that sounds like a very nice plan. It doesn't surprise me that thinking about it this way would be useful and interesting to the students, and it's a great idea to put up the structure and keep the students thinking about it as they work.

I think the key idea is to get the students to a place where they are able to reason about what git is doing, so they can keep learning after the class.

kevin-vilbig commented 8 years ago

You have access to all of the shapes that you need in a slidemaking program without having to muck about with a full on vector image program like Inkscape. I knocked together a rough Google Slides slide as an example. Feel free to use it if you think it will work. I'm teaching Git in a couple weeks, so if you don't, I will.

https://docs.google.com/presentation/d/1lPT2SJodPekhd03-eSeas--9OnaWjr4SV9h7DqOdm2Q/edit?usp=sharing

daisieh commented 8 years ago

I think I was trying to get around some of the confusion of using a graph model when I drew the repos as stacks of commits, like in https://github.com/swcarpentry/git-novice/blob/gh-pages/fig/git-checkout.svg.

My learners have always found my metaphor of the stacks, with the staging area as one bin, to be helpful as well: https://github.com/swcarpentry/git-novice/blob/gh-pages/fig/git-staging-area.svg

and I think people like the cartoon I drew of the whole thing as we teach it: https://github.com/swcarpentry/git-novice/blob/gh-pages/fig/git_staging.svg

matthew-brett commented 8 years ago

I spent some time writing up the way I teach git's model here: http://matthew-brett.github.io/pydagogue/curious_git.html

twhitehead commented 8 years ago

I did a one-hour (broadcast-only) general-interest webinar on git for our users the other week.

https://www.youtube.com/watch?v=meFv-GDTkjE

Structured it around building up a version of the above diagram while doing a live example. Also decided to just do branches right up front, as if they weren't any big deal, since the internet consensus seems to be that branches are the real git killer feature.

Comments? In a couple of months I will likely be doing the second part for another one-hour general-interest seminar for our users. This will get more into collaborating through the gitlab page. I'm also teaching a regular SC git session Tuesday, so I'll have a chance to try some new stuff there.

@daisieh you've made some really great graphics -- I envy your ability :smiley:

matthew-brett commented 8 years ago

https://us.pycon.org/2016/schedule/presentation/1699/

guyer commented 8 years ago

Whenever I teach git, I open with http://xkcd.com/1597/

I don't think graph theory would be at all well received by most of the people I've taught (not to mention that I'd be incompetent to teach it).

wking commented 8 years ago

On Mon, Jun 13, 2016 at 01:58:00PM -0700, Jonathan Guyer wrote:

I don't think graph theory would be at all well received by most of the people I've taught (not to mention that I'd be incompetent to teach it).

See @matthew-brett scoping “graph theory” for the Git-lesson context [1].

gvwilson commented 8 years ago

I'm :-1: on trying to teach the graph theory.

matthew-brett commented 8 years ago

Guys - everyone is -1 on teaching graph theory - that's not what anyone is suggesting.

matthew-brett commented 8 years ago

By the way, I take http://xkcd.com/1597/ to be a satire on the standard git teaching method, summarized as "Just memorize these shell commands and type them to sync up".

atz commented 7 years ago

We can teach implications of the overall graph model effectively with a few salient small-scale points:

Each commit (hash) also represents an entire repo state. (Show git checkout <HASH>) How?
Branches are composed of commits in order (w/ corresponding repo states).
The parent of a commit is part of the identity of that commit. We can show that rebase and even cherry-pick changes the commit hash, even though everything else is the same.

That is, effectively, the entire Merkle DAG applicability to git at this level.
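The third point can be demonstrated without Git at all. The following is a toy Python sketch, not Git's real commit format: the `commit_id` function and the placeholder tree names are hypothetical, and real commits also hash author, committer, and timestamps. It only shows that once the parent id is part of what gets hashed, the same change recorded on a different parent gets a different id.

```python
import hashlib
from typing import Optional


def commit_id(tree: str, message: str, parent: Optional[str]) -> str:
    """Toy commit hash computed over the snapshot, the message, and the parent id."""
    payload = f"tree {tree}\nparent {parent or ''}\n\n{message}\n"
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()


# Two unrelated starting points (placeholder tree names, not real Git trees).
base_a = commit_id("tree-A", "base A", None)
base_b = commit_id("tree-B", "base B", None)

# The same snapshot and message recorded on two different parents,
# as a simplified stand-in for what rebase or cherry-pick does.
original = commit_id("tree-X", "fix typo", base_a)
replayed = commit_id("tree-X", "fix typo", base_b)

print(original == replayed)  # False: changing the parent changes the identity
```

Because every id depends on its parent's id, a commit's hash effectively vouches for the whole chain of history behind it.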

At the broader scale we can outline that git tracking separates 3 logical parts of a repo:

filesystem contents (working directory)
commit content (diffs + metadata)
branch history (commit chains)

Then highlight how the git commands we teach affect those parts together.

Right now, we are a bit hamstrung by the (presumed intentional) de-emphasis on branching, such that any graphiness throughout the lesson looks more linear.

kekoziar commented 3 years ago

Closing old, inactive discussion.

matthew-brett commented 3 years ago

Ah yes - what a shame there was no more discussion! As time goes on it seems ever more clear to me that we must in fact talk about the commit graph in teaching Git.

swcarpentry / git-novice

Should we teach the underlying graph model explicitly? #263