swcarpentry / git-novice

Version Control with Git
http://swcarpentry.github.io/git-novice/

Providing a definition for distributed (vs centralised) version control #757

Closed KateCourt closed 1 year ago

KateCourt commented 4 years ago

Some brief history of version control is provided to introduce students to contemporary version control. This includes a reference to distributed vs centralised systems. This could be expanded with a definition that does not rely on the student understanding the phrase 'meaning that they do not need a centralised server to host the repository' and instead explains what distributed means. Something along the lines of 'meaning that users each hold a copy of the code base on their own machines rather than code being held on a centralised server. This means code can be worked on simultaneously rather than only one person being able to work on a section of code at a time.'

(part of checkout process)

markmatney commented 4 years ago

I agree that this part of the lesson should be developed. As-is, I don't think it's very convincing for learners. Currently, it seems to depend on learners accepting that Git opens up more possibilities for collaboration:

These modern systems also include powerful merging tools that make it possible for multiple authors to work on the same files concurrently.

I don't dispute that statement, but I don't think it's the best way to motivate distributed vs. centralized. Since learners may have had success using a centralized VCS (Google Docs, Box.com, Wikipedia, WordPress, etc.) in the past, and since remotes and collaboration aren't covered until much later, this feels like "trust me, it's better, I'll explain why later" and may not go down easily.

I think the motivation would be stronger if there was an explanation of the "single point of failure" problem of centralized VCSs, and how distributed systems address this by making many copies of the revision history. An instructor might give learners a tour of some centralized VCSs and then pose some questions:

"What happens if there's a network outage?" "Or if the disk on the server gets corrupted?" (hardware issues happen, even in the "cloud") "And you need to use an earlier revision of your work before the submission deadline this evening?" :scream:

That distributed VCSs (1) lower the risk of data loss and (2) protect against network outages (since each copy includes a full backup of the entire revision history, with no copy being more "authoritative" than another) is a stronger selling point IMO.

All that is to say, more than just lacking a complete definition of "distributed", I feel the lesson lacks a strong motivation for distributed as an alternative to centralized.

kekoziar commented 3 years ago

I think there are two challenges presented here.

The first challenge is - as with all Carpentries lessons - to not add extra content or time to the lesson. The issue suggested replacing the phrase

meaning that they do not need a centralised server to host the repository

with the expanded

meaning that users each hold a copy of the code base on their own machines rather than code being held on a centralised server. This means code can be worked on simultaneously rather than only one person being able to work on a section of code at a time.

In context, that would result in

More modern systems, such as Git and Mercurial, are distributed, meaning that users each hold a copy of the code base on their own machines rather than code being held on a centralised server. This means code can be worked on simultaneously rather than only one person being able to work on a section of code at a time. These modern systems also include powerful merging tools that make it possible for multiple authors to work on the same files concurrently.

which repeats itself and content already covered above.

@KateCourt What do you think about only revising the phrase?

More modern systems, such as Git and Mercurial, are distributed, meaning that a full repository can be copied to local computers, instead of only existing on a centralized server. (emphasis mine) These modern systems also include powerful merging tools that make it possible for multiple authors to work on the same files concurrently.

kekoziar commented 3 years ago

The second challenge is to not overburden learners with advanced knowledge that doesn't meet the objectives of the lesson. The lesson isn't motivating learners to choose between a centralized and distributed VC workflow model; it's motivating them to put their work under some type of version control. Most learners will not have been introduced to the topic of different models of VCS, so an expanded section on the differences between CVCS and DVCS may be interesting to advanced users and computer scientists - I will admit, I fell into a nice rabbit-hole on the history of Git, which happens to coincide with the theory and application of centralized vs distributed systems and workflows - but it's tangential to the lesson and the objectives of the episode.

@markmatney Do you think some of your suggestions might fit into the current exercise, or a new exercise that fits within the episode's existing learning objectives?

Although, since Google Docs, Box, and Dropbox all allow offline work, I'm not sure if "What happens if there's a network outage?" is a good motivator.

kekoziar commented 3 years ago

As an aside: I really wish we could integrate the phrase "distributed merging," because I think that's really the essence of DVCS. It's not that all repositories are equal; they can't be equal while maintaining a functional development environment. The distributed workflow section of the Pro Git book includes "blessed repository" as the "canonical official" repository. Distributed merging simply allows merging from local source repos. More interestingly, this distributed merging allows workflows to be adapted to the project/company, while the centralized model only allows one gatekeeper workflow. But, this truly is an advanced topic (the earlier referenced section is in chapter 5 of the Pro Git book; our lesson only covers content in chapters 1-2 of Pro Git) and IMO a poor introduction for novice learners to Git.

markmatney commented 3 years ago

@kekoziar after re-reading my earlier comment, for some reason I was conflating "distributed" with some qualities that aren't unique to DVCSs (file type agnostic, and enabling offline work). Thanks for pointing that out.

After thinking about this more, I would actually advocate for removing any more than a passing mention of distributed vs. centralized from the lesson; I think even the call-out box in question has too much info.

kekoziar commented 1 year ago

Thanks for the update.

I think this particular suggestion is stale.