obdurodon / dh_course

Digital Humanities course site
GNU General Public License v3.0

Why use GitHub? #137

Closed zme1 closed 5 years ago

zme1 commented 5 years ago

As a sidenote, I'm a little confused about why we're using GitHub to manage our projects, since it's a high-end piece of version control software that's way overtooled for what it seems like our purpose is. So if someone can explain why we're using it and not something else, I'd really appreciate that.

zme1 commented 5 years ago

I had my doubts about GitHub until my own student told me that if we didn't teach it, none of our students would ever learn to use it. And I think she's right. My student (Rebecca Parker, here at Greensburg campus) is working on updating tutorials on our use of GitHub and also thinks we'd be best off using the command line version instead of the Graphical User Interface (GUI) client. That said, there is nothing "high-end" about GitHub since it is free to use, and it is a version control system worth learning in a context like a digital humanities course because it is how coding in teams works in the professional world, and it's even used for collaborative writing (just documents) because it is absolutely superior to Dropbox or Box for version control. Yes it has a steep learning curve, but after a semester of using GitHub with our DH course here (and we implement it every day for course operations to post and share and comment on code, as well as for project management), I am not going back. I'll post more on this later, and I hope Rebecca will, too.

zme1 commented 5 years ago

The most important thing about GitHub is that it provides version control, which means that it lets you recover from errors. A lot of editing environments support some sort of "undo" operation, but the one built into GitHub is more sophisticated than many of the alternatives. For example, if you make a mistake, do ten more things, and then want to undo the mistake without losing your ten other things, GitHub does a better job of supporting that type of surgical revision than plain ol' undo operations, which typically require you to wind back step by step. GitHub also supports unlimited undo from day #1, while some other systems support undo only until you save the document, or only for 30 days, etc. You will make mistakes that you'll want to undo, and even the simplest projects should provide some sort of version control. GitHub does that better than the others with which I'm acquainted.
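To make the "surgical" undo concrete, here is a minimal Python sketch. It is a toy model and not how Git is implemented: the `Change` class, the `apply` and `revert` helpers, and the sample XML are all hypothetical names invented for illustration. In Git itself the corresponding command is `git revert <commit>`, which records a new commit that reverses an earlier one while leaving later commits in place.

```python
# Toy model of selective undo; an illustration only, NOT Git's internals.
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    """One recorded edit: replace the text `old` with the text `new`."""
    old: str
    new: str


def apply(text: str, change: Change) -> str:
    """Apply a change by replacing the first occurrence of `old`."""
    if change.old not in text:
        raise ValueError(f"cannot apply: {change.old!r} not found")
    return text.replace(change.old, change.new, 1)


def revert(text: str, change: Change) -> str:
    """Undo one change by applying its inverse; other edits are untouched."""
    return apply(text, Change(old=change.new, new=change.old))


history = [
    Change("<title>Draft</title>", "<title>Darft</title>"),  # the mistake
    Change("<p>one</p>", "<p>one</p><p>two</p>"),            # later work
    Change("<p>two</p>", "<p>two</p><p>three</p>"),          # later work
]

document = "<doc><title>Draft</title><p>one</p></doc>"
for change in history:
    document = apply(document, change)

# Undo only the first change; the two later edits survive.
fixed = revert(document, history[0])
```

A step-by-step undo would have to discard the two later edits to reach the mistake; reverting a single recorded change does not.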

The next important thing about GitHub is that it supports collaborative development. In a one-person project that isn't much of an advantage because you're collaborating only with yourself - or so we think at first. Yet I often develop my one-person projects in GitHub (you can see my activity at https://github.com/djbpitt/). The "so we think at first" is because ...

GitHub supports branching and forking, which are two ways of allowing different versions of the same project to proceed along their own paths, and the branches and forks can be merged when the developer is satisfied with the state of the development. For example, if your XML markup is valid and you change your schema and need to update the markup to bring it into sync with the new schema, you can create a "development" branch in your repo, do the work there, and then merge the results into the master branch when you're done. The reason to do this is that the master branch will always be valid, so your project will always be in a stable state even while you're enhancing it. In a team project it's even more important, because the members can work on different aspects of the project in their own branches or forks without destabilizing one another.
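The schema-change workflow described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Git's actual data structures: the `branches` dictionary, the `is_valid` check, and the `<verse>`/`<line>` element names are hypothetical stand-ins for a real repository and a real schema validation.

```python
# Toy model of branching; an illustration only, not Git's internals.
from typing import Dict

Files = Dict[str, str]


def is_valid(files: Files, element: str) -> bool:
    """Hypothetical stand-in for schema validation: every file must be
    wrapped in the element that the schema currently expects."""
    return all(
        text.startswith(f"<{element}>") and text.endswith(f"</{element}>")
        for text in files.values()
    )


# Each branch name points at its own snapshot of the project files.
branches: Dict[str, Files] = {
    "master": {"a.xml": "<verse>one</verse>", "b.xml": "<verse>two</verse>"},
}

# Create a "development" branch as an independent copy of master.
branches["development"] = dict(branches["master"])

# The schema changes: <verse> is renamed to <line>. Update one file first.
dev = branches["development"]
dev["a.xml"] = dev["a.xml"].replace("verse", "line")

# Midway through, development is inconsistent, but master is untouched.
master_ok_midway = is_valid(branches["master"], "verse")
dev_ok_midway = is_valid(dev, "line")

# Finish the work on development, then merge only once it validates.
dev["b.xml"] = dev["b.xml"].replace("verse", "line")
if is_valid(dev, "line"):
    branches["master"] = dict(dev)  # a simple fast-forward style merge

master_ok_after = is_valid(branches["master"], "line")
```

At no point does `master` hold a half-converted mix of old and new markup, which is the stability the paragraph above describes.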

We've used Dropbox and Google Drive and Mercurial for version control in our projects and in the course and all of those have proven problematic in their own ways. I'm not really persuaded by the "overtooled" argument, either, not because it isn't accurate, but because it isn't an argument people make as easily about Excel or Word or Photoshop, any of which have lots of advanced features that are of use to experts and either confusing or invisible to novices. GitHub isn't as easy to use as Dropbox for synchronization, but Dropbox has much less functionality, and I think that, on balance, GitHub can provide good value to novices who use only a portion of its capabilities. It isn't the only sensible choice, and it has a downside, so I'd be eager to hear about alternatives. Has anyone done a head-to-head comparison with the competition on which they can report?

zme1 commented 5 years ago

Well, there is not much more that I can add here that hasn't already been brought up, but I would like to add a few points from my own experience with GitHub, as a student, and explain why I advocated for it so much for the DH course in Greensburg. The first thing I want to emphasize is GitHub's free-ness (okay, I know it isn't a word). I like that GitHub is free for me and anyone I want to invite to work on my project. In thinking about the longevity of a project, you want to use a version control system that organizes your project so that it can be shared and developed in the future by people who may fall outside of the institution you started the project in. Since GitHub is free to all, there are no concerns if I want to invite someone to look behind the scenes at my project or get involved now or in the future. Also, since GitHub is free and relatively popular in the world of coders, there is a lot of documentation explaining features and tools that might seem initially confusing or difficult. When I first started using Git and GitHub for the Greensburg DH class I too was mystified by its complexity, but once I started exploring other people's repos I started to see the benefits. The issues board and the wikis, along with the commit messages, add dialogue and little breadcrumbs that help to explain the history of a project's development. My Nelson project has been ongoing now for two years. It is nice that new editors can go to the GitHub repo and explore how the project has developed and where the project is going through the dialogue I am able to provide alongside my code. If you look at the about page on my site (nelson.newtfire.org), I use my GitHub to reference past moments of code to explain parts of the project that have changed. I guess my main point is that Git and GitHub help to preserve the history of the project and ensure a sense of longevity. 
My point to Elisa when asking her to consider keeping Git in the class (this past fall) came from an experience I had over the summer. I was trying to learn some Java with my boyfriend, and since I was aware of Git and GitHub I went and searched for a repo whose issues board explained a solution to a problem we were facing. I was then able to grab the code right from their repo and use it to fix what we were having an issue with. This made me realize that without being introduced to Git/GitHub in the DH course I wouldn't have known to turn to it as a resource. As for using "something else," I use Dropbox and Pitt's Box version control and have found that neither allows me and another person to work on the same file or group of files without the last person to save overwriting everyone else's changes, or without having to make multiple copies of files that are in development. Git takes care of merging files that multiple people are working on at once, and does so with an elegance that marks who is making what changes and gives the ability to revert easily if changes are undesired. Git seems to be just as difficult or easy (however you want to see it) to learn as other VCSs. I'm speaking here not from my own personal experience, but from that of my boyfriend (a computer programmer) and my brother (Point Park's Technical Support Specialist), who have used other VCSs such as Mercurial, Sandbox, Eclipse, and Microsoft's Team Foundation Version Control, among others. Lately they have both been moving more towards Git (maybe in part due to my advocacy for it). I hope that with the tutorials I have been developing we can begin to get students to see that Git isn't as difficult as it initially may seem and that, with a little reading and hunting around, the many tools and features it has to offer are great for projects no matter the size. 
Git is my go-to organizer, and I like to think it picks up after me and my sometimes messy editors by allowing me to revert if need be and to view the full history of my project. I wanted to add a few links to some YouTube videos that help to explain why so many others agree that Git is a great go-to version control system, just in case you don't want to believe me and are more of a visual learner. https://www.youtube.com/watch?v=OqmSzXDrJBk https://www.youtube.com/watch?v=Di_-HAC6ms8 https://www.youtube.com/watch?v=VUaBfYCmJls

Of course I recommend our tutorials as well :)

Hope this helps answer your concerns Ryan.

zme1 commented 5 years ago

Elisa recently made another point, in an email chain that the instruction team was bouncing around, that is relevant to yours about GitHub's prevalence, and I wanted to toss that in here as well. Since GitHub is so popular in the world of coding / software development, having a GitHub presence that shows off your knowledge via past coding projects is very, very beneficial for those looking to go into any type of development work in the future, as a decent number of past students of the course have. While that doesn't apply to everyone, from a practical standpoint having an extra resume booster can never hurt!

zme1 commented 5 years ago

As someone who has used Eclipse for version control, I find GitHub so easy to navigate and use. As previously mentioned, there is a steep learning curve, but most other software options have similar curves, and at least this one is free.

Trying to collaborate on code is difficult enough that the additional struggles of the software just make group efforts stressful. I don't know how much collaborative coding you've tried to do, but it is a lot like trying to write an essay with someone; everyone has their own style. All of my professors this semester are recommending GitHub over any other VCS for group projects. It makes it incredibly easy to track who did what, and to bring separate revisions together into one cohesive project.

It is the way the majority of software developers are going, so it will be a functioning part of my resume after this semester. Software development companies are looking more for people who can work together on projects instead of people who just sit in cubicles and pound out the entire thing on their own, and this helps them see that. If I do every aspect of my group's project, it will be easy to see and weed me out because of my inability to let others do part of the project.

zme1 commented 5 years ago

This has been really helpful, although not necessarily convincing to me (which isn't relevant for the coursework, but is for potentially offering alternatives to students in the future). My concern was not GitHub's accessibility (in this it is considerably more effective than many other forms of version control), nor indeed the fact that it is version control. Rather, my concern was why GitHub in particular for this course in particular. While, of course, the software is useful for large-scale team projects, and especially for coding collaboration, GitHub was expressly designed for large-scale corporations, not teams of two or three (or in my case, one). Of course, it can scale, as David pointed out, but just because it CAN scale to our level doesn't mean it's optimal for our level. Admittedly, I'm less well-versed in potential alternatives than many other people I know (many of whom I have discussed the program with, and who have found it confusing of their own accord), so I can't exactly offer an alternative program that I think would fit the mold of this class any better than GitHub. What I can say is that this course already features a whirlwind introduction to numerous technologies, and the rapid advancement of free, cloud-based sharing tools that are far more intuitive to grasp (if currently a little less complex and arguably [keyword] undertooled for our purposes) may well mean that students coming into this course have experience with other forms of software (or, in the case of a few friends of mine, something against GitHub) that may allow them to control their project much more fluidly than if forced to use GitHub. While GitHub may be useful, it adds a considerable amount of additional mental load to a course which is already very heavy on that front, and if a student were to come in with prior knowledge of a different tool which can be used for the same purpose, what would the harm be in allowing them to use that instead? 
You're all arguing from a pragmatic perspective, so let's take that head-on. Let's say a student who's really, really familiar with Dropbox comes in, and they can fly through Dropbox like nobody's business, but they just don't seem to be getting GitHub. Dropbox can do most of the things this course REQUIRES from GitHub, since many of the features described above aren't strictly speaking necessary, but rather likely to be helpful to someone who's already got the gist of GitHub. This hypothetical student would likely perform considerably worse if forced to use GitHub over the software that they're familiar with, simply due to what's ultimately an arbitrary distinction between what they are allowed to use for their own work and what they aren't. This argument only becomes stronger when considering the potential for a student who might be versed in an alternative form of version-control software, such as Eclipse (mentioned above). This isn't to say that this course shouldn't teach GitHub (although, frankly, I'm of the opinion that if it's as essential as some of you seem to be suggesting, it should be taught in a course where it can receive more attention), simply that it's potentially beneficial to allow the use of alternatives if someone comes in with the knowledge necessary to use them for our purposes. Thanks for all of your responses; they have really been helpful.

zme1 commented 5 years ago

That GitHub offers advantages over alternatives doesn't mean that the alternatives don't offer their own compensatory advantages. Since we want everyone in the course to be able to access everyone else's materials, the overhead of having to learn and use multiple systems grows quickly. We did try using different systems with different projects in the past, and I concluded that it was easiest for the course community (even if not necessarily the first choice of every member of the course individually) to use a common system. And having explored some of the alternatives, I selected GitHub as a suitable common system for that purpose. Someone else in my position might well have made a different set of decisions without necessarily being wrong.

zme1 commented 5 years ago

As a student of the course from ages past, when GitHub wasn't a requirement, I can tell you that my project team had to redo a lot of work in the beginning of our markup because Dropbox does not handle multiple editors to a single file at all well. We ended up working out a system for claiming chapters of work at a time to avoid overwriting each other's edits, but it was several hours of work before we got it settled, and even then it was sometimes a hassle to get edits into the main file without them being overwritten. Dropbox power users can get a lot done (and I would argue that all/most of my project team was a power user of Dropbox if nothing else), but they can't get around the fact that Dropbox just can't handle multiple people editing a single file. There are certainly workarounds (as my team discovered), but they are just that: workarounds.
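The overwrite problem described above, and the kind of merge that Git performs instead, can be illustrated with a minimal Python sketch. It is a deliberate simplification: real Git merges compare regions of lines rather than whole chapters, and the function names `last_save_wins` and `three_way_merge`, along with the sample data, are hypothetical.

```python
# Toy comparison of "last save wins" with a three-way merge.
# A file is modelled as a dict of chapter name -> text; real Git works on
# regions of lines, so this is a simplification, not Git's algorithm.
from typing import Dict

Files = Dict[str, str]


def last_save_wins(first_saved: Files, last_saved: Files) -> Files:
    """Sync-folder behaviour as described above: the later save simply
    replaces the earlier one, so `first_saved` is ignored entirely."""
    return dict(last_saved)


def three_way_merge(base: Files, ours: Files, theirs: Files) -> Files:
    """Combine two sets of edits by comparing each with the common base."""
    merged: Files = {}
    for key in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(key), ours.get(key), theirs.get(key)
        if o == t:
            value = o      # both sides agree
        elif o == b:
            value = t      # only their side changed this chapter
        elif t == b:
            value = o      # only our side changed this chapter
        else:
            raise ValueError(f"conflict in {key!r}: both editors changed it")
        if value is not None:  # None means the chapter was deleted
            merged[key] = value
    return merged


base = {"ch1": "old one", "ch2": "old two"}
alice = {"ch1": "new one", "ch2": "old two"}  # Alice edits chapter 1
bob = {"ch1": "old one", "ch2": "new two"}    # Bob edits chapter 2

overwritten = last_save_wins(alice, bob)      # Alice's edit is lost
merged = three_way_merge(base, alice, bob)    # both edits are kept
```

When both editors change the same chapter in different ways, the sketch raises an error instead of silently choosing a winner, which roughly corresponds to Git reporting a merge conflict for a person to resolve.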

What's more, GitHub is a good choice for version control in this course because it is the most widely used in the Real World. My team at my Real World job uses it, as do most other developers. Although it's true that other version control software exists, these other systems are not as widely used or as accessible as GitHub, in my opinion. GitHub is useful not just for its version control, but also for its open source-ness. Other developers can easily access and expand on code in GitHub, which is a major advantage that I think other version control software lacks.

A large portion of this course is growing accustomed to and taking advantage of the community-based side of coding. Coding doesn't happen in a vacuum. Often, it takes googling and asking other developers how they might approach a problem to find the solution. Learning how to find guidance and when to ask for it is a vital skill that is useful not just in this course, but in Real Life and, arguably, the rest of your academic career.

It's true that this course already asks a great deal of its students, but part of the point of an honors course is to get students out of their comfort zones (all of them) and to push them to their greatest potential. It's certainly a lot to ask, but the success of previous students and the continued growth of the course suggest that it isn't too much.

zme1 commented 5 years ago

I'll admit, Dropbox was a poor choice of example. It's not version control software, just cloud storage. More importantly, I think that readers may be misunderstanding my argument (and it is an argument). I'm not suggesting this course stop using or teaching (which it isn't, by the way, but I'll get to that later) GitHub. That, frankly, wouldn't be a pragmatic suggestion. I'm suggesting that requiring its use is both potentially detrimental to a project's success and obstructive to getting work done. And this has yet to receive any response whatsoever, except perhaps for a paragraph from David in his most recent response. To address David's remarks specifically, there's a difference between teaching (which, again, you're not exactly teaching us GitHub) multiple systems and (and this is my stance, for the record) allowing students to use alternative methods if they know how and get explicit permission (this comes with the caveat that they must be INFORMED of this allowance from the start). The difference is exactly in that overhead. Option B comes with exactly 0 additional overhead and multiplies the course's accessibility and versatility exponentially, while Option A does fall into the overhead trap. The difference comes at the policy level. Option A implies that everyone needs to learn every system, while Option B implies that someone could possibly already know a separate system and want to use that. Janis is right, GitHub is widely used software in the so-called "real world" (as if there are parts of life that aren't real), and its popularity makes it useful. I'm not arguing that. I'm arguing that it's absurd to restrict people to its use, particularly and especially if they have a good reason not to want to use it, such as knowledge of separate version control software or, as could potentially be (and, in my case, is) the case, a solo project where there is simply no need for multiple editors to work on a file at a given time. 
In that situation, GitHub becomes an additional (and unnecessary) layer of complexity that may (and, in my case, did) take several hours to figure out, and even then process most of the material in the repo as a workaround. Coding doesn't happen in a vacuum; I'm not disputing that in the slightest. I am disputing the necessity of GitHub in the atmosphere of coding. It's not the Oxygen (no pun intended) or Nitrogen in the formula; it's just the trace amounts of water vapor that are there as a byproduct of the water cycle. Yeah, it's there. And yeah, sure, it's useful. But it's not what makes air breathable, and by the same token GitHub doesn't make coding doable, or even frankly make it that accessible. It's just one of several tools at the disposal of a coder (and to be even more brutally honest, we're not even really coding; that implies a degree of programmatic design that markup languages and transformation languages simply aren't capable of), and as you say, we need to be able to use all of the tools at our disposal. And restricting students to the use of GitHub arbitrarily reduces the number of tools at their disposal. If someone is willing to learn or comes in with knowledge of something else which could serve as an adequate replacement for GitHub, or if they do not need the functionality that GitHub provides, then what purpose does FORCING (which is what is done when something is made a requirement; a student is forced to work within that restriction or fail the course) that person to use GitHub serve? Schadenfreude? This is a question I have asked repeatedly, and no one is engaging with it. It's not even that I'm not satisfied with the answer; I'm straight-up not being answered. Everyone answering on this thread is doing a reasonable job of arguing that GitHub is useful. THAT IS NOT WHAT I AM ASKING. Of course GitHub is useful; coders don't tend to create useless things. 
What I'm asking is why, when the option presents itself, should alternatives not be allowed? These are two separate and distinct questions, and I'm not entirely sure what's so difficult about treating them that way. My narrative hasn't changed. I'm not throwing new questions into the mix here. I'm asking the same one for the third time because no one has actually answered me. GitHub is useful; I get it. But other things are useful, too, and in different and potentially course-relevant circumstances. Convince me not that GitHub is useful, but that it's the only tool we should be allowed to use, because that's what I'm arguing against. And in reference to this course being an honors class, that's not really a point in GitHub's favor. GitHub isn't even TAUGHT in this course. It's not. We've spent precisely 0 minutes in class going over how to use or operate GitHub, meaning that students are expected and required to learn it on their own time. There's an unfinished tutorial that's not even up on the page, but we're just EXPECTED to know how it works and to use it accordingly, or to work with our project mentors on figuring it out (which, just this week, resulted in a needless extension of the length of my meeting and an overall reduction in its productivity, not to mention my needing to hold a second meeting over the weekend). It's not actually course material. Nowhere on the syllabus, at any point during the semester, are we required to read about GitHub, nor is it ever apparently worth going over in class. We've been given a drill and been told to use it without so much as an instruction manual. Is it really a surprise when some students come to you wondering what drill bit to use, or not knowing how to turn it on, or show up with drill-inflicted wounds? If it is, it shouldn't be. There's no reason to expect them to know how to use the drill. That's not pushing us to reach our potential; that's pushing us to the edge of a rooftop. 
If I sound frustrated and angry, it's because I am tired of being condescended to, and I'm tired of asking questions and not receiving straightforward, direct answers, which suggests to me that no one reading what I'm writing considers me to be worth the effort to even read thoroughly. And that completely defeats the purpose of a discussion board.

zme1 commented 5 years ago

Ryan, GitHub can be frustrating at first, but we're confident that you will quickly master it. You're right that you could complete a project like yours without GitHub; it may even be easier without GitHub. However, the primary purpose of this course isn't to have students complete DH research projects, but to teach them a number of development technologies that are important in the digital humanities. As you agreed, GitHub is very important in many forms of development, including DH research in general, so we include GitHub in this course. Students learn best by applying concepts, so we ask students to use GitHub.

All of the technologies in this course follow this formula: we teach them because they are important in DH research in general, and we ask students to apply and engage with them because that is the best way to learn. This is independent of the technology's utility in a student's particular project, and the student projects are subordinate to the primary goal of teaching these important development technologies. (Consider this: It's not that you have to use GitHub because of your project, but that you're doing your project because you have to use GitHub, among other technologies.) Another example is JavaScript. Most projects don't end up using JS extensively, and they don't need to, but we ask every student to learn a little JS anyway because it's an important part of the digital humanist's toolkit.