saltstack / salt

Software to automate the management and configuration of any infrastructure or application at scale. Get access to the Salt software package repository here:
https://repo.saltproject.io/
Apache License 2.0

Salt Truck Factor #26213

Closed: gavelino closed this issue 9 years ago

gavelino commented 9 years ago

As part of my PhD research on code authorship, we calculated the Truck Factor (TF) of some popular GitHub repositories.

As you probably know, the Truck (or Bus) Factor designates the minimal number of developers who would have to be hit by a truck (or quit) before a project is incapacitated. In our work, we consider a system to be in trouble if more than 50% of its files become orphaned (i.e., left without a main author).
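The definition above suggests a greedy procedure: repeatedly remove the developer who is main author of the most files until more than half of the files are orphaned. The sketch below assumes the file-to-main-author mapping has already been computed (the preprint describes how its authors derive it); the function name and toy data are hypothetical, and this is not the authors' implementation.

```python
from collections import Counter


def truck_factor(file_authors: dict[str, set[str]], threshold: float = 0.5) -> int:
    """Greedy Truck Factor estimate (illustrative sketch only).

    file_authors maps each file path to the set of its main authors,
    assumed to be precomputed by some authorship heuristic. Returns how
    many top authors must be removed before more than `threshold` of the
    files are orphaned (left with no remaining main author).
    """
    remaining = {path: set(authors) for path, authors in file_authors.items()}
    total = len(remaining)
    removed = 0
    while True:
        orphaned = sum(1 for authors in remaining.values() if not authors)
        if orphaned > threshold * total:
            return removed
        # Count how many files each still-present developer is a main author of.
        counts = Counter(a for authors in remaining.values() for a in authors)
        if not counts:
            return removed  # no authors left to remove
        top, _ = counts.most_common(1)[0]
        # "Hit by a truck": drop the most prolific author from every file.
        for authors in remaining.values():
            authors.discard(top)
        removed += 1


# Hypothetical toy data, not Salt's real authorship.
example = {
    "a.py": {"alice"},
    "b.py": {"alice"},
    "c.py": {"bob"},
    "d.py": {"bob", "carol"},
}
print(truck_factor(example))  # → 2
```

Ties between equally prolific authors are broken by whichever `Counter` lists first; in this toy example either order gives the same result.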

More details on our work in this preprint: https://peerj.com/preprints/1233

We calculated the TF for Salt and obtained a value of 11.

The developers responsible for this TF are:

Pedro Algarvio - author of 23% of the files
Thomas S Hatch - author of 21% of the files
Joseph Hall - author of 8% of the files
rallytime - author of 5% of the files
Erik Johnson - author of 4% of the files
Jayesh Kariya - author of 4% of the files
Mike Place - author of 4% of the files
Rupesh Tare - author of 3% of the files
Thomas Jackson - author of 3% of the files
Seth House - author of 3% of the files
Gareth J. Greenaway - author of 3% of the files

To validate our results, we would like to ask Salt developers the following three brief questions:

(a) Do you agree that the listed developers are the main developers of Salt?

(b) Do you agree that Salt will be in trouble if the listed developers leave the project (e.g., if they win the lottery, to be less morbid)?

(c) Does Salt have some characteristics that would attenuate the loss of the listed developers (e.g., detailed documentation)?

Thanks in advance for your collaboration,

Guilherme Avelino
PhD Student
Applied Software Engineering Group (ASERG)
UFMG, Brazil
http://aserg.labsoft.dcc.ufmg.br/

jfindlay commented 9 years ago

@gavelino, awesome. Salt is a great project to be part of, partly because of the diversity and vibrancy of its contributor community. In fact, two of the developers you listed (@jacksontj and @garethgreenaway) are not employed by SaltStack.

jfindlay commented 9 years ago

I will get the other SaltStack engineers to respond as well, but my initial thought is this: while the Truck Factor (or Lottery Factor) is a useful, well-defined metric for measuring the intrinsic liability in a project, a more useful, though less concrete, metric would be the number of developers who are proficient in a given fraction of the codebase. If a developer responsible for a significant fraction of the code leaves, it is not too bad as long as other developers are proficient in that code, or could gain proficiency in it relatively easily.
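The proficiency metric is not formalized in the thread. One possible reading, assuming some source of per-file proficiency sets is available (which, as noted later in the thread, is hard to track), is sketched below: count developers proficient in at least a given fraction of files, and measure how many files have a "backup" developer. Both function names are hypothetical.

```python
def proficient_developers(proficiency: dict[str, set[str]], fraction: float) -> list[str]:
    """Developers proficient in at least `fraction` of all files (sketch).

    proficiency maps each file path to the set of developers assumed to be
    proficient in it; how proficiency is judged is left open here.
    """
    total = len(proficiency)
    if total == 0:
        return []
    per_dev: dict[str, int] = {}
    for devs in proficiency.values():
        for dev in devs:
            per_dev[dev] = per_dev.get(dev, 0) + 1
    return sorted(dev for dev, n in per_dev.items() if n / total >= fraction)


def backed_up_share(proficiency: dict[str, set[str]], minimum: int = 2) -> float:
    """Share of files with at least `minimum` proficient developers."""
    if not proficiency:
        return 0.0
    covered = sum(1 for devs in proficiency.values() if len(devs) >= minimum)
    return covered / len(proficiency)


# Hypothetical data for illustration only.
files = {
    "a.py": {"alice", "bob"},
    "b.py": {"alice"},
    "c.py": {"bob", "carol"},
    "d.py": {"alice", "carol"},
}
print(proficient_developers(files, 0.75))  # → ['alice']
print(backed_up_share(files))  # → 0.75
```

A high `backed_up_share` captures the point above: losing one developer hurts less when someone else already knows the same files.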

techhat commented 9 years ago

I would also question the reliability of those numbers. Correct me if I'm wrong, but I assume that authorship here refers to the creation of those files and has little to do with their maintenance. For instance, I made the initial commit of salt/modules/mysql.py, but as of right now only 14 of the 2006 lines in that file have my name on them, and all of those lines are whitespace or documentation. GitHub's own contributor graphs for Salt also tell a very different story in terms of lines of code committed.
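The gap between "who created the file" and "who owns its current lines" can be measured with `git blame`. A minimal sketch, assuming the input is the output of `git blame --line-porcelain <file>`, which repeats an `author <name>` header for every line; the function name is hypothetical.

```python
from collections import Counter


def line_shares(porcelain: str) -> dict[str, float]:
    """Fraction of a file's current lines attributed to each author.

    `porcelain` is assumed to be the text printed by
    `git blame --line-porcelain <file>`, which repeats an "author <name>"
    header for every line. Source lines themselves start with a tab, so
    they never match the "author " prefix below.
    """
    counts = Counter(
        line[len("author "):]
        for line in porcelain.splitlines()
        if line.startswith("author ")
    )
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()} if total else {}
```

In a checkout, the input could come from `subprocess.run(["git", "blame", "--line-porcelain", path], capture_output=True, text=True).stdout`. For the salt/modules/mysql.py figures quoted above, 14 of 2006 lines is roughly 0.7% of the file, whereas the creation-based reading of authorship would credit the whole file to its first committer.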

terminalmage commented 9 years ago

I second @techhat's concern with the reliability of the numbers. That seems way off, to me.

whiteinge commented 9 years ago

@gavelino your research is both awesome and pretty accurate! :guitar:

To pick up on some of the comments above, I would be very interested to see an authorship graph over time rather than a raw percentage. Salt is also something of an outlier project since it is highly modular. There are core authors for core pieces of Salt who may not rank high in a sorted list of percentages because those pieces are comparatively small parts of the codebase, even though they account for a large part of Salt's functionality. Windows support is a good example of that. @jfindlay's point that proficiency over part of the codebase does not always equate to authorship is valid, but hard (or impossible) to track. :smile:

(a) Agree!

With the caveats mentioned above.

(b) Somewhat agree.

(Now you've got me worried. I sent out a notice recommending that contributors stop buying lottery tickets... just in case. :wink:) A loss in one area would mean a temporary dip in maintenance of that area until someone else stepped in. But due to Salt's high modularity, such a loss is much less likely to affect the project as a whole.

(c) Yes.

basepi commented 9 years ago

Yep, these are interesting stats, but I'd argue that it's impossible to quantify the bus factor because of the proficiency issue. Core engineers who have fixed many bugs and helped troubleshoot Salt for years (@cachedout, @UtahDave, myself, @whiteinge, @terminalmage, and many others come to mind) may not have written or rewritten large swaths of code, but we mitigate the bus factor in huge ways with our general experience and knowledge of the codebase.

Still very intriguing data. =)

whiteinge commented 9 years ago

I've been puzzling over @jfindlay's thoughts on file authorship vs. project proficiency all afternoon, and @basepi's thoughts tie in here as well. There are considerations relevant to the Salt project beyond the FA, DL, and AC authorship criteria you detail in your setup. My musings are below; I hope they're interesting or maybe even helpful. @gavelino, I think your thesis is extremely interesting and I've been enjoying the thought exercise. :smile: Note that I have not given much thought to how broadly applicable these points are, but they are certainly relevant for the Salt project:

The (small) core of Salt has some complexity, but the (large) module ecosystem is relatively simple. It would be interesting to measure the complexity of a given piece of code and weigh that against how many authors have worked in that part of the code. The more authors who understand those core pieces, the more resilient the project. Measuring code complexity is also a tough problem, of course. The more complex parts of Salt tend to be those involving complex algorithms, networking, multiprocessing, or concurrency.
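One way to weigh complexity against author count is a simple ratio: complex code touched by few authors ranks as most fragile. This is a sketch under stated assumptions; the complexity score is taken as given from some external measure, and the function name, paths, and numbers are hypothetical.

```python
def fragility_ranking(files: dict[str, tuple[float, int]]) -> list[tuple[str, float]]:
    """Rank files by complexity per author, most fragile first (sketch).

    `files` maps a path to (complexity_score, distinct_author_count). The
    complexity score is assumed to come from some external measure; none
    is prescribed here.
    """
    scores = {
        # max(..., 1) avoids dividing by zero for files with no recorded author.
        path: complexity / max(authors, 1)
        for path, (complexity, authors) in files.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical paths and scores, not measurements of Salt.
ranking = fragility_ranking({
    "core/transport.py": (40.0, 2),
    "core/loader.py": (30.0, 6),
    "modules/pkg.py": (5.0, 10),
})
print(ranking[0])  # → ('core/transport.py', 20.0)
```

A plain ratio is the simplest choice; a real analysis might weight authors by how recently or how much they edited the file.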

Another is how many authors have been involved in broad parts of the codebase compared to just one or two places. For example, if a few people have touched most of the code, they are good stewards of the project as a whole. If people have touched only isolated parts of the code, they are probably good stewards of only those parts and not of the project as a whole.
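Breadth of involvement can be approximated by the fraction of code areas each author has touched. The sketch below assumes an "area" is a file's parent directory truncated to a fixed depth (for example `salt/modules`); that grouping, the function name, and the author names are illustrative assumptions.

```python
from pathlib import PurePosixPath


def breadth(touched: dict[str, set[str]], depth: int = 2) -> dict[str, float]:
    """Fraction of code areas each author has touched (sketch).

    `touched` maps an author to the set of file paths they have edited.
    An "area" is approximated as the first `depth` components of a file's
    parent directory (e.g. "salt/modules"); this grouping is an assumption.
    """
    def area(path: str) -> str:
        parts = PurePosixPath(path).parent.parts[:depth]
        return "/".join(parts) or "."

    all_areas = {area(p) for paths in touched.values() for p in paths}
    if not all_areas:
        return {}
    return {
        author: len({area(p) for p in paths}) / len(all_areas)
        for author, paths in touched.items()
    }


# Illustrative input; the author names are hypothetical.
scores = breadth({
    "alice": {"salt/minion.py", "salt/modules/mysql.py", "salt/states/file.py"},
    "bob": {"salt/modules/mysql.py"},
})
print(scores["alice"])  # → 1.0
```

Here "alice" has touched every area (a broad steward), while "bob" covers one of three areas.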

Another is how long a given author has been involved with the project. (Maybe weighted against the above two criteria.) If someone's been around for a long time they may have a good understanding of the code, even if they haven't made many edits there.

Finally, as mentioned a few times, Salt is heavily modular. There is a distinct difference between the core of Salt and the module space, and the core is virtually unaffected by the modules. Since there is no standard way to build a modular system, this would be very hard to detect in an automated fashion, but it has vast implications for a project's ability to survive a mass lottery win. :wink:

cachedout commented 9 years ago

Since I've been doing a lot of code review this week, I should add that there ought to be a metric added for the person who merges the code in question, as they should (in theory at least!) understand it well enough to accept it into the codebase.
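Crediting mergers could be modeled by treating anyone who merged changes to a file as knowing that file, alongside its authors. A minimal sketch, assuming per-file merger sets can be extracted from pull-request history; the function name and data are hypothetical.

```python
def knowledgeable(authors: dict[str, set[str]], mergers: dict[str, set[str]]) -> dict[str, set[str]]:
    """Per-file set of developers who either authored or merged changes (sketch).

    Both arguments map a file path to developer names. `mergers` is assumed
    to record whoever merged changes touching that file; gathering it (for
    example from pull-request history) is left open here.
    """
    return {
        path: authors.get(path, set()) | mergers.get(path, set())
        for path in authors.keys() | mergers.keys()
    }


# Hypothetical data for illustration only.
combined = knowledgeable({"a.py": {"alice"}}, {"a.py": {"bob"}, "b.py": {"carol"}})
print(sorted(combined["a.py"]))  # → ['alice', 'bob']
```

The combined mapping can stand in for pure authorship in any file-coverage calculation, so a file is orphaned only when both its authors and its mergers are gone.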

gavelino commented 9 years ago

Thanks for the answers. I really appreciate the feedback.

Our research is still in progress, and the answers we are receiving to this survey will help us better interpret the results and improve our approach.

whiteinge commented 9 years ago

Good luck! We'll keep an eye out for your work. :+1: