Open mr-c opened 8 years ago
:+1:
This is an interesting one. As many people have observed, authorship and credit are super complicated, and deeply embedded in the set of social and political structures that make academia. Making people think harder about how we do that is a big part of Depsy's goal.
The cool thing about software is that we've got these two sources of authorship data (commit logs and authorship lists), and as these two sources collide, they expose the (good and bad) assumptions each system brings with it.
The commit log is measurable, objective, granular. The author list is political, embedded, subjective.
Where we can, we're using the commit log because 1) it's a different and new approach that we hope will spark conversation, 2) it allows the long tail of committers to get more much-deserved credit, and 3) we're people who believe that lots of things benefit from objective measurement.
But where we've got authorship information, we're erring pretty hard toward trusting that over the commit log, simply because when it comes time for the reward system to, well, reward--the authorship lists are the ones that matter. The reward system is political and embedded and subjective in the same ways that the negotiated author list is, and the way the authorship list is created is not ignorant of this.
Placing a person with one commit in the authorship list means something. There's signal there. Maybe that was the most important commit. Maybe that person had the idea for the project. Maybe whatever. Point is, the authors say that person is important, and they are in a position to know better than us, so we believe them.
It's too bad there's not a system for people to define contributions in either a numerical or at least relative sense. There's no way Depsy can guess and win - and it shouldn't have to. Maybe Depsy should say what it wants (a file in some format at the top level of the repo?) and see if developers can/will supply it.
@jasonpriem - you make a bunch of great points about the complexity involved in this decision.
My main concerns with just using authorship as is done for the Python packages are:
I also have a concern about just using commit logs, which reflect @jasonpriem's point that sometimes people can make very important contributions to a project without generating a lot of commits. E.g., I know folks who do a lot of code review on projects and participate actively in design discussions, but this might not show up in commit logs. These folks will often be included as authors even if they only have a small number of commits (or even none).
So, all that being said, what about some sort of weighted combination of the two approaches when both authorship information and commit logs are available? I don't know exactly what the right answer is (and maybe there isn't one), but what about assigning 50% of the credit evenly across designated authors and 50% of the credit based on relative contributions from commit logs. This probably isn't perfect, but it seems to incorporate the benefits of both approaches and makes weaker assumptions about the intent of the developers in designating authors.
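To make the 50/50 idea concrete, here is a minimal sketch of how such a blend could be computed. It is not how Depsy works; the function name `blended_credit`, the `author_weight` parameter, and the example people are all hypothetical, and it assumes names in the author list and the commit log have already been matched to the same person.

```python
from collections import Counter


def blended_credit(authors, commit_counts, author_weight=0.5):
    """Blend author-list credit and commit-log credit into one share per person.

    authors: list of designated author names (hypothetical input).
    commit_counts: dict mapping committer name -> number of commits.
    author_weight: fraction of total credit split evenly across authors;
                   the rest is split in proportion to commits.
    """
    total_commits = sum(commit_counts.values())
    if not authors or total_commits == 0:
        raise ValueError("this sketch needs both an author list and commits")

    credit = Counter()
    # Part 1: split the author share evenly across designated authors.
    for name in authors:
        credit[name] += author_weight / len(authors)
    # Part 2: split the remainder by relative commit counts.
    for name, n in commit_counts.items():
        credit[name] += (1 - author_weight) * n / total_commits
    return dict(credit)


# Made-up example: bob is an author with no commits, carol commits but is not listed.
example = blended_credit(["alice", "bob"], {"alice": 8, "carol": 2})
```

In this made-up example, an author with no commits still receives part of the author half, and a committer missing from the author list still receives part of the commit half, so neither source can zero out the other.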
@danielskatz Yup, agreed...we're never going to know exactly given the low signal provided by the present system. I like the idea of a file people can use to specify exactly....it could leverage one of the many taxonomies or controlled vocabularies out there already for specifying effort. Many people have suggested it'd be awesome to have something like the credits at the end of a film...whether you're Best Boy or Director, you get appropriate credit for your contribution.
It's interesting to consider whether it's Depsy's role to encourage folks to include these kinds of files. So far, we've mostly looked at the project as a way of demonstrating how to leverage whatever data is already there. But your suggestion could be a next step...we'll be listening to hear if there are other calls for something like this, for sure.
@ethanwhite thanks for weighing in, there are some great points here that deserve a longer response...will get back in more detail in next few days...
@ethanwhite Academia (and general usage of the term for that matter) views authorship as a boolean, not a float. You don't have an amount of authorship on a book, or a painting, or a song, or a paper....you either are an author or you ain't. And so you get the social or economic rewards of the "author" role, or you don't. (Blaise Cronin has been doing fascinating work for years on how authorship/acknowledgement works in academia. Esp recently, he's done great stuff looking at changes in authorship, and comparisons with other domains like art.)
Upshot is, I think we agree that this yay-or-nay conception of authorship is impoverished: it's exceedingly inviting to political manipulation and bias, it's vague, it's often unfair, and it's largely a legacy of print-based thinking.
But Depsy's approach is to say: we don't make the rules. The people with the money do. This accomplishment of inducing (by some combination of begging, cajoling, cheating, leveraging, working, or whatever) people to call you "Author": the money-distributors and prestige-distributors care about that accomplishment. In fact, in practice they generally care about it more than they care about your actual intellectual contribution to a given product. Because it suggests you will be able to negotiate/earn the label called "Author" in the future as well, and that's the currency.
So, as long as this keeps being the case, I think it's responsible for Depsy to keep caring about whatever thing it is that people do to be called "Author," regardless of how they made that happen.
But of course we want to push academia further along, not just empower the current system. Hence the fractional accounting powered by commit records (which is of course not without its own inaccuracies), which it sounds like everyone in this thread likes just fine. Yay for that.
Sounds like maybe we just differ a bit on how prominent that approach should be in the app. And our thinking for now is that it's more powerful, in 2016, to honor the Author role with all its attendant meaning, but alongside that to start demonstrating to decision makers and working researchers that there could be a more nuanced, more responsive, (somewhat) more objective way to look at authorship as well. Hopefully this gives us a better chance to be part of real conversations, while still making our point loud and clear.
More on your specific points:
I like your idea of doing a bit of both worlds, I think we're on the same page there. We just think the best way to do that is to keep the Author credit totally separate from the commits-based credit system since there is a real impedance mismatch there. But I think it ends up being pretty similar in effect.
Thanks for the thoughts. These are super interesting issues with no One Answer for sure, and you are pointing out a lot of great stuff. And may well be totally right :)
Darn, this looks like a bug where we were not able to associate the PyPI project with its GitHub repo...so Depsy doesn't actually know anything about the GitHub committers at all. Alas, we have several of these...when the GitHub page is not explicitly listed on PyPI we have to make a guess by searching GitHub for the name/code for that project, and it's not very precise.
Ah, a lot of my thinking was being influenced by looking at our project and khmer, which both suffer from this issue. Now that I'm looking at IPython I see that this is actually working in a way that I think is totally fine 😳. I.e., you've balanced how these two different sources of data contribute to "authorship credit". Sorry for the confusion.
In the last release we both added the GitHub link to the Home Page metadata field (hopefully that's the right one) and expanded the author list, so we should be all fixed the next time the index gets rebuilt.
Hello,
The khmer software paper lists (nearly) all of our github contributors; this seems to erase the weighted impact measurement based upon commits & files modified?
http://depsy.org/package/python/khmer
I suggest that where there is both co-authorship and VCS data, the VCS data be used to split up co-authorship.
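One possible reading of this suggestion, sketched under stated assumptions: keep the author list as the set of people who get credit, and use commit counts only to decide each listed author's share. The function name `split_author_credit` and the example people are hypothetical, names are assumed to match between the two sources, and the even-split fallback is an illustrative choice rather than anything Depsy does.

```python
def split_author_credit(authors, commit_counts):
    """Divide credit among listed authors in proportion to their commits.

    authors: list of designated author names (hypothetical input).
    commit_counts: dict mapping committer name -> number of commits.
    Committers who are not listed authors are ignored in this sketch.
    """
    if not authors:
        raise ValueError("this sketch needs a non-empty author list")

    # Look up each listed author's commits; unlisted in the log means zero.
    counts = {name: commit_counts.get(name, 0) for name in authors}
    total = sum(counts.values())
    if total == 0:
        # No VCS signal for any listed author: fall back to an even split.
        return {name: 1 / len(authors) for name in authors}
    return {name: n / total for name, n in counts.items()}


# Made-up example: carol is a listed author with no commits, dave commits but is not listed.
shares = split_author_credit(
    ["alice", "bob", "carol"], {"alice": 6, "bob": 2, "dave": 2}
)
```

Note the trade-off: under this reading a listed author with no commits ends up with a zero share, and a committer who is not a listed author gets nothing.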