si2-urssi / berkeley_workshop

Repo for the April 10-12 workshop to be held in Berkeley, CA
http://urssi.us/workshops/berkeley/

First draft of position paper; may update. #29

Closed demitri closed 6 years ago

danielskatz commented 6 years ago

We would much rather have you publish the paper on figshare, zenodo, or arxiv and then send us a link to the published paper, if at all possible.

demitri commented 6 years ago

Happy to do so. Given the discussions so far, I think I'd like to modify and expand the position paper. I will resubmit later. Thanks!

katrinleinweber commented 6 years ago

May we still use this for discussion of the draft?

Thanks for summarising the Astropy situation. I would like to suggest a slightly different line of argumentation for the "tragedy of the commons" part you describe. Seeing that "commons" can also be defined as including commonly accepted, democratically agreed-upon rules for sustainably using a common good, the situation you describe is more like anarchy: everybody taking, yet nobody giving.

One could therefore argue that a "commons" needs to be achieved as an early step, rather than being an existing "tragic" situation.

Dr-G commented 6 years ago

Kelle Cruz gave a talk recently on this same subject: https://www.slideshare.net/KelleCruz/collaborations-in-the-extreme-the-rise-of-open-code-development-in-the-scientific-community

demitri commented 6 years ago

@katrinleinweber Yes, of course! Apologies for my late reply; I was in the midst of proposal writing.

Re: The "tragedy of the commons". That was a very deliberately chosen phrase; I am referring to a well-studied social science/philosophical concept. I recommend the Wikipedia page (https://en.wikipedia.org/wiki/Tragedy_of_the_commons) for a good overview; this is a quote from that page:

The tragedy of the commons is a term used in social science to describe a situation in a shared-resource system where individual users acting independently according to their own self-interest behave contrary to the common good of all users by depleting or spoiling that resource through their collective action.

I think Astropy is pretty close to a textbook example. What I wrote in the position paper was a summary of a longer, more detailed description of a paper I wrote called "The Astropy Problem", which you can read here: https://arxiv.org/abs/1610.03159. I had 160 astronomers sign it in support. I strongly suspect that other fields will find something to resonate with in that paper.

Kelle Cruz does regularly talk about Astropy and is involved with the project, but she doesn't touch on the issues that I mentioned in this paper and more specifically in the "Astropy Problem".

I'm still working on an expansion of the position paper I submitted for the workshop; I will try to get that out ASAP.

Dr-G commented 6 years ago

I'm not sure if you read the slides or watched Kelle's talk, but it is very much about the tragedy of the commons. A tweet from her talk, featuring some data that I personally collected, went viral; it shows a huge increase in users without a concomitant increase in contributors.


-- Dr. Gina Helfrich, Communications Director, NumFOCUS

danielskatz commented 6 years ago

As I commented on Twitter, I agree with the point to some extent but also disagree a bit. How many maintainers should there be? I don't think that this should scale with the number of users (too many cooks...) but I really don't know what is correct.

demitri commented 6 years ago

@Dr-G I did read the slides. I do not have anything to say about the number of maintainers of the project(s). I think the number should be small and that it is being extremely well handled without any input from me! What I am suggesting about Astropy specifically (I can't speak to Numpy/SciPy development) is that a) the developers, with the odd exception, are not being paid for their work and b) there is a gaping lack of career-track positions for this work. It would be one thing if it were a "community" or hobby project, but it's not: programs worth millions, if not billions, of dollars depend on the software (Hubble + JWST alone, plus dozens of others). I'm happy to let someone else decide how many maintainers the project has. I'm saying they do not have a budget.

But let's take Numpy as an example, from the chart. There are 6 "core" maintainers, and they are obviously important. Presumably they are paid full time to do this. (Not a correct assumption for Astropy's core maintainers.) But their work, as the diagram shows, is to manage the contributions of 564 people. How many of those are paid for their work? How many should be? (Entirely rhetorical; I don't have an answer besides "not zero".) It would not be a tragedy of the commons if all of the core maintainers were paid full time and x% of the contributors were compensated in some way (money, %FTE).

I argue in my paper that Astropy could not have been developed anywhere near as successfully at any of the big astro centers. There is a failing there that grad students and postdocs addressed extremely successfully, and I don't want the model to be changed. But to expect that level of development on what is now critical software and not pay for it? That's unethical.

And it's certainly true that the output is constrained by the number of contributors. There is a laundry list of software in astro that is decades old (no exaggeration) that needs to be replaced.

bangerth commented 6 years ago

https://github.com/dealii/dealii has 11 maintainers, which is (I think) about the correct number though we could do with 3 or 4 more.

The question for projects is not only the user base, but the developer base and how many patches get submitted. In deal.II, we merge 5-10 patches per day and reviewing this many probably costs each maintainer 30 minutes per day on average. Add to this their own software development, and we can't grow the number of merged patches substantially with the manpower we have.

Matplotlib has fewer maintainers, but merges the same number of commits to within an order of magnitude: https://github.com/matplotlib/matplotlib/graphs/contributors https://github.com/dealii/dealii/graphs/contributors But you can see where they struggle: they have 250+ open pull requests and 1000+ open issues, many opened months or years ago. You can't work through this backlog with that few people.

bangerth commented 6 years ago

I read the Astropy paper with interest, and it's true that other communities have similar problems (e.g., the exceedingly widely used software in bioinformatics for DNA assembly, structure inference, etc.; and in the molecular sciences for molecular dynamics at various length and time scales).

What sets astronomy apart from most other disciplines is that it is rich in the sense that it has a number of projects that are exceedingly well funded and that could easily shoulder the cost of this software development. I mean, if the Sloan survey is funded at $40M a year, then the astronomy community can rightfully expect that it pays a few people to develop the software people use to make use of SDSS data. If SDSS leadership does not do this, then that's a community failure: the rest of the community must simply hold leadership accountable for this and demand that that's written into the next contract extension. The astronomy community is small enough to do this, and because astropy is really of no substantial use to a significant constituency outside of astronomy itself, I cannot see how any other player than the astronomy community itself can achieve this.

Most other communities do not have these mega-projects that could do this, and so do not have this lever to require the big players to do big development. The only other community I can think of is the particle physics community. There, the big centers (at CERN, at the DoE labs) have a substantial number of staff whose job it is to do software development. But they also have a model where every grad student is expected to do a certain number of hours a week in "community service" such as developing software that will then be put into central projects. I only know this anecdotally, and so others should be able to speak with more authority on it, but I think the idea of "expected community service" is an interesting one where a whole community has recognized a problem and solved it collectively.

katrinleinweber commented 6 years ago

[…] a whole community has recognized a problem and solved it collectively.

That's where a state of "commons" is achieved :-)

Re: The "tragedy of the commons". That was a very deliberately chosen phrase; I am referring to a well-studied social science/philosophical concept.

I know the concept and I agree with the assessment of the situation. However, that article also has a criticism section. That's where I derived my suggestion for the "slightly different line of argumentation", and it is why I am arguing that the textbook example is twisted.

In the end, it comes down to which strategies help improve the contribution situation and software quality within a project. I suggest that propagating that phrase does not help because of its negative connotations. How likely do you think it is that a user reading about it will think "Do I want to contribute to a tragedy?" and decide "No."?