shaunagm / free-culture-in-an-expensive-world

Repository for https://shaunagm.gitbooks.io/free-culture-in-an-expensive-world/content/

Annotated Bibliography Item #1: Roads and Bridges by Nadia Eghbal #5

Open shaunagm opened 7 years ago

shaunagm commented 7 years ago

This issue contains my notes for item 1 of the annotated bibliography. When they're a bit more coherent, I'll add them to the bibliography. Please feel very free to add your own comments.

A note: "Eghbal, N." is @nayafia - Nadia, please don't feel any pressure to respond/comment, but I thought I'd tag you to let you know I'd written this. :)

Citation: Eghbal, N. (2016). Roads and Bridges: The Unseen Labor Behind Our Digital Infrastructure. Retrieved from the Ford Foundation website: http://www.fordfoundation.org/library/reports-and-studies/roads-and-bridges-the-unseen-labor-behind-our-digital-infrastructure

General thoughts

This is a great overview of the resource problems facing the open source community, and covers a lot of ground. Being an overview, it doesn't go into too much detail regarding any one problem. Its greatest strength is the number of individual projects and developers whose stories and frustrations are given voice by the report, many of which were new to me.

I'd recommend this to anyone who hasn't thought about open source sustainability in much depth but who is willing to take the time to read a 140-page report. For people with more knowledge, it's still worth reading, especially for the stories and quotes, but it won't be quite as valuable as it is for people just beginning to think about these topics.

Quotes & notes

Venture Capital

Eghbal has a background in venture capital, and so it's not surprising that the report contains insights into how VCs view open source. See:

Steve Klabnik’s thesis, in other words, is that venture capital firms who invest in open source infrastructure promote these platforms as a “loss leader,” even when there is no direct business model or profitability to be had, because it grows the entire ecosystem. The more resources GitHub has, the more open source thrives. The more open source thrives, the more startups thrive. If nothing else, venture capital’s interest in open source, especially given the lack of clear financial return, validates the critical role open source plays in the broader startup ecosystem. (p. 50)

Also:

Venture capital, as discussed, has a personal stake in the future of digital infrastructure. [...] However, “infrastructure,” from a venture capitalist’s mindset, is not limited to open source but rather focused on platforms that help other people create. Therefore, investments in GitHub or npm, which are platforms that help distribute open source code, make sense, but so do investments like Slack, a workplace collaboration platform which developers can use to create other “command”-driven apps. (To this point, venture capitalists formed a $80M “Slack fund” to support developer projects that use Slack.) Even if venture capitalists appreciate the underlying mechanics of infrastructure, they are limited by their asset class: a VC could not make investments into a project that didn’t have a business model. (p. 121)

As far as I can tell, there were no VCs quoted in this report. While I'm skeptical of VCs in general, I wish they'd been included.

Government Investment

While Eghbal details a variety of funding sources, including institutions like large corporations and software foundations, little time is spent on government investment in infrastructure. Given the huge role that government plays in physical infrastructure - something highlighted in the report - this speaks to a real failure on government's part. This is not to say that governments do not invest in open source at all - there are a number of projects like Tor and of course the internet and web themselves that have been publicly funded - but their impact is not as great as it could (arguably should) be. In some ways, governments are making things harder:

Qualifying as a 501(c)(3) can be challenging for these projects, due to the lack of awareness about open source technology and tendency to see open source as a non-charitable activity. In 2013, a controversy revealed that the IRS had internally identified a list of groups applying for tax-exempt status that would require further scrutiny; “open source” was one of these. Unfortunately, these constraints make it difficult for projects to institutionalize.

For instance, Russell Keith-Magee, who until recently was president of the Django Software Foundation, explained that the foundation cannot directly “fund” software development of Django, without the risk of losing its 501(c)(3) status. Instead, they “support” its development through community activities. (p. 111)

Organizational Burdens

The burden on open source maintainers is of course one of the through lines of this report, but the report is inconsistent in drawing out what exactly these burdens are. Relatively early on, the author writes "Opening up a project to the public can mean less work for the company, which is essentially crowdsourcing improvements" (p. 47) but later on acknowledges that open source projects are often more work:

GitHub made it easy to create and contribute to new projects. This was a blessing for the open source ecosystem, because projects develop more rapidly, but it can be a curse to any one project maintainer, with more people easily reporting issues and requesting new features, without actually contributing back themselves. These shallow interactions only create more work for the maintainers, who are expected to address a growing volume of requests. (p. 74)

You could say that Eghbal is drawing a distinction between filing bugs and feature requests, and "contributing", but I'm not sure how useful a line that is to draw. (Nor am I sure this is Eghbal's actual intention.) Many bug reports and feature requests are incredibly valuable to maintainers, and many actual contributions cause more problems than they alleviate if they are poorly done, not a good match for the project direction, or simply too complex.

I think it's very important for us to explore what kinds of contributions (and contribution workflows, tools, and methodologies) are most beneficial to projects, but I don't think it will come down to "issues opened" vs "pull requests made".

Distributed management

Several of the stories in the report stress the value of distributed management:

“We had really big fights back in 2002 or so where I was dropping patches left and right, and things really weren't working. It was very painful for everybody, and very much for me, too. Nobody really likes criticism, and there was a lot of flaming going around—and because it wasn't a strictly technical problem, you couldn't point to a patch and say, ‘hey, look, that patch improves timings by 15%’ or anything like that: there was no technical solution. The solution ended up being better tools, and a work flow [sic] that allowed much more distributed Management.” Quote from Linus Torvalds (p. 62)

Linux is a unique story, because between the creation of the Linux Foundation in 2007 and now there has been such a huge change in how the project is funded. Linux is arguably the best funded and most corporate of the open source projects out there. It seems likely that the switch to more distributed management was necessary to allow so many different corporations to make large investments of money and employee time.

There's also the fascinating story of Node.js:

Node.js is a JavaScript framework, developed in 2009 by Ryan Dahl and several other developers working at Joyent, a private software company. It became extremely popular, but began to suffer governance constraints due to Joyent’s patronage, whom some felt could not fully represent an enthusiastic and fast-growing Node.js community.

In 2014, a group of Node.js contributors threatened to fork the project. Joyent tried to address governance issues by forming an Advisory Board for the project, but the project was forked anyway, under the name io.js. In February 2015, an intent to form a 501(c)(6) organization was announced which would remove Node.js from Joyent’s stewardship. The Node.js and io.js communities voted to work together under this new entity, called the Node.js Foundation. The Node.js Foundation, structured under the advisorship of the Linux Foundation, has a number of corporate sponsors who financially contribute to its budget, including IBM, Microsoft, and PayPal. (p. 113)

It seems that Node.js has also pursued a distributed management approach:

The Node.js contribution policy, which is made available for other Node projects to adopt, emphasizes growing the number of contributors and empowering them to make their own decisions, instead of treating maintainers as the final approving authority. Their contribution policy details how to submit and accept pull requests and how to log bugs and other issues. The Node.js maintainers found that adopting better policies helped them manage their workload and grow their community into a healthier, active project. (p. 131)

Eghbal cites this article, Healthy Open Source: A walkthrough of the Node.js Foundation’s base contribution policy, which I immediately added to my "to read" list.

That said, the Node.js community is mentioned as an unhealthy community in an unrelated section of the report:

Drew Hamlett, who calls himself a “recovering magpie developer,” wrote a popular post in January 2016 called “The Sad State of Web Development,” about how web development has changed, referring specifically to the Node.js ecosystem:

"The people who have stayed in the Node community have undoubtedly created the most over engineered eco system [sic] that has ever appeared. No one can create a library that does anything. Every project that creeps up is even more ambitious than the next....No one will build something that actually does anything. I just don’t understand. The only thing I can think, is people are just constantly re writing Node.js apps over and over." (p. 73-74)

Of course, the Node.js core could be beautifully maintained in a distributed fashion while the Node.js user community is splintering into chaos, but there's a tension here that I'd like to know more about.

High vs Low Quality Contributors

The report makes a useful distinction between what Eghbal calls "keystone contributors" and newcomers:

In conservation biology, a “keystone species” is a species of animal with a disproportionately large effect on its environment relative to its abundance. Similarly, a “keystone contributor” might be a developer who contributes to multiple critical projects, is singlehandedly responsible for a critical project, or is generally perceived to be influential and trustworthy. Keystone contributors are critical advocates; empowering them with the resources they need could help improve the system as a whole. (p. 129)

She quotes Hynek Schlawack, who puts the issue bluntly:

“What frustrates me most is that we have an all-time high of Python developers and an all-time low on high quality contributions.[...] As soon as pivotal developers like Armin Ronacher slow down their churn, the whole community feels it immediately. The moment Paul Kehrer stops working on PyCA we’re screwed. If Hawkowl stops porting, Twisted will never be on Python 3 and git.

So we’re bleeding due to people who cause more work than they provide. [...] Right now everyone is benefitting from what has been built but due to lack of funding and contributions it’s deteriorating. I find that worrying, because Python might be super popular right now but once the consequences hit us, the opportunists will leave as fast as they arrived.” (p. 76)

Eghbal endorses this view directly:

Today, the hypergrowth of coding literacy means many inexperienced developers are flooding the market. These newer developers borrow shared code to write what they need, but they are less capable of making substantial contributions back to those projects. Many are also accustomed to thinking of themselves as “users” of open source projects, rather than members of a community. Because open source tools are more standardized and easy to use, it’s much easier these days for someone to pop into a GitHub forum and make a rude comment or demanding request, which burdens and exasperates project maintainers. (p. 73)

I'm sure there is a correlation between being new to open source and coding, and being more burdensome on open source maintainers, but I don't think newness or inexperience is the cause. The truth is that we have very little advice to give maintainers on how to manage technical, organizational and interpersonal complexity, or to give contributors about how to make contributions most productively. Newness is correlated with burdensomeness in large part because we're not teaching newcomers effectively, and we're not teaching newcomers effectively because we don't understand what's going on ourselves.

That said, I very much agree with Eghbal's proposal to identify "keystone contributors". We might not be able to articulate precisely why they're working so effectively, and how to help others reach that level of success, but we can and should support them in the meantime.

Final note on this topic: inexperienced newcomers are not the only burden on maintainers. Daniel Roy Greenfield calls out corporations -- even open source flagship corporations -- as being especially burdensome:

“I personally get regular demands for unpaid work (Discussions about payment for work always stall) by healthy high profit companies large and small for [my projects]. If I don't respond in a timely fashion, if I'm not willing to accept a crappy pull request, I/we get labeled a jerk. There is nothing like having core Python/PyPA maintainers working for Redhat [sic] demanding unpaid work while criticizing what they consider your project's shortcomings to ruin your day and diminish your belief in open source.” (p. 82)

Metrics

Towards the end of the report, Eghbal focuses in on the need for better metrics:

With better metrics, we could describe the economic impact of digital infrastructure, identify critical projects that are lacking support, and understand dependencies between projects and people. Right now, it is impossible to say who is using an open source project unless that person or company discloses their usage. Our information about which projects need better support is mostly anecdotal. (p. 129)

The only statistics available about GitHub repositories are the number of people who have starred (similar to a “like” or “favorite”), watched (meaning they receive updates about the project) or forked a project. (p. 130)

I absolutely agree on the importance of better quantifying and qualifying the open source community and its areas of particular need. There is some low-hanging fruit here, though. I've got a very small project, Should You Contribute?, that assesses whether a given repository has an active open source community, and Eghbal cites a study that uses the GitHub API. There's actually a lot of data available. The problem, to me, is definition -- I don't think we know what questions we want to ask, and without well-formed questions even the best data is useless.

Misc Thoughts

A private benefactor may want special privileges that threaten the neutrality of a project. (For example, for security-related projects, privileged disclosure of vulnerabilities—paying for special knowledge about security vulnerabilities instead of exposing those vulnerabilities to the public—is a controversial request.) (p. 63)

I had no idea this was a thing and I feel a tiny bit worse about the world now.

Another helpful way of thinking about infrastructure that can be charged for is that if there is an immediate risk of downtime, it probably has a business model. In other words, a server can have unexpected interruptions in service, the way electricity might unexpectedly shut off, but a programming language does not “break” or have downtime in the same way, because it is a system of information. (p. 90)

Eghbal cites Sam Gerstenzang for this distinction (specifically this tweet but this article talks at more length). This is a really interesting perspective and I'm grateful to Gerstenzang for articulating it and Eghbal for introducing me to it.

In 2013, a security flaw in RubyGems.org was discovered, but went unfixed for several days, because RubyGems.org was maintained entirely by volunteers. The volunteers planned to address it that weekend, but in the meantime someone else discovered the flaw and hacked the RubyGems.org server. Following the hack, the servers had to be rebuilt from scratch. Several volunteers took time off work, and some even took personal vacation days, in order to get RubyGems.org up and running again as soon as possible. Because RubyGems.org is a critical piece of Ruby infrastructure, the security issue affected many developers and companies in turn. (p. 81)

I'd like to know more about this incident, which I'd not heard about before.

Matt Asay, a journalist who focuses on open source, noted that Red Hat uses a unique set of patents and licensing to protect its enterprise market. (p. 92-93)

Eghbal links to this source but it doesn't really explain Red Hat's patent/licensing situation, and I am deeply curious.

“If I just continue to use Google products, and I stay within their walls, I get this great benefit. [But] living in a mixed world is almost impossible, and very painful, and everything has bugs, and no one of these companies really wants to support you. And so we’re in this weird world where, and if you look at a city-state world, one of the big problems was interstate commerce, if you have a tariff because you’re trying to export something from Austin and sell it to Dallas, that is not a good economy. You’re going to suffer from a lack of innovation and a lack of idea sharing. And that’s where we are right now.” (p. 118)

Quote from this Wired article. Are open source projects that exist at the intersection of other projects particularly vulnerable to a lack of support? It might be a risk factor.

There is plenty of work that can be done to make projects easier to contribute to, including migrating them to newer workflows, cleaning up code, closing unattended pull requests, and setting clear policies for contribution. (p. 131)

Agreed, and yet... if there are even greater bottlenecks, specifically complexity management issues, getting rid of these other bottlenecks just increases pressure on the tightest one.

However, many projects are trapped somewhere in the middle: large enough to require significant maintenance, but not quite so large that corporations are clamoring to offer support. These are the stories that go unnoticed and untold. From both sides, these maintainers are told they are the problem: Small project maintainers think mid-sized maintainers should just learn to cope, and large project maintainers think if the project were “good enough,” institutional support would have already come to them. (p. 63)

I agree that size is likely to be a big risk factor for under-resourced projects. That said, I'm sure it's not the only risk factor. Can we characterize who and what is struggling most in our current system?

nayafia commented 7 years ago

@shaunagm thanks so much for this thoughtful and detailed analysis! I really enjoyed reading through this and think you raise important points. And this book overall seems really interesting...I'm a fan of anthropology and human-centric writing :)

One small note, about VCs not being quoted in the paper: Mark Suster from Upfront is quoted about the rise of open source (actually one of my favorite quotes!) but not about VCs investing directly into open source. After talking to a lot of folks, I'm a bit skeptical about the future of VC + open source and didn't think it was the most important perspective to include, given all the other material competing for attention.