labarba / oss_in_rr

Defining the role of open source software in research reproducibility

Reviewer 2 #2

Open labarba opened 2 years ago

labarba commented 2 years ago

Recommendation: Author Should Prepare A Major Revision For A Second Review

Comments: It's an interesting topic and I would be happy to review a revised submission.

Additional Questions:

  1. How relevant is this manuscript to the readers of this periodical? Please explain your rating in the Detailed Comments section.: Relevant

  2. Please summarize what you view as the key point(s) of the manuscript and the importance of the content to the readers of this periodical. If you don't have any comments, please type No Comments.: This article presents a high-level analysis and commentary on the role of open-source software as it relates to the problem of research reproducibility. The thesis of the article is that reproducibility is primarily a matter of trust, and that open source software is a means to developing community, and therefore trust.

The article begins with a summary of recent interest and reports on the problem of reproducibility, and observes that these generally conclude with a call for availability of artifacts in some form, not necessarily open source. The article continues with a reflection on the underlying reason for reproducibility, which is to develop trust (if appropriate) in experimental results. The recent LIGO results are given as an example of trustworthy results, primarily because of the team's commitment to evaluating alternative hypotheses. Despite a general commitment to sharing of artifacts, the reproducibility of the final results was imperfect.

Next, the article discusses the meaning and history of the term "open source" and its relationship to the notion of "free software." It notes that open source contributes to the transparency of results but is insufficient for reproducibility. Quality software design and documentation are also required. However, this article argues that open source drives quality: developing in the open exposes bugs, gains feedback from users, and improves documentation, all of which contribute to reusability.

Two concerns against open source software are addressed. One concern is that open source enables modifications from anyone, which is clearly not the case. Another is that open source requires more work to clean up and document, and this article argues that such low-quality software processes should not be trusted in the first place. An example is given of how open source procedures in the author's lab enabled a collaborator to find a bug in a method.

The article then reaches its core argument by describing the process of science as a conversation among collaborators. In summary: "Openness promotes rich networks, lively communities, and fertile connections." In particular, the tools of open source software (pull requests, issue trackers, etc.) encourage a particular structure and custom around interactions and encourage the archiving of communications. The conclusion then links reproducibility and trust, observing that prior failures have reduced public trust in scientific activities. The article posits that there is no technical "one click" solution to reproducibility. Rather, open source collaboration develops relationships between parties, who will then feel a responsibility to produce quality artifacts for each other, and to learn to trust and value artifacts produced by others.

  1. Is the manuscript technically sound? Please explain your answer in the Detailed Comments section.: Not Applicable

  2. What do you see as this manuscript's contribution to the literature in this field?: Reproducibility is a challenging topic because it has multiple interlocking dimensions: technical capabilities, professional expectations, social relationships, and more. I appreciate that this article is striving to sort through some of these connections and refine the meaning and purpose of terms that are familiar. This is a worthy effort, but to this reviewer, the article does not succeed in connecting all of the dots.

The first two thirds or so of the paper follow an agreeable path, from the preliminary discussion of the merits of reproducibility through the definition of open source, and the observation that open source itself does not guarantee reproducibility. I also find myself in agreement with the idea that science is a conversation, and it has also been my experience that fine-grained interactions through open source result in an acceleration of ideas, insight, and trust built between parties.

(Although I might quibble that open source doesn't necessarily lead to stability. The current state of software is that a given product may depend upon thousands of distributed components. If each one of them is a lively conversation with daily updates, it can be a large challenge even to find a set of compatible versions and then compile them in a predictable way. In some ways, a closed-source binary blob is more 'reproducible' in that a single artifact can be saved and reused without the hassle of building.)

Of course, the community has fallen short, particularly in computational techniques where the cost of reproducibility ought to be low. In the absence of reproducible techniques, the reviewer of a scientific claim may fall back on the author's prior work, their academic pedigree, or their friendship (or conflict!) with the author in order to evaluate their trustworthiness. And the result is (or will be) a gradual diminution in the quality of scientific work, in which reviewers rely upon trust in the author instead of proof of the work itself.

Perhaps this all hinges on the definition of 'trust'. Trust in a person is a future evaluation based on past behavior. (If I trusted you to hold my keys yesterday, perhaps I will trust you to hold my wallet tomorrow.) And while trust is essential to human relationships, it isn't the right foundation on which to build science: our friends can be careless, or mistaken, or misled.

Or perhaps there is an argument that open source implies a willingness to be corrected. If an open source product has 10 bugs reported and fixed in the last year, that gives one some confidence that bugs are in fact being found. And such a product might be seen as more trustworthy than a closed-source project in which no bugs were corrected. (But what about an open source product with 1000 bugs corrected?)

  1. What do you see as the strongest aspect of this manuscript?:

  2. What do you see as the weakest aspect of this manuscript?:

  3. Does the manuscript contain sufficient and appropriate references? Please elaborate in the Detailed Comments section.: References are sufficient and appropriate

  4. Does the introduction state the objectives of the manuscript in terms that encourage the reader to read on? Please explain your answer in the Detailed Comments section.: Yes

  5. How would you rate the organization of the manuscript? Please elaborate in the Detailed Comments section.: Satisfactory

  6. Is the length of the manuscript appropriate for the topic? Please elaborate in the Detailed Comments section.: Satisfactory

  7. Please rate the readability of this manuscript in the Detailed Comments section.: Easy to read

Please rate the manuscript. Explain your choice in the Detailed Comments section.: Fair

labarba commented 2 years ago

At the core of this critique is the use of the word "trust" in relation to reproducibility. I think the reviewer understood the opposite of what I meant:

doesn't sell me that trust leads to reproducibility. I think it's the other way around.

Yes. The idea is that we engage in reproducible research practices to build trust in the result, and in the cumulative results that form a network of knowledge (a "field"). It seems that the reviewer thought I was saying that "trust leads to reproducibility" because that would be a linear chain: open-source >> trust >> reproducibility. I can see how that reading is possible. I need to figure out how to fix this aspect of my narrative.

labarba commented 2 years ago

A statement about the physical world is objectively true or false. If the author of such a statement has used reproducible scientific techniques, it should be possible to evaluate the truth of that statement independently of one's personal evaluation of the author.

[…]

Trust in a person is a future evaluation based on past behavior. […] And while trust is essential to human relationships, it isn't the right foundation on which to build science


It sounds like the reviewer sees two opposites that appear to be in conflict: scientific assertions should be objectively true, whereas trust is an assessment made about "unseen" facts (that may occur in the future). My proposition is more akin to a polarity: two opposites that need each other.

"Trust is rational" [Solomon & Flores 2001, p.32]. The keys to trust are: action and commitment. Trust is a way of dealing with complexity [S&F p. 9], and trust makes possible a more effective inter-dependency [S&F p.46].

The meaning of trust is subject to the context: whether it be interpersonal relationships, business, or politics, the meaning specializes to that setting. Here we are discussing the conduct of science, and our collective trust in the findings or results (not the researcher, personally) and ultimately trust in the scientific institution. In every context, however, we can distinguish between simple trust and authentic trust [S&F 2001]. We often think of simple trust first, upon hearing the word: that basic, unthinking trust that is taken for granted, trust by default, absence of suspicion, without scrutiny or reflection. This kind of trust is a poetic illusion, and it rarely exists. "Authentic trust is both reflective and honest with itself and others. […] Authentic trust is not opposed to distrust so much as it is in a continuing dialectic with it" [S&F, p.92].

labarba commented 2 years ago

I found an excellent source of support for my ideas in a Sep. 2020 talk by @rmcelreath: “Science as Amateur Software Development.”

Late in the talk [49:22 time mark], he draws an analogy between software engineering methods of unit testing and continuous integration and empirical science workflows. From expressing a theory as a probabilistic program, using an algorithm to prove that the analysis will be able to identify causal effects, and testing the pipeline with synthetic data sets, and doing all this with standard open source methods, “now you’re ready and we trust your pipeline; it’s time to put real data in it [and] of course it’s important that all of this history be open and available in a public repository so that people trust the analysis.”

[50:22 mark] "the big problem … in common between the endeavor of science and the endeavor of developing open source software to support science is in integrating work from different experts and doing it in a responsible way, and doing it transparently, in public so that people who come after us can have some trust in what we've done and in our work and also when mistakes are discovered (and mistakes are always discovered) they can go back and find the source of the mistake and correct it and learn from that…"
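
To make the "test the pipeline with synthetic data" step concrete, here is a minimal sketch of the idea. It is not taken from the talk. The linear model, the function names (`simulate`, `estimate_slope`, `pipeline_recovers_known_effect`), the sample size, and the tolerance are all assumptions chosen for illustration. The point is only that an analysis is first run on data whose true effect is known, so that a failure to recover it exposes a problem before real data goes in.

```python
import random


def simulate(n, true_slope, noise_sd, seed):
    """Generate synthetic (x, y) pairs from an assumed linear model with a known slope."""
    rng = random.Random(seed)  # fixed seed so the check is repeatable
    xs = [rng.gauss(0, 1) for _ in range(n)]
    ys = [true_slope * x + rng.gauss(0, noise_sd) for x in xs]
    return xs, ys


def estimate_slope(xs, ys):
    """Ordinary least-squares slope; stands in for the analysis step of a pipeline."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def pipeline_recovers_known_effect(true_slope=2.0, tol=0.1):
    """Return True if the analysis recovers the slope that was planted in the data."""
    xs, ys = simulate(n=5000, true_slope=true_slope, noise_sd=1.0, seed=42)
    return abs(estimate_slope(xs, ys) - true_slope) < tol
```

A check like `assert pipeline_recovers_known_effect()` could then live in a test suite run by continuous integration in the public repository, which is the kind of open, inspectable history the quoted passages describe.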

labarba commented 2 years ago

I have checked off the two items to address in the referee's critique, which I think I satisfied with the two recent commits.