openjournals / joss

The Journal of Open Source Software
https://joss.theoj.org
MIT License

Require tests for falsifiable claims in manuscript/docs? #932

Open sneakers-the-rat opened 3 years ago

sneakers-the-rat commented 3 years ago

Hello! New around here, love the journal, think this system is great, but have been thinking about something... searched in previous issues and didn't find something addressing this, so forgive me if this has already been discussed. I'm making this suggestion in a sort of good-faith questioning kind of way, not a hard recommend or demand or anything, so interested to see what y'all think :)

I appreciate the lightweight nature of JOSS papers, and that scientific software papers don't need to (and probably shouldn't) adopt every practice of traditional scientific papers, but I think we are missing one kind of citation in the review requirements. Scientific papers typically require that all specific, falsifiable statements of fact be accompanied by a citation to another paper or substantiating data within the paper. I think the natural analogy for scientific software papers would be to require a link to a (passing) test for specific factual claims made in the manuscript.

Current Status

The Review criteria require:

Functionality

Reviewers are expected to install the software they are reviewing and to verify the core functionality of the software.

Tests

Authors are strongly encouraged to include an automated test suite covering the core functionality of their software.

Good: An automated test suite hooked up to an external service such as Travis-CI or similar
OK: Documented manual steps that can be followed to objectively check the expected functionality of the software (e.g., a sample input file to assert behavior)
Bad (not acceptable): No way for you, the reviewer, to objectively assess whether the software works

and the Review Checklist specifies:

  • Functionality: Have the functional claims of the software been confirmed?
  • Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)
  • Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
  • Automated tests: Are there automated tests or manual steps described so that the functionality of the software can be verified?
  • References: Is the list of references complete, and is everything cited appropriately that should be cited (e.g., papers, datasets, software)? Do references in the text use the proper citation syntax?

Room for Growth

"core functionality," used in multiple places, is understandably vague. One way of isolating some of what the core functionality of the package might be the specific claims that the authors make about the functionality. "functional claims" in the comes close to this, but that too needs a bit more specificity.

The checklist item for tests is currently just a confirmation that there are indeed some tests, but it does not specify, e.g., coverage or some minimal criteria the tests need to satisfy. I see how it's tricky to articulate a minimum, because testing is subtle, programmatic metrics like lines of code covered don't really reflect the quality of tests, and I certainly don't think every line of code needs to be tested to make it through review. The review criteria for tests are a bit more descriptive, but they too have some ambiguity in "core functionality" and "expected functionality."

One way of improving here might be to link the two: a) specify that "core functionality" is constituted by the specific things the authors claim the software can do, and b) set the minimum standard for tests to be that they cover all of the claims specified as core functionality.

Suggestion

Add sections to the description of the JOSS paper, review criteria, and review checklist that require a reference to a passing test for specific, falsifiable claims about the functionality of the software made in the manuscript.
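As a rough illustration of what such a claim-to-test link could look like, here is a minimal sketch. The function, the claim text, and the test name are all invented for the example; JOSS does not prescribe any particular test framework or naming convention.

```python
# Hypothetical example of a test that substantiates one specific,
# falsifiable manuscript claim. Everything here (the function, the
# claim wording, the test name) is illustrative, not a JOSS requirement.

def rolling_mean(values, window):
    """Toy implementation standing in for the package's claimed functionality."""
    if window <= 0 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]

def test_rolling_mean_matches_manuscript_claim():
    """Substantiates the (hypothetical) manuscript claim:
    'the package computes windowed means over numeric sequences.'
    The paper could link directly to this test as its evidence."""
    assert rolling_mean([1, 2, 3, 4], 2) == [1.5, 2.5, 3.5]
```

A manuscript sentence making that claim would then carry a link to this test (and to a CI run showing it passing), the same way a factual statement in a traditional paper carries a citation.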

I imagine several variations and caveats would be needed:

arfon commented 3 years ago

@sneakers-the-rat - thanks for posting this. Overall I like the suggestion here and I think it would for sure be desirable to encourage some kind of 'proof' that the claims being made are substantiated somewhere in the codebase.

I'm a little wary of adding an additional required section to the JOSS paper, but instead have been thinking about the simplest, most actionable language we could include in the JOSS review criteria. What do you think about:

  • Functionality claims: Is it possible to verify any specific claims made in the manuscript about the functionality of the software (e.g., performance, ease of use, ...)?

With this change we could remove the current criterion (which is now duplicative):

  • Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)