openjournals / joss-reviews

Reviews for the Journal of Open Source Software

[PRE REVIEW]: `pytest-cases`: A Python package for reproducible AI results (among others) #2465

Closed: whedon closed this issue 4 years ago

whedon commented 4 years ago

Submitting author: @smarie (Sylvain MARIE)
Repository: https://github.com/smarie/python-pytest-cases/
Version: 2.0.2
Editor: Pending
Reviewer: Pending
Managing EiC: Kyle Niemeyer

:warning: JOSS reduced service mode :warning:

Due to the challenges of the COVID-19 pandemic, JOSS is currently operating in a "reduced service mode". You can read more about what that means in our blog post.

Author instructions

Thanks for submitting your paper to JOSS @smarie. Currently, there isn't a JOSS editor assigned to your paper.

@smarie if you have any suggestions for potential reviewers, please mention them here in this thread (without tagging them with an @). In addition, the people on this list have already agreed to review for JOSS and may be suitable for this submission (please start at the bottom of the list).

Editor instructions

The JOSS submission bot @whedon is here to help you find and assign reviewers and start the main review. To find out what @whedon can do for you type:

@whedon commands
whedon commented 4 years ago

Hello human, I'm @whedon, a robot that can help you with some common editorial tasks.

:warning: JOSS reduced service mode :warning:

Due to the challenges of the COVID-19 pandemic, JOSS is currently operating in a "reduced service mode". You can read more about what that means in our blog post.

For a list of things I can do to help you, just type:

@whedon commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@whedon generate pdf
whedon commented 4 years ago

PDF failed to compile for issue #2465 with the following error:

Can't find any papers to compile :-(

kyleniemeyer commented 4 years ago

@whedon query scope

kyleniemeyer commented 4 years ago

@whedon generate pdf from branch joss_paper

whedon commented 4 years ago

Attempting PDF compilation from custom branch joss_paper. Reticulating splines etc...
whedon commented 4 years ago

:point_right: Check article proof :page_facing_up: :point_left:

kyleniemeyer commented 4 years ago

Hi @smarie, thanks for your interest in JOSS.

Before we get started: although you mention some potential uses in machine learning research, based on the paper and documentation it's not clear that this is "research software". That isn't meant to say it isn't useful, which it does appear to be! Our definition is:

This definition includes software that: solves complex modeling problems in a scientific context (physics, mathematics, biology, medicine, social science, neuroscience, engineering); supports the functioning of research instruments or the execution of research experiments; extracts knowledge from large data sets; offers a mathematical library; or similar.

Can you describe a bit more why this is research software? For example, I don't think pytest itself could be considered research software.

smarie commented 4 years ago

Hi @kyleniemeyer and thanks for the feedback!

When my first PhD student approached the end of his thesis and transferred his code to the lab and our company, I discovered what everyone in research knows: PhD students sometimes create very nice, well-designed code, and sometimes... not :). The problem became real when we discovered that a results table published in a paper came from code containing copy/pasted variants of the evaluation protocol that biased the results for some algorithms, as well as leakage of knowledge from the dataset into the algorithm, so that algorithms could effectively have variants depending on the dataset. Fortunately we were able to fix these things, but I was surprised to hear from colleagues in academia that this is very common!

I developed pytest-cases to address this need in our industrial context: being able to create reproducible data science benchmarks so that we can really trust our results. My first two attempts in 2016 and 2018 (internal use only, unpublished) were based on building a Python framework "from scratch". But as the project evolved and we applied the framework to more internal machine learning projects, it became very clear that I was reinventing a lot of APIs and features that were already at the core of pytest. So I asked myself: "what is pytest missing today that makes it unable to support data science benchmarking?". The answer, unfortunately, was "many things". But over the years I managed to build all of it; the result is pytest-cases. Even if, as you noted, this is a natural general-purpose extension to a general-purpose testing framework, it does make a difference for users wishing to implement reproducible (and readable) data science benchmarks.

This page explains how to use pytest-cases to create a well-designed data science benchmark, where the researcher is naturally guided into a design that prevents leakage of information between datasets and algorithms, or between the evaluation protocol and the algorithms. There is a link on the page to download the associated code: you should see how clear, simple and readable the result is. It also makes it extremely easy to add challengers or datasets to the benchmark.
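For readers unfamiliar with the library, here is a minimal, self-contained sketch of that kind of design (hypothetical datasets and toy challengers, not the code from the linked page), assuming the current pytest-cases API with `parametrize_with_cases` and `case_`-prefixed functions:

```python
# Minimal sketch of the design described above (hypothetical datasets and toy
# "algorithms"). Datasets are `case_` functions, challengers are fixture
# parameters, and the evaluation protocol lives in a single test function, so
# it cannot drift between algorithms or datasets.
import pytest
from pytest_cases import parametrize_with_cases


class MajorityClassifier:
    """Toy challenger: always predicts the most frequent label seen during fit."""
    def fit(self, X, y):
        self.label = max(set(y), key=y.count)
        return self

    def predict(self, X):
        return [self.label] * len(X)


class ThresholdClassifier:
    """Toy challenger: predicts 1 when the first feature exceeds the training mean."""
    def fit(self, X, y):
        self.threshold = sum(x[0] for x in X) / len(X)
        return self

    def predict(self, X):
        return [int(x[0] > self.threshold) for x in X]


# ---- dataset cases: one `case_` function per dataset, each returning (X, y) ----
def case_linearly_separable():
    return [[0.0], [1.0], [2.0], [3.0]], [0, 0, 1, 1]


def case_alternating_labels():
    return [[0.0], [1.0], [2.0], [3.0]], [0, 1, 0, 1]


# ---- challengers: adding one means adding one entry here, nothing else ----
@pytest.fixture(params=[MajorityClassifier, ThresholdClassifier],
                ids=lambda cls: cls.__name__)
def algo(request):
    return request.param()


# ---- single, shared evaluation protocol applied to every (algo, dataset) pair ----
@parametrize_with_cases("dataset", cases=".")
def test_benchmark(algo, dataset):
    X, y = dataset
    algo.fit(X, y)
    accuracy = sum(p == t for p, t in zip(algo.predict(X), y)) / len(y)
    assert 0.0 <= accuracy <= 1.0  # a real benchmark would record the score instead
```

In this layout, adding a dataset means adding one `case_` function and adding a challenger means adding one fixture parameter; the evaluation protocol itself is never duplicated, which is exactly what prevents the copy/paste drift described above.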

When I submitted to JOSS, this question puzzled me, but then I read this in the JOSS scope:

(...) supports the functioning of research instruments or the execution of research experiments (...)

This is exactly it: reproducible execution of research experiments. Put differently, pytest-cases makes it easy and safe for researchers to generate the traditional "results table" at the end of a paper.
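To make the "results table" point concrete, here is one minimal sketch (plain pytest, hypothetical names and placeholder scores; not necessarily the mechanism used in the pytest-cases documentation) of collecting every (algorithm, dataset) score and printing them as a single table at the end of the session:

```python
# Minimal, self-contained sketch (plain pytest; placeholder scores) of turning a
# parametrized benchmark into one printed results table after the session.
import pytest

SCORES = {}  # (algorithm, dataset) -> accuracy, filled by the benchmark tests


@pytest.fixture(scope="session", autouse=True)
def print_results_table():
    yield  # let all benchmark tests run first
    print("\nalgorithm    dataset     accuracy")
    for (algo, dataset), acc in sorted(SCORES.items()):
        print(f"{algo:<12} {dataset:<11} {acc:.3f}")


@pytest.mark.parametrize("algo", ["baseline", "challenger"])
@pytest.mark.parametrize("dataset", ["clean", "noisy"])
def test_benchmark(algo, dataset):
    accuracy = 0.5  # placeholder: a real benchmark would fit and evaluate here
    SCORES[(algo, dataset)] = accuracy
```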

I am about to leave for 2 weeks so I will not be available to discuss further until then, but I hope that you'll be convinced by the above.

Concerning reviewers, I think a perfect match would be reviewers accustomed to generating results tables in Python for machine learning papers. So from the list, starting at the bottom, I see:

If you need additional names, let me know!

Thanks once again for your time

arfon commented 4 years ago

Thanks for providing this additional feedback @smarie. After consulting with the JOSS editorial team, we have concluded that this software is out of scope for JOSS, as it does not fit within our definition of research software.

arfon commented 4 years ago

@whedon reject

whedon commented 4 years ago

Paper rejected.

smarie commented 4 years ago

Thanks @arfon. Could you please explain the reason for rejection a bit more clearly, so that I can take it into account in my next submission to JOSS? In particular, why is this criterion not met:

"supports the functioning of research instruments or the execution of research experiments"

(This could also be an opportunity to clarify the definition, so that next time people in my situation do not spend effort submitting, and you do not spend effort reviewing :) )