Review of submission 96aydemir

blind-reviewer-2 commented 5 years ago

Hi Authors,

After reviewing your artefacts I have some questions regarding your submission.

"Available" Badge

I am unable to find a DOI or link to an archival service such as Zenodo, so at best the submission could be "Reusable". Please upload to such a service and provide a DOI if you wish to receive the requested badge of "Available".

"Manually tagged datasets"

What is "esa-eucl-est.csv"? It is not described in any of the READMEs.
Why are there manually tagged datasets where all of the "RequirementText" column is blank? How can blank requirements be labelled?
Additionally, in the "8combined.csv", there are no requirements listed. If that is designed that way, why is there a column header stating "RequirementText"?

"Notebooks"

I was able to get the notebooks executing in a virtual environment, but none of them appear to be configured to run in their current state.

At the very least, variables such as "DATA_FOLDER" must be changed to point to the right data.
A more confusing finding is that after changing "DATA_FOLDER" to point to the right place, and running "01. ReconstructionOf-KurtanovicMaalej", the Jupyter Notebook does not output the necessary files for the next Jupyter Notebook, "02. Enrich_New_Dataset", to run. Currently, the first Jupyter Notebook only outputs two files ("dataset-full.csv" and "promise-km-100-oq.csv"), whereas the diagram insists many files. Perhaps this is just a configuration issue, but it should be configured, by default, to output what is described in the README.

Overall this is a lot of configuration that could have already been put in place to make replication possible. I understand that you have only applied for the badge level "Available", but your repository appears to be documented such that someone should be able to replicate your work. Please clarify if you would like your work to be replicated, and if so some slight modifications to the repository must be made to make the pipeline produce what is stated in the figure.

timm commented 5 years ago

Dear authors: if you want "available". please add a zip of your work to zenodo (they will automatically issue a DOI). should take about 5 minutes

aydemirfb commented 5 years ago

Thank you for your comments. Here is a partial reply at the moment.

We forgot to include the description of the ESA data set in the readme file, although it is avalable in the RE submission. I have updated the readme.
We have datasets that we are not allowed to share with the public. For those datasets, we cannot share the requirements text, but we share the manual tags and the extracted feature values.
The combined datasets are datasets put in a single file. Since we already provide the natural language requirements in their corresponding file, we do not include them in the combined file. We kept the column header for consistency but I can remove it.
I included the DOI badge to the repository now.

What is the deadline for us to implement the changes to the notebooks? I could not discuss the replication issue with my co-authors yet, but we will respond to these comments soon.

blind-reviewer-2 commented 5 years ago

Hi Authors,

Thank you for the reply. I understand the need to remove sensitive data and I believe posting the labels, even without the requirements, is a good trade off. Third-parties can't verify the labels applied, but assuming they trust the labelling procedure, they can test and verify the algorithms in place.

blind-reviewer-2 commented 5 years ago

Hi Authors,

Given the state of the repository, this discussion, and the upcoming deadline for this process, I feel it necessary to assign a badge to this submission. The deadline is July 12th, which doesn't leave much time for restructuring the repository. I am available to test the updated structure on Friday, should it be uploaded in time.

Here are some summary comments regarding this submission:

The files share consistent names, and the comments in the artefacts are consistently placed. The algorithms are well-documented and show clear effort of refactoring for ease-of-use and understandability. The use of Jupyter Notebooks is becoming much more of a standard within the scientific community due to their use-of-use and readability. The Jupyter Notebooks submitted in this repository are similarly well done and of great contribution for this work. With these comments in mind, I find it clear that this repository is at least "Reusable".

The authors have now included a DOI that links to Zenodo. This resource means that the code and files will be available for an extended period of time for all interested parties to use. With this included, I find it clear that this repository is at least "Available".

My final comments are regarding Replicability. Given the structure of the repository, and the amount of detailed that is present in how the files were documented, I believe that it is possible to replicate their work. However, true replication requires that the original results are obtained in a subsequent study using, in part, artefacts provided by the authors. In the current state, I don't feel it is possible to replicate their work, at least not without significant work on the part of the reviewer or other interested parties. For a more streamline process, I would expect an "Input Data" folder that contains all files you would expect at the beginning of such a pipeline: labelled data, input dictionaries, corpora, etc. Then, following a step-by-step guide of running certain programs (as does exist in this repository with labelled Jupyter files from 01 to 08), the programs should take data from the Input Data folder and transform it into the final results that would be stored in some kind of "Output Data" folder, including tables, figures, etc. This is currently not possible, as parties interested in running the code are met with configuration options that must be done first. For this reason, I don't find that this work is "Replicated".

Given the current state of the repository, I recommend a badge of "Available".

neilernst commented 5 years ago

@blind-reviewer-2 the badge is recommended once these other changes are made?

@aydemirfb - we will be finalizing badges by the weekend, so please make the changes by then.

blind-reviewer-2 commented 5 years ago

The badge of "Available" is recommended now. I believe they have satisfied the requirements.

The badge of "Replicated" could be achieved, but first structural changes to the repository must be made, and then the results must be verified (by myself) by running the scripts in sequence to produce the replicated results.

aydemirfb commented 5 years ago

Thanks. We are aware of the configuration issues, that is why we have originally applied for the "Available" badge. Although we believe that interested readers can replicate the study, we do not provide an automated pipeline.

neilernst commented 5 years ago

@timm please concur on "Available"

aydemirfb commented 5 years ago

Thanks. Are we going to add a badge to the camera ready version of the paper? What should we do next?

neilernst commented 5 years ago

@Timm and I are just finagling the badges, then we will move on to the next step.

researchart / re19