sergioburdisso / pyss3

A Python package implementing a new interpretable machine learning model for text classification (with visualization tools for Explainable AI :octocat:)
https://pyss3.readthedocs.io
MIT License

[joss] software paper comments #21

Open hbaniecki opened 2 years ago

hbaniecki commented 2 years ago

https://github.com/openjournals/joss-reviews/issues/3934

Hi, I hope these comments help in improving the paper.

Comments

  1. The paper's title could see a change. It says "PySS3: A new interpretable and simple machine learning model for text classification", but the model is named "SS3" and does not seem new. The title of the repository seems more accurate, "A Python package implementing a new simple and interpretable model for text classification", but even then one could drop "new" and use the PyPI package's title, e.g. "PySS3: A Python package implementing the SS3 interpretable text classifier [with interactive/visualization tools for explainable AI]". This is just an example to be considered.
  2. I would recommend that the authors highlight in the article the software's "interactive" aspects (explanation, analysis) and its support for (model, machine learning) "monitoring", as these seem both novel and increasingly discussed lately.
  3. Finally, it would be useful to release a stable version 1.0 of the package (on GitHub, PyPI) and mark that in the paper, e.g. in the Summary section.

Summary

Statement of need This part mainly discusses the need for open-source implementations of machine learning models. However, as I see it, the significant contributions of the software/paper, distinguishing it from previous work, are the Live_Test/Evaluation tools allowing for visual explanation and hyperparameter optimization. This could be further underlined.
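For context, these two components are typically driven along these lines (a minimal sketch based on my reading of the package's documentation; the dataset paths and grid values are placeholders, and exact signatures may differ):

```python
from pyss3 import SS3
from pyss3.util import Dataset, Evaluation
from pyss3.server import Live_Test

# Load a labeled text classification dataset (placeholder paths)
x_train, y_train = Dataset.load_from_files("datasets/example/train")
x_test, y_test = Dataset.load_from_files("datasets/example/test")

# Train the SS3 classifier
clf = SS3()
clf.fit(x_train, y_train)

# Evaluation tool: hyperparameter optimization via grid search over
# SS3's smoothness (s), significance (l), and sanction (p) values
# (the grids below are illustrative)
best_s, best_l, best_p, _ = Evaluation.grid_search(
    clf, x_test, y_test,
    s=[0.3, 0.4, 0.5], l=[0.5, 1, 1.5], p=[0.5, 1, 2]
)
clf.set_hyperparameters(s=best_s, l=best_l, p=best_p)

# Live_Test tool: launch the interactive web app that visually
# explains individual predictions (blocks while the server runs)
Live_Test.run(clf, x_test, y_test)
```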

State of the field The paper lacks a brief discussion of packages in the field of interpretable and explainable machine learning. To that end, I suggest the authors reference/compare to the following software related to interactive explainability:

  1. Wexler et al. "The What-If Tool: Interactive Probing of Machine Learning Models" (IEEE TVCG, 2019) https://doi.org/10.1109/TVCG.2019.2934619
  2. Tenney et al. "The Language Interpretability Tool: Extensible, Interactive Visualizations and Analysis for NLP Models" (EMNLP, 2020) https://doi.org/10.18653/v1/2020.emnlp-demos.15
  3. Hoover et al. "exBERT: A Visual Analysis Tool to Explore Learned Representations in Transformer Models" (ACL, 2020) https://doi.org/10.18653/v1/2020.acl-demos.22
  4. [Ours] "dalex: Responsible Machine Learning with Interactive Explainability and Fairness in Python" (JMLR, 2021) https://www.jmlr.org/papers/v22/20-1473.html

Other possibly missing/useful references:

  1. Pedregosa et al. "Scikit-learn: Machine Learning in Python" (JMLR, 2011) https://www.jmlr.org/papers/v12/pedregosa11a.html
  2. Molnar "Interpretable Machine Learning - A Guide for Making Black Box Models Explainable" (book, 2018) https://christophm.github.io/interpretable-ml-book
  3. Rudin "Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead" (Nature Machine Intelligence, 2019) https://doi.org/10.1038/s42256-019-0048-x
  4. [Ours] "modelStudio: Interactive Studio with Explanations for ML Predictive Models" (JOSS, 2019) https://doi.org/10.21105/joss.01798

Implementation

Illustrative examples

  1. The beginning lacks a brief description of the predictive task used for the example (dataset name, positive/negative text classification, etc.).
  2. Also, the example could now be updated to use the Dataset.load_from_url() function.

Conclusions Again, I have doubts about calling the machine learning model "novel", as it has been previously published. The current phrasing might be misunderstood as "introducing a novel machine learning model".

sergioburdisso commented 2 years ago

Hi @hbaniecki!

It's been forever, I'm so sorry! I've moved to Switzerland and started working as a postdoctoral researcher here, and it was a huge change in my life.

I've addressed most of the points you highlighted (commit 61c8419), btw THANKS for your valuable advice. Below I'll address each of your points:

Comments

  1. The title has been updated following your guidance; it is now "PySS3: A Python package implementing SS3, a simple and interpretable machine learning model for text classification", and the word "novel" has been removed from the paper, as you suggested.
  2. This point will be addressed as part of the "statement of need", following the other points you suggested there.
  3. I don't think the API is stable enough yet for a 1.0 version, but I will release a new version with the new changes (including loading datasets from a URL) and reference that version in the paper. Do you think that is OK? Of course, if you don't agree, we can talk about it, no problem! :)

Summary

Statement of need Yes, I totally agree. In the initial version I didn't include it due to space limitations (in fact, the paper exceeded the 1000-word limit). I'll address this point and the ones you pointed out in the following item together in this section. The idea is to briefly discuss interpretability and explainability, cite the papers you suggested, and then add the "gap phrase" ("However, little attention has been paid...", etc.), focusing on the need for interpretable models (not just explainable, but interpretable, i.e. self-explainable). What do you think?

State of the field These references and this discussion will be added above.

Implementation

Illustrative examples

  1. I've added a brief description of the dataset at the beginning.
  2. Updated the example to use the Dataset.load_from_url() function, roughly as sketched below.
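A simplified sketch of how the updated example begins (the URL below is a placeholder, and Dataset.load_from_url() is assumed to return the same (documents, labels) pair as Dataset.load_from_files()):

```python
from pyss3 import SS3
from pyss3.util import Dataset

# Download and load the dataset directly from a URL
# (placeholder URL; assumed to return a (documents, labels) pair)
x_train, y_train = Dataset.load_from_url("https://example.com/movie_review.zip")

# Train the SS3 text classifier on the loaded documents
clf = SS3()
clf.fit(x_train, y_train)
```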

Conclusions I've revised the conclusion, removing "novel" and adding an extra sentence.

I'm still working on the changes regarding the "Statement of need"; I'll let you know as soon as I finish. Again, thank you so much for your review work, and apologies for the delay.