pudu-py / pudu

A Python library for explainability of machine learning algorithms in an agnostic, deterministic, and simple way.
MIT License

[joss review] review from @aksholokhov #3

Closed: aksholokhov closed this issue 12 months ago

aksholokhov commented 1 year ago

(related to this JOSS submission)

First, thank you for submitting your work to JOSS! I personally believe that pudu is a great package that addresses a real need in its field and that it may eventually become a JOSS paper. However, several issues need to be addressed for this to happen:

Paper

1. Relation to other works 

The package is presented as an "understandable approach that can help the different stakeholders to better understand the AI algorithms", but the paper does not articulate what makes it more understandable than its competitors. Other libraries (SHAP, LIME, GradCAM) are mentioned as ones that "can be difficult to implement, interpret, or understand for technical scientists with little or no data science background." Is this the authors' personal opinion, or is there evidence of a consensus in the field (e.g., survey papers)? If the former, more justification is needed. For example, a table comparing all alternatives side by side, with a reasonable quantification of the effort needed to achieve what the authors need (e.g., in lines of code, hours to implement, or simply present/absent), would greatly improve the paper.

Also, the relationship between pudu and RELIEF remained unclear to me after reading the paper. Does pudu implement RELIEF plus a set of wrappers around it that simplify access to functionality frequently needed in the natural sciences? If so, is it the only package that implements a sensitivity analysis with RELIEF? If yes, mentioning this would greatly strengthen the statement of need. If not, you need to summarize the differences between pudu and other packages that implement RELIEF.

2. Evidence of academic interest

I see that the package has over a thousand downloads to date, which is great! Have any works been published that used pudu in their workflow? Mentioning them (or at least their total number to date) would be a good quantification of academic interest. Alternatively, mentioning example works that use the same analysis methodology (and thus could have benefited from pudu) would help.

Code

Finally, the examples need a clean-up:

  1. Some examples have import errors. E.g., examples/example_pertubation.py imports perturbation as ptn as if it were a local module, whereas the module is located at pudu/perturbation, which leads to import errors. The import should be replaced with something like from pudu import perturbation as ptn.
  2. Some examples import packages and modules that they don't use. E.g., examples/example_pertubations.py imports lime, which it does not use. Pudu also does not install lime as a dependency, which leads to import errors.
  3. The examples should not assume their working directory; otherwise they fail when the project is opened at its root, for example when the whole project is opened in PyCharm. In that case the examples fail because the data/ directory does not exist (it is in examples/data). The working directory should be inferred at run time to ensure a smooth user experience.
  4. Some examples try to read files that don't exist. E.g., examples_activation.py tries to read data/0_526_model_(80;79).h5, which does not exist (at least not when the repository is cloned with git clone).
  5. Finally, it would be great to have a paragraph describing (1) the context, (2) the data, (3) what the example aims to illustrate, and (4) what I should expect to see if the example runs correctly, both in the code file and on the documentation page.
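Points 3 and 4 are commonly handled by resolving data files relative to the example script itself rather than the process's current working directory. The following is a minimal sketch of that idea using only the standard library's pathlib; the helper name data_path and its base_dir parameter are hypothetical and not part of pudu, and the sketch assumes the data files sit in a data/ folder next to the script, as with examples/data.

```python
from pathlib import Path
from typing import Optional, Union


def data_path(filename: str, base_dir: Optional[Union[str, Path]] = None) -> Path:
    """Return the absolute path of `filename` inside a data/ folder.

    Hypothetical helper (not part of pudu). `base_dir` defaults to the
    directory containing this script, so the result does not depend on the
    current working directory (e.g. when an IDE runs the script from the
    project root).
    """
    if base_dir is None:
        # __file__ is defined when the example is run as a script.
        base_dir = Path(__file__).resolve().parent
    path = Path(base_dir).resolve() / "data" / filename
    if not path.is_file():
        # Fail early with an actionable message instead of a confusing
        # error deep inside a loader (e.g. a missing .h5 model file).
        raise FileNotFoundError(
            f"Data file not found: {path}. Check that the examples/data "
            "directory is complete in your checkout."
        )
    return path
```

In an example script, a call such as data_path("spectra.csv") (a hypothetical file name) would return the same absolute path whether the script is launched from examples/ or from the repository root, and a missing file is reported with its full expected location.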

Otherwise the package looks great!

enricgrau commented 1 year ago

Thank you @aksholokhov for your thoughtful review. We truly appreciate your comprehensive feedback and insight. After serious consideration, we believe that addressing the concerns you raised has strengthened our manuscript. We have thoroughly reviewed the entire paper with your suggestions and concerns in mind, and we believe we have improved it considerably. The article does look quite different from the original. We intentionally restructured several sections and removed many phrases and paragraphs so that we could focus on the main and original goal of the library, which is to aid in the analysis of spectroscopy using ML. We hope that the article now clearly shows the motivation, opportunity, and application, and that it matches your expectations for a JOSS publication.

Here are our responses to each of the points:

Paper

  1. Indeed, we have ample anecdotal evidence that such methodologies are difficult to use, and we are aware of survey studies tackling this and related issues. We have therefore rephrased that paragraph in the revised version, referencing the relevant literature on these problems. We also recognize the importance of distinguishing pudu from other libraries and methods, so we now offer better context on why researchers using spectroscopy and ML may find pudu useful.

As for the relationship between pudu and the RELIEF method, after extended discussion we have decided to remove this reference. In retrospect, we realize that it only confuses the goal of the library and adds little value to the overall work. Instead, we now focus on the specific method used by pudu and its relevance to the field.

  2. As pudu is relatively new and has only been in a fully functional, usable form since version 0.3.0, we do not expect peers to have used it in published work yet. However, we believe that the changes and insights from this review, together with a potential publication, will increase the library's visibility and interest within the target audience.

Code

  1. We have fixed these import errors in all examples.
  2. We have revised the imports in the examples so that they include only the ones in use. Additionally, we include an examples_requierments.txt file listing the additional libraries needed to run all the examples.
  3. We have changed the directory of all data imports to examples/data.
  4. We now include all the files referenced in the examples.
  5. We now include more context for each example, following your points, together with the output images.
enricgrau commented 1 year ago

@aksholokhov In the meantime, please find the new manuscript in the library's repository here. For some reason we are not able to re-render it with the JOSS editorialbot, but all the changes that will be shown once this is solved are reflected in this preview generated from the new paper.md.

enricgrau commented 1 year ago

Just a friendly reminder @aksholokhov ;) Thanks!

aksholokhov commented 12 months ago

@enricgrau thank you for a comprehensive answer and for addressing my feedback. Indeed, the paper looks quite different now and it has improved greatly. The documentation page describing the examples looks great and the examples run with no issues on my side. I can definitely recommend the paper for acceptance.