openjournals / joss-reviews

Reviews for the Journal of Open Source Software
Creative Commons Zero v1.0 Universal

[REVIEW]: pyCeterisParibus: explaining Machine Learning models with Ceteris Paribus Profiles in Python #1389

Closed whedon closed 5 years ago

whedon commented 5 years ago

Submitting author: @kmichael08 (Michał Kuźba)
Repository: https://github.com/ModelOriented/pyCeterisParibus
Version: v0.5.2
Editor: @katyhuff
Reviewer: @janfreyberg, @justinshenk
Archive: 10.5281/zenodo.2667756

Status


Status badge code:

HTML: <a href="http://joss.theoj.org/papers/aad9a21c61c01adebe11bc5bc1ceca92"><img src="http://joss.theoj.org/papers/aad9a21c61c01adebe11bc5bc1ceca92/status.svg"></a>
Markdown: [![status](http://joss.theoj.org/papers/aad9a21c61c01adebe11bc5bc1ceca92/status.svg)](http://joss.theoj.org/papers/aad9a21c61c01adebe11bc5bc1ceca92)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@janfreyberg & @justinshenk, please carry out your review in this issue by updating the checklist below. If you cannot edit the checklist please:

  1. Make sure you're logged in to your GitHub account
  2. Be sure to accept the invite at this URL: https://github.com/openjournals/joss-reviews/invitations

The reviewer guidelines are available here: https://joss.theoj.org/about#reviewer_guidelines. Any questions/concerns please let @katyhuff know.

Please try and complete your review in the next two weeks

Review checklist for @janfreyberg

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

Review checklist for @justinshenk

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

whedon commented 5 years ago

Hello human, I'm @whedon, a robot that can help you with some common editorial tasks. @janfreyberg, it looks like you're currently assigned as the reviewer for this paper :tada:.

:star: Important :star:

If you haven't already, you should seriously consider unsubscribing from GitHub notifications for this (https://github.com/openjournals/joss-reviews) repository. As a reviewer, you're probably currently watching this repository which means for GitHub's default behaviour you will receive notifications (emails) for all reviews 😿

To fix this do the following two things:

  1. Set yourself as 'Not watching' https://github.com/openjournals/joss-reviews:


  2. You may also like to change your default settings for watching repositories in your GitHub profile here: https://github.com/settings/notifications


For a list of things I can do to help you, just type:

@whedon commands
whedon commented 5 years ago
Attempting PDF compilation. Reticulating splines etc...
whedon commented 5 years ago

:point_right: Check article proof :page_facing_up: :point_left:

JustinShenk commented 5 years ago

Overview: pyCeterisParibus is a library for explaining machine learning models with ceteris paribus profiles. These are useful for adding to visual story telling and supporting model interpretability. The idea is great, the implementation is clean, and I may use it in some of my projects. Some minor improvements are suggested below.
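To make the idea concrete, here is a minimal sketch of what a ceteris paribus profile computes: one feature of a single observation is varied over a grid while every other feature is held fixed, and the model's prediction is recorded at each grid value. This is an illustration only, not the pyCeterisParibus API; the function name `ceteris_paribus_profile`, the `toy_model` stand-in, and the feature names are all hypothetical.

```python
def ceteris_paribus_profile(predict, observation, feature, grid):
    """Return (value, prediction) pairs for one feature, all else held equal.

    predict     -- any callable mapping a feature dict to a number
    observation -- dict of feature name -> value for a single instance
    feature     -- name of the feature to vary
    grid        -- iterable of values to substitute for that feature
    """
    profile = []
    for value in grid:
        modified = dict(observation)   # copy, so the original stays untouched
        modified[feature] = value      # change only the selected feature
        profile.append((value, predict(modified)))
    return profile


def toy_model(x):
    # Hypothetical stand-in for a fitted model (e.g. survival probability).
    return 0.5 - 0.004 * x["age"] + (0.3 if x["sex"] == "female" else 0.0)


passenger = {"age": 30, "sex": "female"}
profile = ceteris_paribus_profile(toy_model, passenger, "age", [10, 30, 50])
for value, prediction in profile:
    print(f"age={value}: prediction={prediction:.2f}")
```

Plotting the recorded pairs for one or more observations gives the kind of profile plot discussed in this review.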

Installation: Installed without issues via pip and local copy of the source code.

Functionality: I was able to run the example on my Mac, but was not able to load the plot, due to an issue with how file paths are handled on Macs. I have opened a pull request at https://github.com/ModelOriented/pyCeterisParibus/pull/24 fixing this issue on my machine. After this is accepted or otherwise addressed I will consider it completed. I opened an issue (https://github.com/ModelOriented/pyCeterisParibus/issues/23) regarding the scrollbars obscuring the data. This could be fixed by adding additional padding to the bottom of the frame.

Performance: No measure of performance is given, but the model loaded fast on the Titanic dataset.

Documentation: The explanation of how the model works could be improved. For example, in the paper the authors write "For this purpose, methods for sampling and selecting neighbouring observations are implemented along with the Gower's distance [@gower] function. A more detailed description might be found in the package documentation." I was not able to find a description of Gower's distance in the linked Read the Docs documentation. Adding details of how the model works would be helpful for people who are not familiar with Gower's distance or how it applies to machine learning models.
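For readers unfamiliar with it, the following is a minimal sketch of the textbook form of Gower's distance for mixed data: numeric features contribute their absolute difference scaled by the feature's range, categorical features contribute 0 when equal and 1 otherwise, and the contributions are averaged. It is not taken from the package, whose implementation may differ; the function name `gower_distance`, the `numeric_ranges` argument, and the sample records are assumptions made for illustration.

```python
def gower_distance(a, b, numeric_ranges):
    """Gower's distance between two records given as dicts.

    numeric_ranges maps each numeric feature to its range (max - min)
    over the dataset; every other feature is treated as categorical.
    """
    total = 0.0
    for feature in a:
        if feature in numeric_ranges:
            feature_range = numeric_ranges[feature]
            # Range-scaled absolute difference, so each term lies in [0, 1].
            if feature_range > 0:
                total += abs(a[feature] - b[feature]) / feature_range
        else:
            # Simple matching for categorical features.
            total += 0.0 if a[feature] == b[feature] else 1.0
    return total / len(a)


# Hypothetical records and feature ranges, for illustration only.
first = {"age": 20, "fare": 10.0, "sex": "male"}
second = {"age": 60, "fare": 30.0, "sex": "female"}
ranges = {"age": 80, "fare": 100.0}

distance = gower_distance(first, second, ranges)
print(round(distance, 4))
```

Because every per-feature term is between 0 and 1, the result is also between 0 and 1, which makes it convenient for selecting neighbouring observations across numeric and categorical columns.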

Software Paper: The software paper has a few minor typos or questionable stylistic choices for an academic paper:

Example Usage: The notebooks and example scripts ran without problems.

References: Every reference mentioned in the paper is documented as a BibTeX entry.

kmichael08 commented 5 years ago

@justinshenk Many thanks for all these valuable remarks! I merged your pull request. I also applied your comments on the paper and added a description of Gower's distance to the documentation. I'll fix the scrollbar problem (ModelOriented/pyCeterisParibus#23) as soon as possible.

kmichael08 commented 5 years ago

@whedon generate pdf

whedon commented 5 years ago
Attempting PDF compilation. Reticulating splines etc...
whedon commented 5 years ago

:point_right: Check article proof :page_facing_up: :point_left:

JustinShenk commented 5 years ago

@kmichael08 Thanks for the quick response and edits.

@katyhuff My review (https://github.com/openjournals/joss-reviews/issues/1389#issuecomment-484024162) is now complete.

janfreyberg commented 5 years ago

Installation

Installing the package in a fresh docker alpine image leads to dependencies being installed that I don't think need to be, for example sphinx, m2r, codecov, etc.

There are a few ways around this so I haven't made a PR but I think you can do the following:

Split requirements into actual requirements (what's needed to run the package), documentation requirements, and test requirements (e.g. pytest). You can add these as additional requirements using the extras_require key in setup.py, or simply install them from txt files wherever you need them.

Additionally, as far as I can tell tensorflow is never imported and so should be removed from the requirements.

I would even go so far as to say XGBoost and sklearn should not be in the requirements, even though you use them in the paper and documentation, because they are not essential to the functioning of the package. Instead, you could make a note that people should install them to run the examples.
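A rough sketch of the suggested split is below. The group names and the contents of each list are assumptions for illustration (the runtime list in particular is a guess, and `example-package` is a placeholder); only `install_requires` and `extras_require` are actual setuptools keywords.

```python
# Hypothetical dependency groups; the real lists are for the package authors to decide.
INSTALL_REQUIRES = ["numpy", "pandas"]  # assumed runtime dependencies only

EXTRAS_REQUIRE = {
    "docs": ["sphinx", "m2r"],                 # needed to build the documentation
    "test": ["pytest", "codecov"],             # needed to run tests and coverage
    "examples": ["scikit-learn", "xgboost"],   # needed only to run the examples
}
# Convenience group that pulls in every optional dependency.
EXTRAS_REQUIRE["all"] = sorted(
    {pkg for group in EXTRAS_REQUIRE.values() for pkg in group}
)

SETUP_KWARGS = dict(
    name="example-package",  # placeholder name
    version="0.0.0",         # placeholder version
    install_requires=INSTALL_REQUIRES,
    extras_require=EXTRAS_REQUIRE,
)

# In a real setup.py the final step would be:
#   from setuptools import setup
#   setup(**SETUP_KWARGS)
```

With a layout like this, a plain `pip install example-package` would pull in only the runtime dependencies, while something like `pip install "example-package[test]"` or `pip install -e ".[docs,test]"` would add the optional groups.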

Otherwise, installation works great.

Functionality / Performance

This all worked great for me.

Documentation

I think the docs can be improved:

But that's just a recommendation.

Paper

The paper is very good. Only point: the R package CeterisParibus is not included in the references.

katyhuff commented 5 years ago

Thanks for the speedy reviews, @janfreyberg @justinshenk. And thanks for responding quickly to the suggestions, @kmichael08.

@kmichael08, there are a few items in @janfreyberg's review that will need to be handled before we should move forward with acceptance:

The rest of the comments from @janfreyberg would certainly clean things up, but aren't explicitly needed for our JOSS requirements, so I'll just recommend that you consider the recommendation from @janfreyberg: "Split requirements into actual requirements (what's needed to run the package), documentation requirements, and test requirements (e.g. pytest). You can add these as additional requirements using the extras_require key in setup.py, or simply install them from txt files wherever you need them.... I would even go so far as to say XGBoost and sklearn should not be in the requirements, even though you use them in the paper and documentation, because they are not essential to the functioning of the package. Instead, you could make a note that people should install them to run the examples."

I have looked over the package and have found it installs pretty easily. Once you've seen this message @kmichael08 and implemented the two above changes, please ping me and we'll move on with next steps.

kmichael08 commented 5 years ago

@whedon generate pdf

whedon commented 5 years ago
Attempting PDF compilation. Reticulating splines etc...
whedon commented 5 years ago

:point_right: Check article proof :page_facing_up: :point_left:

kmichael08 commented 5 years ago

Thanks a lot @janfreyberg and @katyhuff!

So, as long as these two sound OK to you, we can move on. I'll definitely enhance the docs soon and use sphinx-jupyter. Thanks for that!

katyhuff commented 5 years ago

@whedon generate pdf

whedon commented 5 years ago
Attempting PDF compilation. Reticulating splines etc...
whedon commented 5 years ago

:point_right: Check article proof :page_facing_up: :point_left:

katyhuff commented 5 years ago

@whedon check references

whedon commented 5 years ago
Attempting to check references...
whedon commented 5 years ago

OK DOIs

- 10.1145/2939672.2939778 is OK
- 10.1080/10618600.2014.907095 is OK
- 10.5281/zenodo.1198885 is OK
- 10.2307/2528823 is OK

MISSING DOIs

- None

INVALID DOIs

- None
katyhuff commented 5 years ago

@kmichael08 I'm going through some of the final checks (first up, the bibliography):

arfon commented 5 years ago

Weird. If I change the bib file field to:

howpublished = {\url{https://www.openrightsgroup.org/blog/2018/machine-learning-and-the-right-to-explanation-in-gdpr}},

Then it seems to compile OK. Changing the flag to breaklinks=true doesn't seem to fix anything.

kmichael08 commented 5 years ago

@whedon generate pdf

whedon commented 5 years ago
Attempting PDF compilation. Reticulating splines etc...
whedon commented 5 years ago

:point_right: Check article proof :page_facing_up: :point_left:

kmichael08 commented 5 years ago

@katyhuff I updated the version of the paper to the one you mentioned (it's OK) and added the DOI. As for the URL breaking, I switched to the workaround that @arfon applied above. Let me know if that's OK.

katyhuff commented 5 years ago

Thanks @arfon for the workaround and thank you to @justinshenk @janfreyberg for your excellent reviews.

Thank you @kmichael08 for a strong submission and for engaging actively in the review process! I have looked over the paper, double checked all the DOI links, and have conducted a high level review of the code itself. Everything looks ship-shape to me.

@kmichael08 At this point, please double check the paper yourself, update your code version if you want to (e.g. change v0.5 to a minor release representing today's version), review any lingering details in your code/readme/etc., and then make an archive of the reviewed software in Zenodo/figshare/other service. Please be sure that the DOI metadata (title, authors, etc.) matches this JOSS submission. Once that's complete, please update this thread with the DOI of the archive, and I'll move forward with accepting the submission! Until then, now is your moment for final touchups!

kmichael08 commented 5 years ago

@katyhuff I updated the repository to v0.5.2 and archived it in Zenodo. DOI: 10.5281/zenodo.2667756

kyleniemeyer commented 5 years ago

@whedon set 10.5281/zenodo.2667756 as archive

whedon commented 5 years ago

OK. 10.5281/zenodo.2667756 is the archive.

kyleniemeyer commented 5 years ago

@whedon accept

whedon commented 5 years ago
Attempting dry run of processing paper acceptance...
whedon commented 5 years ago

PDF failed to compile for issue #1389 with the following error:

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100    13    0    13    0     0    158      0 --:--:-- --:--:-- --:--:--   160

pandoc: 10.21105.joss.01389.crossref.xml: openFile: does not exist (No such file or directory)

Looks like we failed to compile the Crossref XML

whedon commented 5 years ago

OK DOIs

- 10.1145/2939672.2939778 is OK
- 10.1080/10618600.2014.907095 is OK
- 10.1214/aos/1013203451 is OK
- 10.5281/zenodo.1198885 is OK
- 10.2307/2528823 is OK

MISSING DOIs

- None

INVALID DOIs

- None
arfon commented 5 years ago

@whedon accept

whedon commented 5 years ago
Attempting dry run of processing paper acceptance...
whedon commented 5 years ago

OK DOIs

- 10.1145/2939672.2939778 is OK
- 10.1080/10618600.2014.907095 is OK
- 10.1214/aos/1013203451 is OK
- 10.5281/zenodo.1198885 is OK
- 10.2307/2528823 is OK

MISSING DOIs

- None

INVALID DOIs

- None
whedon commented 5 years ago

Check final proof :point_right: https://github.com/openjournals/joss-papers/pull/662

If the paper PDF and Crossref deposit XML look good in https://github.com/openjournals/joss-papers/pull/662, then you can now move forward with accepting the submission by compiling again with the flag deposit=true e.g.

@whedon accept deposit=true
arfon commented 5 years ago

@whedon accept deposit=true

whedon commented 5 years ago
Doing it live! Attempting automated processing of paper acceptance...
whedon commented 5 years ago

🚨🚨🚨 THIS IS NOT A DRILL, YOU HAVE JUST ACCEPTED A PAPER INTO JOSS! 🚨🚨🚨

Here's what you must now do:

  1. Check final PDF and Crossref metadata that was deposited :point_right: https://github.com/openjournals/joss-papers/pull/663
  2. Wait a couple of minutes to verify that the paper DOI resolves https://doi.org/10.21105/joss.01389
  3. If everything looks good, then close this review issue.
  4. Party like you just published a paper! 🎉🌈🦄💃👻🤘

    Any issues? notify your editorial technical team...

katyhuff commented 5 years ago

@kyleniemeyer @arfon Thanks for jumping forward with the submission. That said, I didn't get a chance to execute the whedon set version command in time to beat you to that accept function!

Usually, that's part of my task list at this stage -- do we need to fix and re-accept? That is, the submission was v0.5, but, at my request, the author updated the version when creating the archive release, to reflect the version that includes joss-related changes. The new version, to be incorporated in the JOSS publication, is v0.5.2, so I would usually have run whedon set version before whedon accept. Can you confirm whether this is going to be an issue?

arfon commented 5 years ago

> Usually, that's part of my task list at this stage -- do we need to fix and re-accept? That is, the submission was v0.5, but, at my request, the author updated the version when creating the archive release, to reflect the version that includes joss-related changes. The new version, to be incorporated in the JOSS publication, is v0.5.2, so I would usually have run whedon set version before whedon accept. Can you confirm whether this is going to be an issue?

Sorry my/our bad - looks like we got ahead of ourselves here. The version isn't actually captured in the paper so please go ahead and update that here.

arfon commented 5 years ago

> Wait a couple of minutes to verify that the paper DOI resolves https://doi.org/10.21105/joss.01389

Also, please note, Crossref is still having some issues so this DOI doesn't resolve yet.

katyhuff commented 5 years ago

@whedon set v0.5.2 as version

whedon commented 5 years ago

OK. v0.5.2 is the version.

katyhuff commented 5 years ago

So (@arfon @kyleniemeyer ) do we just run accept again?

arfon commented 5 years ago

> So (@arfon @kyleniemeyer ) do we just run accept again?

There's no need to, because the version isn't captured anywhere other than here. The archive DOI is correct, right? (This is linked to in the paper.)

katyhuff commented 5 years ago

fancy .