openjournals / joss-reviews

Reviews for the Journal of Open Source Software
Creative Commons Zero v1.0 Universal

[REVIEW]: Human-Learn #3448

Closed whedon closed 2 years ago

whedon commented 3 years ago

Submitting author: @koaning (Vincent Warmerdam)
Repository: https://github.com/koaning/human-learn/
Branch with paper.md (empty if default branch):
Version: 0.3.0
Editor: @jbytecode
Reviewers: @desilinguist, @ahurriyetoglu
Archive: Pending

:warning: JOSS reduced service mode :warning:

Due to the challenges of the COVID-19 pandemic, JOSS is currently operating in a "reduced service mode". You can read more about what that means in our blog post.

Status

status

Status badge code:

HTML: <a href="https://joss.theoj.org/papers/65f7682a158be074680a4e5ec7d65200"><img src="https://joss.theoj.org/papers/65f7682a158be074680a4e5ec7d65200/status.svg"></a>
Markdown: [![status](https://joss.theoj.org/papers/65f7682a158be074680a4e5ec7d65200/status.svg)](https://joss.theoj.org/papers/65f7682a158be074680a4e5ec7d65200)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@desilinguist & @ahurriyetoglu, please carry out your review in this issue by updating the checklist below. If you cannot edit the checklist please:

  1. Make sure you're logged in to your GitHub account
  2. Be sure to accept the invite at this URL: https://github.com/openjournals/joss-reviews/invitations

The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @jbytecode know.

Please start on your review when you are able, and be sure to complete your review within the next six weeks, at the very latest.

Review checklist for @desilinguist

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

Review checklist for @ahurriyetoglu

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

whedon commented 3 years ago

Hello human, I'm @whedon, a robot that can help you with some common editorial tasks. @desilinguist, @ahurriyetoglu it looks like you're currently assigned to review this paper :tada:.

:warning: JOSS reduced service mode :warning:

Due to the challenges of the COVID-19 pandemic, JOSS is currently operating in a "reduced service mode". You can read more about what that means in our blog post.

:star: Important :star:

If you haven't already, you should seriously consider unsubscribing from GitHub notifications for this (https://github.com/openjournals/joss-reviews) repository. As a reviewer, you're probably currently watching this repository, which means that, with GitHub's default behaviour, you will receive notifications (emails) for all reviews 😿

To fix this do the following two things:

  1. Set yourself as 'Not watching' at https://github.com/openjournals/joss-reviews:

watching

  2. You may also like to change your default notification settings for watched repositories in your GitHub profile here: https://github.com/settings/notifications

notifications

For a list of things I can do to help you, just type:

@whedon commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@whedon generate pdf
whedon commented 3 years ago
Software report (experimental):

github.com/AlDanial/cloc v 1.88  T=0.14 s (522.5 files/s, 97286.4 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
JavaScript                       2            356            214           1963
Python                          32            406            562           1315
Markdown                        19            425              0           1065
Jupyter Notebook                 6              0           6223            209
HTML                             3             42              0            201
CSS                              2             75            129            181
YAML                             4              8              2            132
TeX                              1              3              0             36
make                             1             10              0             31
JSON                             3              0              0              3
-------------------------------------------------------------------------------
SUM:                            73           1325           7130           5136
-------------------------------------------------------------------------------

Statistical information for the repository '520945e47c16c8fa8ce79b06' was
gathered on 2021/07/02.
The following historical commit information, by author, was found:

Author                     Commits    Insertions      Deletions    % of changes
Kay Hoogland                     1            37              0            0.12
Vincent                          1         11998              0           39.24
koaning                          1            20              4            0.08
vincent d warmerdam             22          5641          12876           60.56

Below are the number of rows from each author that have survived and are still
intact in the current revision:

Author                     Rows      Stability          Age       % in comments
Kay Hoogland                 37          100.0          6.8                5.41
vincent d warmerdam        4779           84.7          3.0                7.70
whedon commented 3 years ago
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- None

MISSING DOIs

- None

INVALID DOIs

- None
whedon commented 3 years ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

jbytecode commented 3 years ago

Dear @desilinguist and @ahurriyetoglu ,

This is the review issue for the submission. There are 20 checkbox items for each reviewer. Please check the boxes as your review progresses and leave comments here. Since the review is iterative, you can comment on minor issues as you go, so the authors can add, delete, or correct things simultaneously. You can also open new issues in the submitted repository and link them here. The reviewer guidelines are here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html

Please do not hesitate to ask for help at any stage.

Thank you in advance for your contributions to JOSS!

desilinguist commented 3 years ago

This package looks really cool and something that I would likely check out myself for some work that I am doing! Excellent work! The documentation is top-notch. I only have some minor suggestions:

  1. Please clearly list the dependencies in the GitHub README, if possible. This makes it much easier to figure out what's needed than looking at requirements.txt.

  2. I didn't see API documentation for the various classes in the official docs.

  3. It's clear from the code that there are tests, but it would help if the README described how to run them. And this leads me to ...

  4. There is no section in the README or in the docs about inviting contributions from the community or how someone who might be interested in this would contribute.

And then one final comment that's more selfish, it would be really nice if there was a conda package in addition to one on PyPI but that's totally optional, of course :)

In general, great work and love the clear examples and walkthroughs!

koaning commented 3 years ago

Happy to hear you like it!

  1. Done.
  2. Could you give an example of the classes you found were missing?
  3. Done.
  4. Done.

I might consider conda if there's a clear documented method for me to deploy there. Got any links with tutorials?

desilinguist commented 3 years ago

@koaning I was looking for API documentation that looked something like this. If you are using Sphinx, this can be generated automatically using the autodoc directive. When I am using a library, I find such an API reference quite useful.
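
For context, Sphinx's autodoc pulls API pages straight from docstrings. A sketch of the directive one would put in an .rst page — the module path here is an assumption for illustration, not the package's confirmed layout:

```rst
.. automodule:: hulearn.classification
   :members:
   :undoc-members:
```

This also requires `"sphinx.ext.autodoc"` in the `extensions` list of Sphinx's conf.py.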

As for deploying to conda, here are a couple of pointers that will prove useful:

  1. Using conda-build to create a conda package.

  2. Once you have built the package, you can sign up for a free anaconda.org account to distribute it via a custom channel or you can contribute the package to the community conda-forge channel.

FYI, I have a lot of experience with building conda packages so let me know if you need any additional help.

(PS: not related to this but I really enjoyed the episode of Python Bytes podcast you were on!)

koaning commented 3 years ago

I'm using the mkdocs plugin together with mkdocstrings which should auto-generate the API docs for every class/function that has a docstring. You can find it here.
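
For readers unfamiliar with that setup: mkdocstrings is enabled as an MkDocs plugin and then invoked with a `:::` directive in any docs page. A sketch, assuming a module path like `hulearn.classification` and a class name taken from the thread's mention of function classifiers (neither verified against the repo):

```yaml
# mkdocs.yml (illustrative excerpt)
plugins:
  - search
  - mkdocstrings
```

and then, in any Markdown page of the docs:

```markdown
::: hulearn.classification.FunctionClassifier
```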

jbytecode commented 3 years ago

Dear reviewer @ahurriyetoglu

I'm bothering you to ask if everything is okay. Could you please update your status? Please do not hesitate to ask anything if you have any difficulties or problems.

jbytecode commented 3 years ago

@whedon generate pdf

whedon commented 3 years ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

jbytecode commented 3 years ago

@whedon check references

whedon commented 3 years ago
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- None

MISSING DOIs

- None

INVALID DOIs

- None
jbytecode commented 3 years ago

@koaning could you please clarify why the submitted paper has no citations? It is unusual to have no citations, since the author(s) did not invent the entire subject at hand. Please add citations and reference entries, possibly covering 1) the methods that are implemented, 2) the technology it is built on, and 3) the terminology used before.

koaning commented 3 years ago

That's odd. I added a paper.bib to the main repo but it's not being picked up, despite being listed in the frontmatter here. Any clue why it's not being picked up?

jbytecode commented 3 years ago

> That's odd. I added a paper.bib to the main repo but it's not being picked up, despite being listed in the frontmatter here. Any clue why it's not being picked up?

It is just because the entries in the BibTeX file are not cited in the main paper using [@entrylabel].

Please follow the example paper linked on the page here.

You can also use the https://whedon.theoj.org/ page to test your PDF.
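
For illustration, JOSS papers use Pandoc-style citations in paper.md, referencing BibTeX entry keys in square brackets; `@key` alone gives a narrative citation. The entry key `ribeiro2016lime` below is made up for the example:

```markdown
Tools such as LIME [@ribeiro2016lime] inspired this approach.
As @ribeiro2016lime argue, explanations build trust in model predictions.
```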

koaning commented 3 years ago

This should now be fixed. I also added a sentence about LIME and SHAP. They played a role in developing this tool, so it's fair to mention them as well.

jbytecode commented 3 years ago

@whedon generate pdf

whedon commented 3 years ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

jbytecode commented 3 years ago

@whedon check references

whedon commented 3 years ago
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- None

MISSING DOIs

- 10.18653/v1/n16-3020 may be a valid DOI for title: "Why Should I Trust You?": Explaining the Predictions of Any Classifier

INVALID DOIs

- None
jbytecode commented 3 years ago

@koaning please update the missing DOIs as whedon suggests.

jbytecode commented 3 years ago

@koaning the correct citation syntax uses brackets like [@citationlabel], not parentheses like (@citationlabel). Please update to the correct form.

koaning commented 3 years ago

@whedon check references

whedon commented 3 years ago
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.18653/v1/n16-3020 is OK

MISSING DOIs

- None

INVALID DOIs

- None
koaning commented 3 years ago

@jbytecode done!

whedon commented 3 years ago

:wave: @desilinguist, please update us on how your review is going (this is an automated reminder).

whedon commented 3 years ago

:wave: @ahurriyetoglu, please update us on how your review is going (this is an automated reminder).

ahurriyetoglu commented 3 years ago

@koaning Thanks for this cool software. I am interested and have worked on combining rules and machine learning. I started the review by reading the paper from the PDF I found in this thread, generated 6 days ago. My points:

1- The English should be improved. Some examples: "There's" -> "There is" (there are many cases like this); python -> Python; "also host a" -> "also hosts a"; "(Kluyver et al. (2016))" -> (Kluyver et al., 2016).
2- I do not see the list of the dependencies on the repo. Where is the requirements.txt file?
3- The paper is not self-contained. A reader would not get an overview of the tool by reading this paper. The functionality should be summarized in the paper.
4- There are no examples in the paper. Providing links to external tutorials etc. is not enough.
5- The state of the art should be explained. A mention of Snorkel could be helpful.

Please let me know if you think I can provide more details about the points above. I am looking forward to reading the next version.

Best wishes,

Ali

koaning commented 3 years ago
  1. Fair! I just ran Grammarly over it. The (Kluyver et al., 2016) example is generated by BibTeX though.
  2. The dependencies are listed in the setup.py file.
  3-4. Added the code for the function classifiers, which should give a proper overview of what the library is about. Also added the example for the fraud case.
  5. I guess I could mention that project. The goal of this project isn't to be state of the art at all; rather, it aims to remain incredibly simple to use.
koaning commented 3 years ago

@whedon generate pdf

whedon commented 3 years ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

koaning commented 3 years ago

Made a few minor edits.

@whedon generate pdf

jbytecode commented 3 years ago

@whedon generate pdf

@koaning Whatever the message contains, the whedon commands should be on the first line.

whedon commented 3 years ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

jbytecode commented 3 years ago

@koaning the example code added in the paper overflows the page.

koaning commented 3 years ago

The code no longer overflows. I'm not 100% sure what your comment on binary operators is referring to though. I've formatted the code using black which means that all declarations when defining a function are written via param=default. Is this what you're referring to? I tried looking for it but the binary operator ~ is not used anywhere in the paper.

jbytecode commented 3 years ago

@koaning that is okay for now. thank you. I'll post other editorial comments when the review is finished.

ahurriyetoglu commented 3 years ago

Thanks @koaning! A couple of other minor points:

1- You can cite the scikit-learn paper and provide a URL to their homepage. (Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., ... & Duchesnay, E. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12, 2825-2830.)
2- python -> Python; it's -> it is; "user the define rules" -> "user to define rules"; snokel -> Snorkel.
3- The figures, code snippets, etc. should have names and references in the text. It is not easy to understand what the code snippets are, what they do, why they are there, etc.

Best wishes,

Ali

koaning commented 3 years ago
  1. I already cited the scikit-learn pipeline design in the paper, but I've added a citation in the acknowledgment section using a bibtex citation to the scikit-learn reference paper.
  2. Done.
  3. Done.
koaning commented 3 years ago

Looking at the list on top I figured I should also add installation instructions in the paper. I added it under a new "features" section that picks up right after the statement of need.

jbytecode commented 3 years ago

@whedon generate pdf

whedon commented 3 years ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

ahurriyetoglu commented 3 years ago

Thanks for the update, @koaning; it looks better now. Some more points:

1- "There's been" -> "there has been"
2- don't -> do not
3- Snokel -> Snorkel
4- "in figure Figure 4" -> "in Figure 4"
5- Not all figures are mentioned in the text.
6- Code snippets require numbering and in-text mentions as well.
7- The following snippet could be written better in terms of English: "Machine learning is a general tool, but it is capable of making bad decisions. Decisions that are very hard to debug too."
8- Please make the "import *" part explicit. It is confusing in this form.

You report the capabilities. Do you have any results and an error analysis of them to report? This could be a use-case. I believe showing performance of this tool will increase its impact a lot!

Best wishes,

koaning commented 3 years ago
  1. Done
  2. Done
  3. Done
  4. Done
  5. Done
  6. I tried finding a guide on how to format the code snippets but couldn't find anything in this guide. Is there a guide elsewhere?
  7. Done
  8. Done

In terms of capabilities and results/analysis: all the tools are compatible with scikit-learn, so people are free to measure whatever they'd like via the metrics module that sklearn provides. One benchmark comes to mind, described in the course over here and in our docs over here, but I don't consider it fair. I'm comparing against the keras blog, which contains a model meant for explaining, not a model that's supposed to be state of the art.

I worry about listing benchmarks in the paper mainly because the results will certainly vary depending on the dataset one is working with.
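
A minimal sketch of why scikit-learn compatibility makes measurement easy: a plain rule wrapped in a fit/predict estimator can be scored like any model. The class name and the toy fraud rule below are illustrative only, not human-learn's actual API:

```python
class RuleClassifier:
    """Wraps a plain Python function as a scikit-learn-style estimator."""

    def __init__(self, rule):
        self.rule = rule

    def fit(self, X, y=None):
        # Rules need no training; fit exists only for API parity.
        return self

    def predict(self, X):
        return [self.rule(row) for row in X]


def flag_large_amount(row):
    # Toy rule: flag any transaction over 1000 as fraud.
    return 1 if row[0] > 1000 else 0


clf = RuleClassifier(flag_large_amount).fit(None)
X = [[250.0], [1500.0], [900.0], [3000.0]]
y_true = [0, 1, 0, 1]
y_pred = clf.predict(X)

# With scikit-learn installed, sklearn.metrics.accuracy_score(y_true, y_pred)
# would accept these predictions directly; here we compute it by hand.
accuracy = sum(p == t for p, t in zip(y_pred, y_true)) / len(y_true)
print(y_pred, accuracy)
```

Because the wrapper follows the estimator interface, the same object could also be dropped into a sklearn Pipeline or cross-validation loop.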

jbytecode commented 3 years ago

@whedon generate pdf

whedon commented 3 years ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

ahurriyetoglu commented 3 years ago

@koaning thanks!

I did not mean code formatting. I mean the code snippets should have a name and a mention in the text, like the tables and figures.

A use case result could provide a sense of what to expect from your tool, although the exact results would change according to context. If you provide your context and your results, this will be enough.

jbytecode commented 3 years ago

@ahurriyetoglu thank you for your comments. Maybe you can give a code snippet here to clarify what you mean exactly, or send a pull request to the main repo that implements the changes to paper.md.