Hello human, I'm @whedon, a robot that can help you with some common editorial tasks. @effigies, @mnarayan it looks like you're currently assigned to review this paper :tada:.
:star: Important :star:
If you haven't already, you should seriously consider unsubscribing from GitHub notifications for this (https://github.com/openjournals/joss-reviews) repository. As a reviewer, you're probably currently watching this repository, which means that, due to GitHub's default behaviour, you will receive notifications (emails) for all reviews 😿
To fix this do the following two things:
For a list of things I can do to help you, just type:
@whedon commands
For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:
@whedon generate pdf
Attempting to check references...
Attempting PDF compilation. Reticulating splines etc...
OK DOIs
- 10.1109/TPAMI.2015.2478471 is OK
- 10.1109/TNNLS.2017.2676101 is OK
- 10.1109/JBHI.2018.2877738 is OK
MISSING DOIs
- None
INVALID DOIs
- None
I've started on this, but may not finish until after the holidays. Apologies if you need a quicker review.
@effigies: No worries and thanks for letting us know! I'll remind you in the new year if we don't hear from you.
@whedon generate pdf
Attempting PDF compilation. Reticulating splines etc...
@effigies, @mnarayan: Happy new year! I'm checking in to remind you about this review and to see if you have any questions going through the review checklist. Please let me know if there are any issues. Thanks!
Hi @dfm, Just wondering if there's anything we should be doing on our end?
Ah, sorry. I'll try to have a look on Friday.
Thanks @effigies!
@mnarayan: Any updates from your end? Thanks!
LFSpy is a Python reimplementation of LFS, originally written by Narges Armanfard (@armanfn).
Much of the requested review is straightforward box-checking, so I'm restricting these comments to those where I have questions.
General checks
- [ ] Contribution and authorship: Has the submitting author (@kiretd) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?
@kiretd wrote the paper and also contributed tests. According to the commit log, the contributors to the repository were:
28 Oliver Cook <cooko@mcmaster.ca>
23 Areeb Khawaja <khawaa1@mcmaster.ca>
16 Kiret Dhindsa <kiretd@gmail.com>
10 Tom Mudway <mudwayt@mcmaster.ca>
Areeb Khawaja appears to be missing from the author list, and Ron Harwood and Ranil Sonnadara have not contributed commits. Ranil appears to be a PI and Ron a staff member. I'll simply link to the Authorship guidelines and would appreciate an affirmation that Ron and Ranil provided substantial contribution to the effort, and that @khawaa1 has elected not to be an author on this submission.
Documentation
Documentation currently takes the form of a README.md. Although there are extensive comments in the code, there is no rendered documentation anywhere.
- [ ] A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
This is currently missing
- [x] Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
This is there, although the installation requirements currently include `pytest`. The convention typically is to set installation requirements only to those needed to use the software, and to include tests and optional functionality as `extras_require` arguments to the `setuptools.setup` function. I will submit a pull request to the main repository to use this. (Authors are not obligated to accept this. Some projects prefer to treat tests as first-class requirements.)
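To illustrate, a minimal `setup.py` along these lines might look like the sketch below; the dependency lists and version are placeholders rather than LFSpy's actual requirements:

```python
# Illustrative layout: runtime dependencies in install_requires,
# test-only dependencies under extras_require. Names and versions are placeholders.
from setuptools import setup, find_packages

setup(
    name="LFSpy",
    version="1.0.0",
    packages=find_packages(),
    install_requires=[
        "numpy",
        "scipy",
        "scikit-learn",
    ],
    extras_require={
        "test": ["pytest"],
    },
)
```

With this layout, `pip install LFSpy` pulls in only the runtime dependencies, while `pip install LFSpy[test]` also installs pytest.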
- [ ] Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).
The example usage is not directly runnable. From the perspective of users of scikit-learn, it is very likely enough, but fully runnable examples would be preferable, IMO. The test_lfs.py module fetches two datasets. One of these could be used to flesh out the example usage.
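For instance, a runnable example might look roughly like the following. I'm assuming a `LocalFeatureSelection` class with the usual scikit-learn `fit`/`predict` interface; the actual import path, constructor options, and expected array orientation should be taken from the package's README:

```python
# Hedged sketch of a fully runnable usage example on a standard scikit-learn
# dataset. The LFSpy import and call signature below are assumed, not verified.
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

from LFSpy import LocalFeatureSelection  # assumed public class name

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

model = LocalFeatureSelection()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
```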
- [ ] Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
No. Examples of standard scikit-learn API documentation strings can be found here.
For these to be readable (as in sklearn.linear_model.LinearRegression), the documentation would need to be rendered.
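As a sketch, a numpydoc-style docstring on a `fit` method would look something like this; the parameter names are generic placeholders, not LFSpy's actual signature:

```python
# Sketch of a numpydoc-formatted docstring, the style used throughout
# scikit-learn and rendered by Sphinx. Parameter names are placeholders.
def fit(self, X, y):
    """Fit the localized feature selection model.

    Parameters
    ----------
    X : ndarray of shape (n_samples, n_features)
        Training data.
    y : ndarray of shape (n_samples,)
        Class labels for each training sample.

    Returns
    -------
    self : object
        The fitted estimator.
    """
    ...
```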
- [x] Automated tests: Are there automated tests or manual steps described so that the functionality of the software can be verified?
Yes, however, tests fail. Opened https://github.com/McMasterRS/LFSpy/issues/3.
- [ ] Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support
This is missing.
Software paper
- [ ] State of the field: Do the authors describe how this software compares to other commonly-used packages?
This currently feels implicit, and explicit comparisons to other methods would be helpful. I think comparisons to other classifiers within scikit-learn would serve equally as well as independent packages, if that's easier.
I suspect the authors are more familiar with the current state of the art in machine learning than I am, but if it's useful: My impression is that random forests (or other decision tree ensembles) will do a similar task, splitting the data into subspaces and selecting the most effective dimensions to distinguish classes.
@mnarayan: Can you please let us know when you'll have a chance to look at this submission? Thanks!
Hi @effigies, thanks for your review!
General checks
- [ ] Contribution and authorship: Has the submitting author (@kiretd) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?
@kiretd wrote the paper and also contributed tests. According to the commit log, the contributors to the repository were:
Sorry that the contributions are a bit difficult to track. Most of the work on LFSpy was done prior to putting this up on GitHub, so the commits don't reflect everyone's contributions. In addition, a good chunk of our testing is done locally as a group before someone pushes it to git. If you have further questions about who contributed what, please feel free to ask.
Areeb Khawaja appears to be missing from the author list, and Ron Harwood and Ranil Sonnadara have not contributed commits. Ranil appears to be a PI and Ron a staff member. I'll simply link to the Authorship guidelines and would appreciate an affirmation that Ron and Ranil provided substantial contribution to the effort, and that @khawaa1 has elected not to be an author on this submission.
Both Ron Harwood and Ranil Sonnadara were responsible for the direction of the project, including leading the software planning, requirements gathering, and some of the user testing necessary for development of the final product. I've also added Areeb to the author list now that he has an ORCID.
Documentation
Documentation currently takes the form of a README.md. Although there are extensive comments in the code, there is no rendered documentation anywhere.
We now have a separate documentation page at https://lfspy.readthedocs.io/en/latest/. It's not 100% complete yet, though it's not too far off, and we'll be keeping it up to date over the lifetime of this project. Please take a look and let us know if there are any pieces missing.
- [ ] A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
This is currently missing
We’ve added a Statement of Need to the docs page and the README on github.
- [x] Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution. This is there, although the installation requirements currently include `pytest`. The convention typically is to set installation requirements only to those needed to use the software, and to include tests and optional functionality as `extras_require` arguments to the `setuptools.setup` function. I will submit a pull request to the main repository to use this. (Authors are not obligated to accept this. Some projects prefer to treat tests as first-class requirements.)
This has been updated per your (generous) recommendations in the issue thread on GitHub.
- [ ] Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).
The example usage is not directly runnable. From the perspective of users of scikit-learn, it is very likely enough, but fully runnable examples would be preferable, IMO. The test_lfs.py module fetches two datasets. One of these could be used to flesh out the example usage.
Great suggestion! I’ve added an example to the README and docs, and an example.py to the repository.
- [ ] Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
No. Examples of standard scikit-learn API documentation strings can be found here.
For these to be readable (as in sklearn.linear_model.LinearRegression), the documentation would need to be rendered.
Another great suggestion! We’ve added this here: https://lfspy.readthedocs.io/en/latest/Functionality.html
- [x] Automated tests: Are there automated tests or manual steps described so that the functionality of the software can be verified?
Yes, however, tests fail. Opened McMasterRS/LFSpy#3.
Thanks for finding this. Apparently an update broke the test, but it should be fixed now (see response on the repo).
- [ ] Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support
This is missing.
We now have a contribution guidelines page on our docs: https://lfspy.readthedocs.io/en/latest/Community%20Guidelines.html
Software paper
- [ ] State of the field: Do the authors describe how this software compares to other commonly-used packages?
This currently feels implicit, and explicit comparisons to other methods would be helpful. I think comparisons to other classifiers within scikit-learn would serve equally as well as independent packages, if that's easier.
I suspect the authors are more familiar with the current state of the art in machine learning than I am, but if it's useful: My impression is that random forests (or other decision tree ensembles) will do a similar task, splitting the data into subspaces and selecting the most effective dimensions to distinguish classes.
I’ve added a section on comparison with other standard classifiers, including plots using both sample datasets used in test_lfs.py. You were correct that random forests have some similarities and are a good point of comparison. I’ve also included another standard approach: a linear SVM with univariate feature selection using the F-statistic. These comparisons are in comparisons.py, which has been added to the tests folder.
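For reference, this kind of comparison follows the standard scikit-learn pattern sketched below; this is an illustrative outline rather than the exact contents of comparisons.py, and the dataset and settings are placeholders:

```python
# Illustrative comparison of two standard baselines via cross-validation.
# This is a generic scikit-learn pattern, not the repository's comparisons.py.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

X, y = load_breast_cancer(return_X_y=True)

# Baseline 1: random forest, which implicitly performs local feature weighting.
rf = RandomForestClassifier(n_estimators=200, random_state=0)

# Baseline 2: univariate F-test feature selection feeding a linear SVM.
svm = make_pipeline(
    StandardScaler(),
    SelectKBest(f_classif, k=10),
    LinearSVC(max_iter=10000),
)

for name, clf in [("Random forest", rf), ("F-test + linear SVM", svm)]:
    scores = cross_val_score(clf, X, y, cv=5)
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```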
@whedon generate pdf
PDF failed to compile for issue #1958 with the following error:
ORCID looks to be the wrong length
```
/app/vendor/bundle/ruby/2.4.0/bundler/gems/whedon-9847f98e9ec6/lib/whedon.rb:153:in `block in check_orcids': Problem with ORCID (00000-0003-4528-9146) for Areeb Khawaja (RuntimeError)
  from /app/vendor/bundle/ruby/2.4.0/bundler/gems/whedon-9847f98e9ec6/lib/whedon.rb:151:in `each'
  from /app/vendor/bundle/ruby/2.4.0/bundler/gems/whedon-9847f98e9ec6/lib/whedon.rb:151:in `check_orcids'
  from /app/vendor/bundle/ruby/2.4.0/bundler/gems/whedon-9847f98e9ec6/lib/whedon.rb:88:in `initialize'
  from /app/vendor/bundle/ruby/2.4.0/bundler/gems/whedon-9847f98e9ec6/lib/whedon/processor.rb:36:in `new'
  from /app/vendor/bundle/ruby/2.4.0/bundler/gems/whedon-9847f98e9ec6/lib/whedon/processor.rb:36:in `set_paper'
  from /app/vendor/bundle/ruby/2.4.0/bundler/gems/whedon-9847f98e9ec6/bin/whedon:55:in `prepare'
  from /app/vendor/bundle/ruby/2.4.0/gems/thor-0.20.3/lib/thor/command.rb:27:in `run'
  from /app/vendor/bundle/ruby/2.4.0/gems/thor-0.20.3/lib/thor/invocation.rb:126:in `invoke_command'
  from /app/vendor/bundle/ruby/2.4.0/gems/thor-0.20.3/lib/thor.rb:387:in `dispatch'
  from /app/vendor/bundle/ruby/2.4.0/gems/thor-0.20.3/lib/thor/base.rb:466:in `start'
  from /app/vendor/bundle/ruby/2.4.0/bundler/gems/whedon-9847f98e9ec6/bin/whedon:116:in `<top (required)>'
  from /app/vendor/bundle/ruby/2.4.0/bin/whedon:23:in `load'
  from /app/vendor/bundle/ruby/2.4.0/bin/whedon:23:in
```
@whedon generate pdf
PDF failed to compile for issue #1958 with the following error:
```
[WARNING] Could not convert image '/tmp/tex2pdf.-f4119cc7dbe480a9/20ced9490086bcac201d809020b064159913517e.shtml': Cannot load file
  Jpeg    Invalid marker used
  PNG     Invalid PNG file, signature broken
  Bitmap  Invalid Bitmap magic identifier
  GIF     Invalid Gif signature
  HDR     Invalid radiance file signature
  Tiff    Invalid endian tag value
  TGA     Invalid bit depth (104)
[WARNING] Could not convert image '/tmp/tex2pdf.-f4119cc7dbe480a9/1ff300c76fa24ed9469b4aea44b91258d01e7f3d.shtml': Cannot load file
  Jpeg    Invalid marker used
  PNG     Invalid PNG file, signature broken
  Bitmap  Invalid Bitmap magic identifier
  GIF     Invalid Gif signature
  HDR     Invalid radiance file signature
  Tiff    Invalid endian tag value
  TGA     Invalid bit depth (104)
Error producing PDF.
! LaTeX Error: Cannot determine size of graphic in /tmp/tex2pdf.-f4119cc7dbe480a9/20ced9490086bcac201d809020b064159913517e.shtml (no BoundingBox).

See the LaTeX manual or LaTeX Companion for explanation.
Type H
l.522 ...490086bcac201d809020b064159913517e.shtml}
```
Looks like we failed to compile the PDF
@whedon generate pdf
@whedon generate pdf
@dfm Hey, sorry about this, but I think I need help including figures in the PDF. With standard markdown syntax (the same as the example given on the submission instructions page), whedon spits out the error shown a few posts above. I switched to HTML because it was also the only method that allowed me to resize the images, and now the PDF compiles without images. Hopefully I'm just missing something really obvious?
@kiretd: I just opened a pull request fixing the syntax.
@dfm Thanks so much. It looks like the images just needed to be in the same folder for whedon to find them? Also, is there no way to resize the images with markdown syntax? I assume the LaTeX conversion won't work if the images are included using HTML syntax.
@whedon generate pdf
Ok, I think everything should be ready for review. I believe we addressed everything from the last review. Thanks again @dfm and @effigies !
Since @mnarayan seems to have lost track of this review, I'm currently looking for another referee and I'll keep you all posted.
@effigies: Thanks for your review so far! When you get a chance, please take a look at the changes that @kiretd has made and continue through the checklist. Thanks again!
@sauln has agreed to take over for mnarayan. Thanks @sauln!
I'll add you as a reviewer and update the checklist. Please let me know if you have any questions or issues as you review.
@whedon remove @mnarayan as reviewer
OK, @mnarayan is no longer a reviewer
@whedon add @sauln as reviewer
OK, @sauln is now a reviewer
👋 Looks like @effigies did a great review. I'm hoping to have my review complete by early next week.
@whedon generate pdf
Just a small update to add the figure captions.
I'll have to provide a more thorough review near the beginning of next week, but I can supply a few quick notes from perusing the documentation:
- The docs link in the readme doesn't work. You'll need to use [lfspy.readthedocs.io](https://lfspy.readthedocs.io).
- It looks like you manually wrote the Functionality page and I see a lot of good comments in the code. If you reformatted the docstrings to comply with an autodocs format, they would render with the `__doc__` magic method, and you could use autodocs to render the functionality page.
- It looks like each section in the homepage of the docs is repeated as an individual page. Is this necessary?
- The contribution guide says that contributions should follow PEP8, but the codebase looks like it doesn't follow PEP8 itself. Have you run a linter or autoformatter over the code? I highly recommend black but any will do.
- Could you provide more examples with larger data sets?
- The table of tunable parameters in the docs homepage hasn't rendered correctly.
- The tests appear to be end-to-end tests. I would like to see more unit tests that focus on individual pieces of logic.

Hi team! Just checking in here on the status.
@kiretd: I wanted to double check that you got @sauln's recommendations. Let us know if you have any questions!
@effigies: It seems like there might be some remaining issues for you. Can you let us know if there's more that you would like to see beyond this response: https://github.com/openjournals/joss-reviews/issues/1958#issuecomment-583011174
Thanks!
Hi, thanks for the reminder. I'll try to have a look on Friday.
@dfm My concerns have been addressed.
I submitted a minor PR (https://github.com/McMasterRS/LFSpy/pull/6) for some cleanups in the documentation, but that should not be considered blocking.
Hi @dfm Thanks for checking in. Yes, I did get the recommendations by @sauln (thanks for those @sauln). I just moved continents last week, so I haven’t gotten much done on this myself yet. The team has been working on some of the recommendations in the meantime, and hopefully we’ll have an update soon.
@sauln: Thanks for your suggestions! I’ll respond one by one.
- The docs link in the readme doesn't work. You'll need to use [lfspy.readthedocs.io](https://lfspy.readthedocs.io).
Thanks, looks fixed on our end now.
- It looks like you manually wrote the Functionality page and I see a lot of good comments in the code. If you reformatted the docstrings to comply with an autodocs format, they would render with the `__doc__` magic method, and you could use autodocs to render the functionality page.
This sounds really great, and I think we’ll likely start implementing it for our projects going forward. I’m not sure there’s much value in spending the time to redo our comments and functionality page for LFSpy right now, but we’re taking a look at whether it makes sense to switch it over gradually over the next few months or so.
- It looks like each section in the homepage of the docs is repeated as an individual page. Is this necessary?
Maybe not strictly necessary, but our hyperlinked table of contents is set up so that each section gets its own sidebar link for easier navigation.
- The contribution guide says that contributions should follow PEP8, but the codebase looks like it doesn't follow PEP8 itself. Have you run a linter or autoformatter over the code? I highly recommend black but any will do.
Oliver from our team ran it through Black as you recommended. Pretty cool tool! I'm not an expert on this part, but I believe the formatting still isn't fully PEP8-compliant even after running Black. For now, we've just removed the PEP8 guideline so there's no inconsistency.
- Could you provide more examples with larger data sets?
Do you have any particular datasets in mind? Sample datasets used for illustration are typically small so that the tests and examples can be run relatively quickly. In addition, LFS is particularly well suited to p >> n and small-n problems, as per our statement of need, so we think our examples (one custom, the other a standard scikit-learn sample dataset used in many of their examples) are quite suitable.
- The table of tunable parameters in the docs homepage hasn't rendered correctly.
This looks to be fixed on our end. Can you confirm it looks right on your end too?
- The tests appear to be end-to-end tests. I would like to see more unit tests that focus on individual pieces of logic.
LFSpy is meant to be run as a single function, usually in a pipeline, either as a classifier or as a feature selector (using the same function). Do you mean we should publish tests of internal functions, as would be used for internal validation?
I submitted a minor PR (McMasterRS/LFSpy#6) for some cleanups in the documentation, but that should not be considered blocking.
Thanks @effigies, looks good to me; just waiting for one more person to review in case I missed something.
@dfm Just want to check in to see where we are with the process. I know I took some time to respond while we made the updates, so hopefully nothing is waiting on me right now.
@sauln: Do you feel that @kiretd has addressed some of your concerns? Do you have more feedback?
@kiretd: In response to your question to @sauln:
Do you mean we should publish tests of internal functions as would be used for internal validation?
I believe that's what he was saying. In general, unit tests should test each sub-component of the code, not just the end-to-end result. Unit testing, specifically, is not a requirement for JOSS, but it is recommended best practice.
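As a concrete illustration of the kind of test meant here, consider the sketch below; the `_within_class_distances` helper is invented purely for the example and is not an actual LFSpy function:

```python
# Sketch of a unit test that exercises a single piece of logic rather than the
# full fit/predict pipeline. `_within_class_distances` is a toy stand-in, not
# a real LFSpy internal; the idea is to test such helpers in isolation.
import numpy as np


def _within_class_distances(X, y, label):
    """Toy helper: pairwise Euclidean distances among samples of one class."""
    pts = X[y == label]
    diffs = pts[:, None, :] - pts[None, :, :]
    return np.sqrt((diffs ** 2).sum(axis=-1))


def test_within_class_distances_properties():
    rng = np.random.default_rng(0)
    X = rng.normal(size=(12, 4))
    y = np.array([0, 1] * 6)
    d = _within_class_distances(X, y, label=0)
    assert d.shape == (6, 6)             # one row/column per class-0 sample
    assert np.allclose(d, d.T)           # distance matrix is symmetric
    assert np.allclose(np.diag(d), 0.0)  # zero distance from a sample to itself
```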
@dfm and @sauln Thanks, yep, unit testing makes sense. I'm looking into whether we have the resources to add that now or whether we'll need to add it gradually, and what the best way to do it would be.
Submitting author: @kiretd (Kiret Dhindsa)
Repository: https://github.com/McMasterRS/LFSpy
Version: v1.0.4
Editor: @dfm
Reviewers: @effigies, @sauln
Archive: 10.5281/zenodo.3813708
Status
Status badge code:
Reviewers and authors:
Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)
Reviewer instructions & questions
@effigies & @mnarayan, please carry out your review in this issue by updating the checklist below. If you cannot edit the checklist please:
The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @dfm know.
✨ Please try and complete your review in the next two weeks ✨
Review checklist for @effigies
Conflict of interest
Code of Conduct
General checks
Functionality
Documentation
Software paper
Review checklist for @sauln
Conflict of interest
Code of Conduct
General checks
Functionality
Documentation
Software paper