Hello human, I'm @whedon, a robot that can help you with some common editorial tasks. @puolival, @yzhao062 it looks like you're currently assigned to review this paper :tada:.
:star: Important :star:
If you haven't already, you should seriously consider unsubscribing from GitHub notifications for this (https://github.com/openjournals/joss-reviews) repository. As a reviewer, you're probably currently watching this repository, which means that, under GitHub's default behaviour, you will receive notifications (emails) for all reviews 😿
To fix this, do the following two things:
For a list of things I can do to help you, just type:
@whedon commands
For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:
@whedon generate pdf
Attempting PDF compilation. Reticulating splines etc...
Hi @puolival & @yzhao062, just a reminder that this will be the place to conduct the official review. Please go ahead and review when you get a chance and ping the submitting author @pssachdeva or me if you have any questions!
@pssachdeva (@terrytangyuan for some reason I couldn't mention Pratik in this thread).
It is a good package. I just did some initial tests and crossed some items off the list. Currently, I have three major questions:
License: not necessarily an OSI-approved one, but the terms are consistent with open-source standards. Could you clarify whether it is a modified version of one of the OSI licenses?
I could run the examples successfully (plot_uoi_lasso.py), but I am not sure why the figure is not showing up (I will try to figure it out on my end). Another note: it would be helpful to have an examples folder on GitHub so people can find it easily. It took me some time to navigate.
I tried to run the unit tests and doc tests, but they failed due to import errors. I would not worry too much about the unit tests, since CI takes care of those. However, could you provide the result of the doc tests? You can run "pytest --doctest-modules pyuoi/" and address any potential errors there (if there are any).
Other than that, the installation has instructions and is easy (via pip). CI is enabled and codecov is in place. I could also find the contribution guide (https://pyuoi.readthedocs.io/en/latest/contributing.html).
I look forward to your confirmation on the license, the example location/instructions, and the result of the doc tests.
I will go through the rest of the review process at the same time.
Cheers!
Hi @yzhao062,
Thanks for your comments.
License: not necessarily an OSI-approved one, but the terms are consistent with open-source standards. Could you clarify whether it is a modified version of one of the OSI licenses?
We used the Lawrence Berkeley National Labs BSD variant license, whose short identifier is "BSD-3-Clause-LBNL". The license is the same as BSD-3-Clause, but with an additional default contribution licensing clause. See here for more details.
I could run the examples successfully (plot_uoi_lasso.py), but I am not sure why the figure is not showing up (I will try to figure it out on my end).
This might be happening because we didn't have a plt.show() at the end of the file. Our latest pull request added that to the end of the example scripts, so hopefully it should work now.
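For illustration, the tail of a gallery example script now looks roughly like this (a minimal sketch, not the actual plot_uoi_lasso.py): without the final plt.show(), many matplotlib backends will not open a window when the script is run directly.

```python
# Minimal sketch of an example script's ending (illustrative, not PyUoI code).
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 1, 100)
plt.plot(x, x ** 2)
plt.xlabel("x")
plt.ylabel("x squared")
plt.show()  # needed so the figure actually appears when run as a plain script
```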
Another note: it would be helpful to have an examples folder on GitHub so people can find it easily. It took me some time to navigate.
Thanks for pointing this out. We reorganized our Sphinx Gallery directories so that the scripts are located in a top-level examples directory.
I tried to run the unit tests and doc tests, but they failed due to import errors.
Could you tell us how you installed the package and which import errors you received? Depending on the install method, we might need to update our requirements.txt or requirements-dev.txt to make sure all the necessary packages are installed.
However, could you provide the result of the doc tests? You can run "pytest --doctest-modules pyuoi/" and address any potential errors there (if there are any).
We don't have any code in our docstrings that needs to be tested, so running that command returns no errors with no tests run. However, this point did make us realize that our continuous integration had not been testing whether the documentation was being compiled correctly. Thus, our newest pull request should add that functionality to our pipeline.
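For context, a doctest is an interactive example embedded in a docstring that pytest executes when given --doctest-modules; a toy illustration (not code from PyUoI):

```python
def add_intercept(value, intercept):
    """Toy function used only to illustrate a doctest.

    >>> add_intercept(2.0, 0.5)
    2.5
    """
    return value + intercept
```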
Checklist
The software is well written and documented. I did encounter some issues, but I think they should be easy to solve, and they seem to be related to first uses outside the original development environment. In particular, the installation failed for me, but I was able to work around it by hand and by testing several different environments. I am quite sure that all the issues I faced were due to conflicting version numbers. Please make it visible in the requirements lists and on the installation instructions page which version numbers of each package are supported, and which Python version the user is expected to have.
To give an example, in one of my environments, I received errors like this:
In [1]: import pyuoi
  File "/usr/local/.../pyuoi/linear_model/base.py", line 19
    class AbstractUoILinearModel(SparseCoefMixin, metaclass=_abc.ABCMeta):
                                                           ^
SyntaxError: invalid syntax
which I first thought to be due to some character encoding issue, given that the line's syntax looks correct. However, if I am not mistaken, it is just that the Python version (2.7) of the particular environment is wrong.
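For reference, the metaclass keyword in a class header is Python-3-only syntax, which is why Python 2.7 raises a SyntaxError at that exact line; a quick illustration of the difference:

```python
import abc

# Python 3 syntax (used by pyuoi):
class Model(metaclass=abc.ABCMeta):
    pass

# Python 2 equivalent (not supported by pyuoi):
# class Model(object):
#     __metaclass__ = abc.ABCMeta
```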
Regarding the license, I was not able to figure out whether it is OSI approved or not. However, this issue seems to have been already handled with the other reviewer.
Manuscript
The manuscript is overall well-written, concise, and clear. The language is good and sufficient background is provided to put the work in context. However, I would suggest expanding the section on features. In particular, please consider explaining the example algorithm to the reader, as well as providing a description of how complete the software is at the moment. For example, do the included models cover a large part of typical use-cases? How were they selected? It would be very helpful if this somehow allowed the user to evaluate if her specific analysis/use-case is supported or not. I also have a number of questions, which I think do not necessitate more than minor changes to the manuscript:
@terrytangyuan @pssachdeva
Hi @puolival,
Thanks for your comments.
Checklist
Please make it visible in the requirements lists and on the installation instructions page which version numbers of each package are supported, and which Python version the user is expected to have.
As you stated, the Python version may be the source of the errors you encountered. We have opted not to support Python 2.7, since it reaches end of life in 2020; however, we had not previously stated this explicitly in either the documentation or the README. We have now made it explicit in both that PyUoI is tested and supported on Python 3.5+.
Additionally, we have clarified the minimum version numbers for the packages required by PyUoI, both in requirements.txt and in the documentation.
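As an illustration of how such floors are typically declared in a setuptools-based project, a sketch along these lines works; the package names and version numbers below are placeholders, not PyUoI's actual requirements:

```python
# Illustrative setup.py fragment; versions shown are placeholders.
from setuptools import setup, find_packages

setup(
    name="pyuoi",
    packages=find_packages(),
    python_requires=">=3.5",
    install_requires=[
        "numpy>=1.14",
        "scikit-learn>=0.20",
    ],
)
```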
Regarding the license, I was not able to figure out whether it is OSI approved or not. However, this issue seems to have been already handled.
We used the BSD-3-Clause-LBNL license, which is OSI approved. See here for the description on the OSI webpage.
Manuscript
In particular, please consider explaining the example algorithm to the reader
Could you elaborate on this? Our manuscript includes the pseudocode for a detailed description of the algorithm, and the Background section provides a more general description of the framework (with key steps highlighted in the pseudocode for clarity).
as well as providing a description of how complete the software is at the moment. For example, do the included models cover a large part of typical use-cases? How were they selected? It would be very helpful if this somehow allowed the user to evaluate if her specific analysis/use-case is supported or not.
We have added a paragraph in the Features section clarifying our choice of generalized linear models. We chose these because they are the most commonly used models in a variety of disciplines, particularly neuroscience and bioinformatics which are our active fields of research.
- Are there any similar competing frameworks?
There are competing approaches to obtaining sparse parameter estimates for specific types of models, such as the linear regression model or the Poisson regression model (e.g., coordinate descent in glmnet). UoI was compared to a battery of other approaches for linear regression in the cited NeurIPS paper.
- It is claimed that the union of intersections framework is naturally scalable. However, it is not readily apparent why this is so. Please clarify this point.
Thank you for pointing this out. We have added the following text in the Summary to better clarify the scalability:
The UoI framework operates by fitting many models across resamples of the dataset and across a set of regularization parameters. Thus, since these fits can be performed in parallel, the UoI framework is naturally scalable.
PyUoI is equipped with mpi4py functionality to parallelize model fitting on large datasets [@dalcin2005].
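A rough sketch of what a parallel fit could look like, assuming the estimator accepts an MPI communicator through a comm argument and exposes scikit-learn-style attributes (please check the PyUoI documentation for the exact interface):

```python
# Run with e.g. `mpiexec -n 4 python fit_uoi.py`.
# The `comm` keyword and attribute names below are assumptions for illustration.
from mpi4py import MPI
import numpy as np
from pyuoi.linear_model import UoI_Lasso

comm = MPI.COMM_WORLD
rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 50))
y = X @ rng.standard_normal(50) + 0.1 * rng.standard_normal(1000)

model = UoI_Lasso(comm=comm)  # bootstrap/regularization fits spread across ranks
model.fit(X, y)
if comm.rank == 0:
    print("nonzero coefficients:", np.count_nonzero(model.coef_))
```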
- It would help the reader if it was specified whether the vectors are row or column vectors. This information needs to be now inferred from the equations.
We have clarified the shapes of the vectors in the Background section to emphasize that they are column vectors. Additionally, we have bolded all vectors to add further clarity.
- What happens if the true beta in equations (1) and (2) is not sufficiently sparse? How robust is the method to slight violations of its assumptions?
We note that the formulation of the Lasso problem places no constraint on the exact level of sparsity. Indeed, determining the optimal lambda value is often part of the model estimation procedure. Thus, if the true parameters are more dense, the UoI procedure will obtain more dense parameter estimates.
The original Union of Intersections paper examined its performance across a variety of underlying distributions of model parameters, degrees of sparsity, and noise levels, finding superior performance compared to a battery of other algorithms. See the appendix of the paper for more details.
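For reference (up to differences in notation from equations (1) and (2) in the manuscript), each Lasso fit solves

$$
\hat{\boldsymbol{\beta}} \;=\; \underset{\boldsymbol{\beta}}{\arg\min}\;
\frac{1}{2n}\,\lVert \mathbf{y} - X\boldsymbol{\beta} \rVert_2^2
\;+\; \lambda\,\lVert \boldsymbol{\beta} \rVert_1,
$$

so sparsity is controlled smoothly through $\lambda$ rather than through a hard constraint on the number of nonzero coefficients; a denser true $\boldsymbol{\beta}$ simply steers the selection toward smaller values of $\lambda$.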
- In the applications section, it would be nice if something was stated about the obtained results. Did PyUoI give better results compared to other approaches when it was used?
The Union of Intersections framework is capable of obtaining sparse and predictive fits for parametric models. We address the capabilities of UoI in the second paragraph of the Summary, where we state that:
Models inferred through the UoI framework are characterized by their usage of fewer parameters with little or no loss in predictive accuracy, and reduced bias relative to benchmark approaches.
- Also, was there any particular reason to apply the method to neuroscience and genomics data?
We and our collaborators conduct research in these fields. Neuroscience and genomics datasets are often high-dimensional with structure that is well-suited to the enforcement of a sparsity prior.
- It is stated that PyUoI is agnostic to the optimization method (solver). Why is this so?
This language may have been too strong. The Union of Intersections framework is agnostic to the solver. For example, the UoI Lasso approach requires obtaining (many) solutions to Lasso optimization problems (on resampled datasets), which could be obtained via coordinate descent, gradient descent, orthant-wise L-BFGS, etc. The type of solver is not important, so long as it provides good estimates.
With this in mind, we have structured PyUoI to be extensible to other solvers. For example, UoI Lasso can use a coordinate descent solver (from scikit-learn) or the pycasso solver (cited). The user could conceivably extend the UoI framework to use their own choice of solver. Thus, PyUoI is not agnostic to the solver without the additional work of extending the software. We have added text in the Features section to clarify this.
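A hypothetical sketch of what switching solvers looks like from the user's side; the keyword name and the accepted values ("cd", "pycasso") are assumptions made for illustration, so the PyUoI documentation should be consulted for the actual interface:

```python
# Illustrative only: the solver keyword and its values are assumed, not guaranteed.
from pyuoi.linear_model import UoI_Lasso

uoi_cd = UoI_Lasso(solver="cd")            # scikit-learn coordinate descent
uoi_pycasso = UoI_Lasso(solver="pycasso")  # pycasso path-wise solver
```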
- I would also suggest adding citations to Pedregosa et al. (2011) and Dalcin et al. (2005). Please see the pages https://scikit-learn.org/stable/about.html#citing-scikit-learn and https://mpi4py.readthedocs.io/en/stable/citing.html for the key references.
Thank you for bringing this up. We have added these references to the paper.
@whedon generate pdf
Attempting PDF compilation. Reticulating splines etc...
I just read the manuscript and it looks good to me. The author also addressed my previous concerns in the post above. As a result, I am happy to recommend the paper for acceptance at JOSS :) @pssachdeva @terrytangyuan
@whedon check references
Attempting to check references...
OK DOIs
- None
MISSING DOIs
- https://doi.org/10.1109/icmla.2017.0-152 may be missing for title: UoI-NMF Cluster: A Robust Nonnegative Matrix Factorization Algorithm for Improved Parts-Based Decomposition and Reconstruction of Noisy Data
- https://doi.org/10.1007/978-1-4612-0919-5_38 may be missing for title: Information theory and an extension of the maximum likelihood principle
INVALID DOIs
- None
@pssachdeva Could you fix the missing DOIs mentioned above?
@pssachdeva About the algorithm: I was thinking about providing a more detailed explanation of the algorithm's main steps for the less mathematically inclined reader, geared towards providing a conceptual understanding.
@pssachdeva @terrytangyuan I have no further comments to the manuscript or the software. I think the last point above can be satisfactorily addressed with minor changes, and doesn't require a re-reading.
@whedon generate pdf
Attempting PDF compilation. Reticulating splines etc...
@whedon check references
Attempting to check references...
OK DOIs
- 10.1109/ICMLA.2017.0-152 is OK
- 10.1007/BF02607055 is OK
- 10.1073/pnas.1900654116 is OK
- 10.1007/978-1-4612-1694-0_15 is OK
- 10.1214/aos/1176344136 is OK
MISSING DOIs
- None
INVALID DOIs
- None
Thanks everyone!
@pssachdeva At this point, could you make a new release of this software that includes the changes that have resulted from this review? Then, please make an archive of the software on Zenodo/figshare/another service and update this thread with the DOI of the archive. For the Zenodo/figshare archive, please make sure that:
@whedon generate pdf
Attempting PDF compilation. Reticulating splines etc...
@terrytangyuan Here is the DOI for the package, which we uploaded on Zenodo: 10.5281/zenodo.3563146.
Here's a link to the Zenodo page.
Edited the above to the "all versions" DOI.
JOSS wants the DOI associated with the specific version that is being discussed in this paper, not the “concept” or “all versions” DOI
@terrytangyuan @danielskatz OK, here's the DOI for the version discussed in the paper: 10.5281/zenodo.3563147
@whedon set 1.0.0 as version
OK. 1.0.0 is the version.
@whedon set 10.5281/zenodo.3563147 as archive
OK. 10.5281/zenodo.3563147 is the archive.
@whedon accept
Attempting dry run of processing paper acceptance...
OK DOIs
- 10.1109/ICMLA.2017.0-152 is OK
- 10.1007/BF02607055 is OK
- 10.1073/pnas.1900654116 is OK
- 10.1007/978-1-4612-1694-0_15 is OK
- 10.1214/aos/1176344136 is OK
MISSING DOIs
- None
INVALID DOIs
- None
Check final proof :point_right: https://github.com/openjournals/joss-papers/pull/1156
If the paper PDF and Crossref deposit XML look good in https://github.com/openjournals/joss-papers/pull/1156, then you can now move forward with accepting the submission by compiling again with the flag deposit=true
e.g.
@whedon accept deposit=true
@danielskatz The paper looks good to me now. Could you take it from here?
Yes, but in the future, please mention @openjournals/joss-eics in such requests.
@pssachdeva - please merge https://github.com/BouchardLab/pyuoi/pull/193 and check to make sure all the casing is correct in the references.
OK, just merged.
@whedon generate pdf
Attempting PDF compilation. Reticulating splines etc...
PDF failed to compile for issue #1799 with the following error:
Error reading bibliography ./paper.bib (line 11, column 19):
unexpected "2"
expecting white space, "#", "," or "}"
Error running filter pandoc-citeproc: Filter returned error status 1
Looks like we failed to compile the PDF
@whedon generate pdf
Attempting PDF compilation. Reticulating splines etc...
@danielskatz The casing in the references looks good now.
Submitting author: @pssachdeva (Pratik Sachdeva)
Repository: https://github.com/BouchardLab/pyuoi
Version: 1.0.0
Editor: @terrytangyuan
Reviewers: @puolival, @yzhao062
Archive: 10.5281/zenodo.3563147
Status
Status badge code:
Reviewers and authors:
Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)
Reviewer instructions & questions
@puolival & @yzhao062, please carry out your review in this issue by updating the checklist below. If you cannot edit the checklist please:
The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @terrytangyuan know.
✨ Please try and complete your review in the next two weeks ✨
Review checklist for @puolival
Conflict of interest
Code of Conduct
General checks
Functionality
Documentation
Software paper
Review checklist for @yzhao062
Conflict of interest
Code of Conduct
General checks
Functionality
Documentation
Software paper