Hello human, I'm @whedon, a robot that can help you with some common editorial tasks. @puolival, @yzhao062 it looks like you're currently assigned to review this paper :tada:.
:star: Important :star:
If you haven't already, you should seriously consider unsubscribing from GitHub notifications for this (https://github.com/openjournals/joss-reviews) repository. As a reviewer, you're probably currently watching this repository, which means that, under GitHub's default behaviour, you will receive notifications (emails) for all reviews 😿
To fix this, do the following two things:
For a list of things I can do to help you, just type:
@whedon commands
For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:
@whedon generate pdf
Attempting PDF compilation. Reticulating splines etc...
Hi @puolival & @yzhao062, just a reminder that this will be the place to conduct the official review. Please go ahead and review when you get a chance and ping the submitting author @pssachdeva or me if you have any questions!
@pssachdeva (@terrytangyuan for some reason I couldn't mention Pratik in this thread).
It is a good package. I just did some initial tests and crossed some items off the list. Currently, I have three major questions:
License: not necessarily an OSI-approved one, but the terms are consistent with open-source standards. Could you clarify whether it is a modified version of one of the OSI licenses?
I could run the examples successfully (plot_uoi_lasso.py), but I am not sure why the figure is not showing up (I will try to figure it out on my end). Another note: it would be helpful to have an examples folder on GitHub so people can find it easily. It took me some time to navigate.
I tried to run the unit tests and doc tests, but they failed due to import errors. I would not worry too much about the unit tests, since CI takes care of those. However, could you provide the result of the doc tests? You can run "pytest --doctest-modules pyuoi/" and address any potential errors there (if there are any).
Other than that, the installation has instructions and is easy (via pip). CI is enabled and codecov is in place. I could also find the contribution guide (https://pyuoi.readthedocs.io/en/latest/contributing.html).
I look forward to your confirmation on the license, the example location/instructions, and the result of the doc tests.
I will go through the rest of the review process at the same time.
Cheers!
Hi @yzhao062,
Thanks for your comments.
License: not necessarily an OSI-approved one, but the terms are consistent with open-source standards. Could you clarify whether it is a modified version of one of the OSI licenses?
We used the Lawrence Berkeley National Labs BSD variant license, whose short identifier is "BSD-3-Clause-LBNL". The license is the same as BSD-3-Clause, but with an additional default contribution licensing clause. See here for more details.
I could run the examples successfully (plot_uoi_lasso.py), but I am not sure why the figure is not showing up (I will try to figure it out on my end).
This might be happening because we didn't have a plt.show() at the end of the file. Our latest pull request added that to the end of the example scripts, so hopefully it should work now.
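For illustration, the tail of a gallery example script now looks roughly like this (a minimal sketch, not the actual plot_uoi_lasso.py): without the final plt.show(), many matplotlib backends will not open a window when the script is run directly.

```python
# Minimal sketch of an example script's ending (illustrative, not PyUoI code).
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 1, 100)
plt.plot(x, x ** 2)
plt.xlabel("x")
plt.ylabel("x squared")
plt.show()  # needed so the figure actually appears when run as a plain script
```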
Another note: it would be helpful to have an examples folder on GitHub so people can find it easily. It took me some time to navigate.
Thanks for pointing this out. We reorganized our Sphinx Gallery directories so that the scripts are located in a top-level examples directory.
I tried to run the unit tests and doc tests, but they failed due to import errors.
Could you tell us how you installed the package and which import errors you received? Depending on the install method, we might need to update our requirements.txt or requirements-dev.txt to make sure all the necessary packages are installed.
However, could you provide the result of the doc tests? You can run "pytest --doctest-modules pyuoi/" and address any potential errors there (if there are any).
We don't have any code in our docstrings that needs to be tested, so running that command returns no errors with no tests run. However, this point did make us realize that our continuous integration had not been testing whether the documentation was being compiled correctly. Thus, our newest pull request should add that functionality to our pipeline.
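For context, a doctest is an interactive example embedded in a docstring that pytest executes when given --doctest-modules; a toy illustration (not code from PyUoI):

```python
def add_intercept(value, intercept):
    """Toy function used only to illustrate a doctest.

    >>> add_intercept(2.0, 0.5)
    2.5
    """
    return value + intercept
```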
Checklist
The software is well written and documented. I did encounter some issues, but I think they should be easy to solve, and they seem to be related to first uses outside the original development environment. In particular, the installation failed for me, but I was able to work around it by hand and by testing several different environments. I am quite sure that all the issues I faced were due to conflicting version numbers. Please make it visible in the requirements lists and on the installation instructions page which version numbers of each package are supported, and which Python version the user is expected to have.
To give an example, in one of my environments, I received errors like this:
In [1]: import pyuoi
  File "/usr/local/.../pyuoi/linear_model/base.py", line 19
    class AbstractUoILinearModel(SparseCoefMixin, metaclass=_abc.ABCMeta):
                                                           ^
SyntaxError: invalid syntax
which I first thought to be due to some character encoding issue, given that the line's syntax looks correct. However, if I am not mistaken, it is just that the Python version (2.7) of the particular environment is wrong.
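For reference, the metaclass keyword in a class header is Python-3-only syntax, which is why Python 2.7 raises a SyntaxError at that exact line; a quick illustration of the difference:

```python
import abc

# Python 3 syntax (used by pyuoi):
class Model(metaclass=abc.ABCMeta):
    pass

# Python 2 equivalent (not supported by pyuoi):
# class Model(object):
#     __metaclass__ = abc.ABCMeta
```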
Regarding the license, I was not able to figure out whether it is OSI approved or not. However, this issue seems to have been already handled with the other reviewer.
Manuscript
The manuscript is overall well-written, concise, and clear. The language is good and sufficient background is provided to put the work in context. However, I would suggest expanding the section on features. In particular, please consider explaining the example algorithm to the reader, as well as providing a description of how complete the software is at the moment. For example, do the included models cover a large part of typical use-cases? How were they selected? It would be very helpful if this somehow allowed the user to evaluate if her specific analysis/use-case is supported or not. I also have a number of questions, which I think do not necessitate more than minor changes to the manuscript:
@terrytangyuan @pssachdeva
Hi @puolival,
Thanks for your comments.
Checklist
Please make it visible in the requirements lists and on the installation instructions page which version numbers of each package are supported, and which Python version the user is expected to have.
As you stated, the Python version may be the source of the errors you encountered. We have opted not to support Python 2.7, since it reaches end of life in 2020; however, we had not previously stated this explicitly in either the documentation or the README. We have now made it explicit in both that PyUoI is tested and supported on Python 3.5+.
Additionally, we have clarified the minimum version numbers for the packages required by PyUoI, both in requirements.txt and in the documentation.
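As an illustration of how such floors are typically declared in a setuptools-based project, a sketch along these lines works; the package names and version numbers below are placeholders, not PyUoI's actual requirements:

```python
# Illustrative setup.py fragment; versions shown are placeholders.
from setuptools import setup, find_packages

setup(
    name="pyuoi",
    packages=find_packages(),
    python_requires=">=3.5",
    install_requires=[
        "numpy>=1.14",
        "scikit-learn>=0.20",
    ],
)
```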
Regarding the license, I was not able to figure out whether it is OSI approved or not. However, this issue seems to have been already handled.
We used the BSD-3-Clause-LBNL license, which is OSI approved. See here for the description on the OSI webpage.
Manuscript
In particular, please consider explaining the example algorithm to the reader
Could you elaborate on this? Our manuscript includes the pseudocode for a detailed description of the algorithm, and the Background section provides a more general description of the framework (with key steps highlighted in the pseudocode for clarity).
as well as providing a description of how complete the software is at the moment. For example, do the included models cover a large part of typical use-cases? How were they selected? It would be very helpful if this somehow allowed the user to evaluate if her specific analysis/use-case is supported or not.
We have added a paragraph in the Features section clarifying our choice of generalized linear models. We chose these because they are the most commonly used models in a variety of disciplines, particularly neuroscience and bioinformatics which are our active fields of research.
- Are there any similar competing frameworks?
There are competing approaches to obtaining sparse parameter estimates for specific types of models, such as the linear regression model or the Poisson regression model (e.g., coordinate descent in glmnet). UoI was compared to a battery of other approaches for linear regression in the cited NeurIPS paper.
- It is claimed that the union of intersections framework is naturally scalable. However, it is not readily apparent why this is so. Please clarify this point.
Thank you for pointing this out. We have added the following text in the Summary to better clarify the scalability:
The UoI framework operates by fitting many models across resamples of the dataset and across a set of regularization parameters. Thus, since these fits can be performed in parallel, the UoI framework is naturally scalable.
PyUoI is equipped with mpi4py functionality to parallelize model fitting on large datasets [@dalcin2005].
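A rough sketch of what a parallel fit could look like, assuming the estimator accepts an MPI communicator through a comm argument and exposes scikit-learn-style attributes (please check the PyUoI documentation for the exact interface):

```python
# Run with e.g. `mpiexec -n 4 python fit_uoi.py`.
# The `comm` keyword and attribute names below are assumptions for illustration.
from mpi4py import MPI
import numpy as np
from pyuoi.linear_model import UoI_Lasso

comm = MPI.COMM_WORLD
rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 50))
y = X @ rng.standard_normal(50) + 0.1 * rng.standard_normal(1000)

model = UoI_Lasso(comm=comm)  # bootstrap/regularization fits spread across ranks
model.fit(X, y)
if comm.rank == 0:
    print("nonzero coefficients:", np.count_nonzero(model.coef_))
```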
- It would help the reader if it was specified whether the vectors are row or column vectors. This information needs to be now inferred from the equations.
We have clarified the shapes of the vectors in the Background section to emphasize that they are column vectors. Additionally, we have bolded all vectors to add further clarity.
- What happens if the true beta in equations (1) and (2) is not sufficiently sparse? How robust is the method to slight violations of its assumptions?
We note that the formulation of the Lasso problem places no constraint on the exact level of sparsity. Indeed, determining the optimal lambda value is often part of the model estimation procedure. Thus, if the true parameters are more dense, the UoI procedure will obtain more dense parameter estimates.
The original Union of Intersections paper examined its performance across a variety of underlying distributions of model parameters, degrees of sparsity, and noise levels, finding superior performance compared to a battery of other algorithms. See the appendix of the paper for more details.
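For reference (up to differences in notation from equations (1) and (2) in the manuscript), each Lasso fit solves

$$
\hat{\boldsymbol{\beta}} \;=\; \underset{\boldsymbol{\beta}}{\arg\min}\;
\frac{1}{2n}\,\lVert \mathbf{y} - X\boldsymbol{\beta} \rVert_2^2
\;+\; \lambda\,\lVert \boldsymbol{\beta} \rVert_1,
$$

so sparsity is controlled smoothly through $\lambda$ rather than through a hard constraint on the number of nonzero coefficients; a denser true $\boldsymbol{\beta}$ simply steers the selection toward smaller values of $\lambda$.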
- In the applications section, it would be nice if something was stated about the obtained results. Did PyUoI give better results compared to other approaches when it was used?
The Union of Intersections framework is capable of obtaining sparse and predictive fits for parametric models. We address the capabilities of UoI in the second paragraph of the Summary, where we state that:
Models inferred through the UoI framework are characterized by their usage of fewer parameters with little or no loss in predictive accuracy, and reduced bias relative to benchmark approaches.
- Also, was there any particular reason to apply the method to neuroscience and genomics data?
We and our collaborators conduct research in these fields. Neuroscience and genomics datasets are often high-dimensional with structure that is well-suited to the enforcement of a sparsity prior.
- It is stated that PyUoI is agnostic to the optimization method (solver). Why is this so?
This language may have been too strong. The Union of Intersections framework is agnostic to the solver. For example, the UoI Lasso approach requires obtaining (many) solutions to Lasso optimization problems (on resampled datasets), which could be obtained via coordinate descent, gradient descent, orthant-wise L-BFGS, etc. The type of solver is not important, so long as it provides good estimates.
With this in mind, we have structured PyUoI to be extensible to other solvers. For example, UoI Lasso can use a coordinate descent solver (from scikit-learn) or the pycasso solver (cited). The user could conceivably extend the UoI framework to use their own choice of solver. Thus, PyUoI is not agnostic to the solver without the additional work of extending the software. We have added text in the Features section to clarify this.
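A hypothetical sketch of what switching solvers looks like from the user's side; the keyword name and the accepted values ("cd", "pycasso") are assumptions made for illustration, so the PyUoI documentation should be consulted for the actual interface:

```python
# Illustrative only: the solver keyword and its values are assumed, not guaranteed.
from pyuoi.linear_model import UoI_Lasso

uoi_cd = UoI_Lasso(solver="cd")            # scikit-learn coordinate descent
uoi_pycasso = UoI_Lasso(solver="pycasso")  # pycasso path-wise solver
```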
- I would also suggest adding citations to Pedregosa et al. (2011) and Dalcin et al. (2005). Please see the pages https://scikit-learn.org/stable/about.html#citing-scikit-learn and https://mpi4py.readthedocs.io/en/stable/citing.html for the key references.
Thank you for bringing this up. We have added these references to the paper.
@whedon generate pdf
Attempting PDF compilation. Reticulating splines etc...
I just read the manuscript and it looks good to me. The author also addressed my previous concerns in the post above. As a result, I am happy to recommend the paper for acceptance at JOSS :) @pssachdeva @terrytangyuan
@whedon check references
Attempting to check references...
OK DOIs
- None
MISSING DOIs
- https://doi.org/10.1109/icmla.2017.0-152 may be missing for title: UoI-NMF Cluster: A Robust Nonnegative Matrix Factorization Algorithm for Improved Parts-Based Decomposition and Reconstruction of Noisy Data
- https://doi.org/10.1007/978-1-4612-0919-5_38 may be missing for title: Information theory and an extension of the maximum likelihood principle
INVALID DOIs
- None
@pssachdeva Could you fix the missing DOIs mentioned above?
@pssachdeva About the algorithm: I was thinking about providing a more detailed explanation of the algorithm's main steps for the less mathematically inclined reader, geared towards providing a conceptual understanding.
@pssachdeva @terrytangyuan I have no further comments to the manuscript or the software. I think the last point above can be satisfactorily addressed with minor changes, and doesn't require a re-reading.
@whedon generate pdf
Attempting PDF compilation. Reticulating splines etc...
@whedon check references
Attempting to check references...
OK DOIs
- 10.1109/ICMLA.2017.0-152 is OK
- 10.1007/BF02607055 is OK
- 10.1073/pnas.1900654116 is OK
- 10.1007/978-1-4612-1694-0_15 is OK
- 10.1214/aos/1176344136 is OK
MISSING DOIs
- None
INVALID DOIs
- None
Thanks everyone!
@pssachdeva At this point, could you make a new release of this software that includes the changes that have resulted from this review? Then, please make an archive of the software on Zenodo/figshare/another service and update this thread with the DOI of the archive. For the Zenodo/figshare archive, please make sure that:
@whedon generate pdf
Attempting PDF compilation. Reticulating splines etc...
@terrytangyuan Here is the DOI for the package, which we uploaded on Zenodo: 10.5281/zenodo.3563146.
Here's a link to the Zenodo page.
Edited the above to the "all versions" DOI.
JOSS wants the DOI associated with the specific version that is being discussed in this paper, not the “concept” or “all versions” DOI
@terrytangyuan @danielskatz OK, here's the DOI for the version discussed in the paper: 10.5281/zenodo.3563147
@whedon set 1.0.0 as version
OK. 1.0.0 is the version.
@whedon set 10.5281/zenodo.3563147 as archive
OK. 10.5281/zenodo.3563147 is the archive.
@whedon accept
Attempting dry run of processing paper acceptance...
OK DOIs
- 10.1109/ICMLA.2017.0-152 is OK
- 10.1007/BF02607055 is OK
- 10.1073/pnas.1900654116 is OK
- 10.1007/978-1-4612-1694-0_15 is OK
- 10.1214/aos/1176344136 is OK
MISSING DOIs
- None
INVALID DOIs
- None
Check final proof :point_right: https://github.com/openjournals/joss-papers/pull/1156
If the paper PDF and Crossref deposit XML look good in https://github.com/openjournals/joss-papers/pull/1156, then you can now move forward with accepting the submission by compiling again with the flag deposit=true
e.g.
@whedon accept deposit=true
@danielskatz The paper looks good to me now. Could you take it from here?
Yes, but in the future, please mention @openjournals/joss-eics in such requests.
@pssachdeva - please merge https://github.com/BouchardLab/pyuoi/pull/193 and check to make sure all the casing is correct in the references.
OK, just merged.
@whedon generate pdf
Attempting PDF compilation. Reticulating splines etc...
PDF failed to compile for issue #1799 with the following error:
Error reading bibliography ./paper.bib (line 11, column 19):
unexpected "2"
expecting white space, "#", "," or "}"
Error running filter pandoc-citeproc: Filter returned error status 1
Looks like we failed to compile the PDF
@whedon generate pdf
Attempting PDF compilation. Reticulating splines etc...
@danielskatz The casing in the references looks good now.
Submitting author: @pssachdeva (Pratik Sachdeva)
Repository: https://github.com/BouchardLab/pyuoi
Version: 1.0.0
Editor: @terrytangyuan
Reviewers: @puolival, @yzhao062
Archive: 10.5281/zenodo.3563147
Status
Status badge code:
Reviewers and authors:
Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)
Reviewer instructions & questions
@puolival & @yzhao062, please carry out your review in this issue by updating the checklist below. If you cannot edit the checklist please:
The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @terrytangyuan know.
✨ Please try and complete your review in the next two weeks ✨
Review checklist for @puolival
Conflict of interest
Code of Conduct
General checks
Functionality
Documentation
Software paper
Review checklist for @yzhao062
Conflict of interest
Code of Conduct
General checks
Functionality
Documentation
Software paper