Closed whedon closed 4 years ago
Hello human, I'm @whedon, a robot that can help you with some common editorial tasks. @MaaniBeigy, @vbaliga it looks like you're currently assigned to review this paper :tada:.
:star: Important :star:
If you haven't already, you should seriously consider unsubscribing from GitHub notifications for this (https://github.com/openjournals/joss-reviews) repository. As a reviewer, you're probably currently watching this repository which means for GitHub's default behaviour you will receive notifications (emails) for all reviews πΏ
To fix this do the following two things:
For a list of things I can do to help you, just type:
@whedon commands
For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:
@whedon generate pdf
Attempting PDF compilation. Reticulating splines etc...
@MaaniBeigy, @vbaliga - please carry out your review in this issue by updating the checklist above and giving feedback in this issue. The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html
Any questions/concerns please let me know.
Adding a note here for myself that @vbaliga won't be able to complete their review until mid-late November.
π Got it - will do my best to not leave you hanging!
Dear @arfon, Greetings,
I reviewed the univariateML package and its software paper. It is an important contribution to the field of statistical programming and has a high quality. My comments are as follow:
Although the author has described an MIT license and provided an MIT LICENSE.md
file, the LICENSE plain-text file is not correctly an MIT license. I believe, the author should correct the plain-text LICENSE file according to OSI and Rbuildignore it (i.e., usethis::use_build_ignore("LICENSE")
or adding ^LICENSE$
to the .Rbuildignore
file).
The Date
field in the DESCRIPTION
file is missing. It causes the following warning when a user calls citation("univariateML")
:
Warning message: In citation("univariateML") : no date field in DESCRIPTION file of package βunivariateMLβ
Date: 2019-11-06
to the DESCRIPTION
file.README
, labeled as How to Contribute or Get Help and in the Adding-New-Densities wiki, I recommend using a CONTRIBUTING.md
.Please let me know if there are any questions/concerns, Best regards, Maani
Thanks @MaaniBeigy
The way the license is done right now is similar to e.g. dplyr. I'm not sure I can change it since CRAN has asked me to do it in this way (i.e. two lines on LICENSE
and a LICENSE.md
) before.
Thanks. I think it should be left as it is though, see https://github.com/r-lib/devtools/issues/1327 for an explanation for the warning (which only happens with devtools::install_github
, not install.packages
) and https://github.com/MangoTheCat/goodpractice for why it is considered best practice not to have a Date
field in the description. I'll make sure to make an inst/CITATION
file to this paper once it has a DOI which would hopefully make the issue disappear anyway.
I'll add a CONTRIBUTING.md
based on the one in your cvcqv
if that's alright. But first I will make the source code tidyvers style compliant.
Hello @JonasMoss , Thank you,
Yes, it is true and personally acceptable to me. Just, I don't know whether this format is compatible with the criteria of JOSS, "Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?". The two-line format which you have used (like dplyr
) seems not to be an OSI approved license format. I believe @arfon may help us with that.
It is great :raised_hands: . I did not know the date
field may also be problematic :anguished: .
That's alright :+1: . I think having a CONTRIBUTING.md
is more stylish :sunglasses: , and also helpful for packages which may encourage many to contribute. I believe univariateML
will draw the attention of statistical programmers to add more distributions, so a well-written easy guide like CONTRIBUTING.md
facilitates such contributions. However, it is only a suggestion.
Good luck, Regards, Maani
@JonasMoss - please move the contents of the License.md
file to LICENSE
too.
The current LICENSE
file format is somewhat confusing and potentially clashes with the License.md
contents (so let's just make them the same).
See tidyverse as an example of what to do here.
@arfon @MaaniBeigy I have no problem changing the content of LICENSE
for the paper release (https://github.com/JonasMoss/univariateML/commit/9f74ba2c64f1a5e3e3d6e57fdbb40b051d148cb1). But I will have to change it back when submitting the package to CRAN, as the CRAN policy demands LICENSE
only contains two lines when the MIT license is specified in DESCRIPTION.
The CRAN policy is here: https://cran.r-project.org/web/licenses/MIT
I guess the point is that when you visit the CRAN site for a package such as dplyr the usual MIT license is already hosted by CRAN and the LICENSE file shouldn't overlap with it.
Their format is different for the GPL (used in e.g. tidyverse) for some reason.
This policy is enforced: 14 days ago CRAN asked me to change the licensing format when I submitted a package with the format you request:
License: MIT + file LICENSE
The MIT license file should only contain the two lines:
YEAR: COPYRIGHT HOLDER:
Please fix and resubmit.
Please understand that I do not want to make a big deal out this and have no stake in it. (If anything I don't understand why CRAN insists the LICENSE file should contain only these two lines and it's definitely confusing.) But there seems to be a conflict between CRAN's policies and JOSS' policies here.
@MaaniBeigy Hi again! I lightly modified your CONTRIBUTING.md
in the CONTRIBUTING.md
for univariateML
. I hope this is okay!
Hi @JonasMoss @arfon Great job, Thank you for your explanations.
Your CONTRIBUTING.md
sounds great :+1: . You can also add the contents of Adding New Densities there. It may be a bit difficult for the future contributors to find them in the wiki. Again, it is just a suggestion.
Regarding the issue of the plain-text LICENSE file, one may also ask the R-pkg-devel mail group.
Regards, Maani
Please understand that I do not want to make a big deal out this and have no stake in it. (If anything I don't understand why CRAN insists the LICENSE file should contain only these two lines and it's definitely confusing.) But there seems to be a conflict between CRAN's policies and JOSS' policies here.
OK thanks @JonasMoss. This has sort of come up before but I feel like this might have been for a GPL-licensed piece of software and the requirements were (as you note) somewhat different.
Let's just stick with what you have now.
Hi @JonasMoss and @arfon,
Thanks for giving me extra time to review this. I'm generally very impressed with univariateML
and I can see myself using it. I'd say the vast majority of my comments are geared towards enhancing clarity of presentation, especially for anyone new to the package. Checklist above has been updated, with some in-line feedback.
I have a few more comments that don't exactly fit into the JOSS review checklist, so I will paste them here.
Best regards, Vikram π’
_
within the names of the maximum likelihood estimator functions. I had some trouble distinguishing function names like mllnorm()
from mlnorm()
as I tested things out. Having the _
could help here, e.g. ml_lnorm()
vs ml_norm()
is easier for me to distinguish.data.frame
made by stats::AIC()
, find the minimum AIC value and calculate the delta of each model's AIC from that? Not sure if this already exists elsewhere, but I assume not since you custom-wrote a few lines in your vignettes. But it could be a nice addition to the "Data analysis aids" set of functions that basically supersedes stats::AIC()
by giving the AICs and also deltas (and maybe AICcs if the user prefers?)univariateML
with your previous kdensity
package. Can objects made from one package be used by the other in a convenient way? If so, it might be nice to have a 4th vignette demonstrating this and/or some language in the JOSS paper. R
can take multiple models, and the lower the the better" so that it's clearer to your reader that the second half of the sentence refers specifically to AIC values. spelling::spell_check_package()
yields mostly false positives, but there are a couple items (e.g. "similarily" in ProbabilityPlots.Rd:68
) that merit correction@MaaniBegi
Your
CONTRIBUTING.md
sounds great π . You can also add the contents of Adding New Densities there. It may be a bit difficult for the future contributors to find them in the wiki. Again, it is just a suggestion.
Sounds reasonable. I've changed CONTRIBUTING.md
accordingly.
Hi @vbaliga and thanks for your comments!
I've responded to your suggestions below, implementing all except the two under the additional comments header. I hope the paper is satisfactory too.
Installation:
Yes, although I note that using either
devools::check()
ordevtools::install_github("JonasMoss/univariateML", build_vignettes = TRUE)
fails when attempting to create vignettes. This is likely because of howcopula
is needed but not installed & loaded by the "Copula Modeling" vignette. I suggest revising howcopula
is handled within this vignette just so that users who want local vignettes don't have a hard time.
Thanks! I had no idea this could be a problem. I copied your solution in https://github.com/JonasMoss/univariateML/commit/0c993bd7293c5e961c0de6b336f615e59ed18ec0
Performance:
Language such as "quick, easy, and free of bugs" is used in a few places, but I don't see any specifics on how the custom-made optimizers perform. I am familiar enough with many of them that I can attest to their veracity in fitting parameters (and they are well-covered by the package's
testthat
), but speed or efficiency haven't been documented. I do agree thatunivariateML
provides a lot of convenience. But if language like "quick" is going to be used, it should be backed up with evidence.
I added an example with the beta
distribution to the readme. Anyway, I've tried to remove all wording about the speed of the implementations, as that is not the focus of this package: The focus of the package is to lessen the time spent coding the implementations, making plots, etc.
(In my experience the usual way to do univariate maximum likelihood is to do an nlm
or optim
variant such as in the readme, maybe using the mle
wrapper of stats4
. The analytic solutions will always be faster than this and the custom made solvers in this package can be demonstrated to be faster on a case by case basis.)
Removed the reference to "free of bugs.".
Functionality documentation: Please link to your documentation site more directly within the README of the github repository. I realize there's a Documentation header inside the readme, but I'd also link to the main site (https://univariateml.netlify.com/index.html) at the top of the github repo (i.e. above the Topics) as well as within the Overview section.
Good idea, added link to the Github header and the overview section as requested.
Within the documentation site, please re-order the Articles dropdown menu so that the vignettes are listed in an order that's friendly to new users (i.e. put the Overview vignette first). This can be done by further editing the
_pkgdown.yml
in the root directory.
Thanks for the tip! Done it and uploaded it to Netlify.
Within the Distributions vignette, I'd make it a little more clear that the Estimator column refers to functions within your
univariateML
package. I think this can be done via renaming the column and reordering the column names to: Name, univariateML Function, Parameters, Support, Package.
Okay! Done it and uploaded it to Netlify.
In the Copula vignette, the last line of the first section notes that the package can be used to "speed up task 1" (fitting marginal distributions), but the specifics of how speedy this is (e.g. with run times) or how it compares to approaches via other packages isn't shown. Again, I think that if speed or efficiency are invoked, there should be evidence shown. Or the language could just be revised.
Thanks; I just revised the language.
State of the field: I'd recommend expanding this further. Right now, the paper mentions how this package compares to
stats4::mle()
but does not really go into enough detail about the specific benefits of usingunivariateML
's functions.Within the paper, can you give at least one example of a custom-made optimizer within
univariateML
and show how this optimizer's performance fares against what is built into base R or another package? How have other functions let us down?
Thanks! I forgot to check this. Updated the paper accordingly. I've removed claims about speed from the paper.
I suspect that the ability to fit each distribution is already available through other R packages (and this seems to be somewhat confirmed by the table in the Distributions vignette). If there are any distributions that are not covered by other packages, please make this explicit in the paper by specifying the name of the
univariateML
estimator function.
nlm
and optim
will work with any d***
function pretty easily; so in a sense all of the estimators are available in the packages implementing the densities themselves.
I also think there is merit to incorporating more of a "one stop shop" argument in here. It's convenient to have all of these functions within the same package, especially as they feed nicely into the "Data analysis aids" functions, which I'd argue are are the most important parts of the package. So I recommend that the language of the paper be further adjusted to tell the reader why it's good to use this package instead of relying on what is already out there.
I agree that's one of main benefits of the package, if not the main purpose. I have tried to make this clearer in the paper.
- I suggest inserting a
_
within the names of the maximum likelihood estimator functions. I had some trouble distinguishing function names likemllnorm()
frommlnorm()
as I tested things out. Having the_
could help here, e.g.ml_lnorm()
vsml_norm()
is easier for me to distinguish.
I agree it would be clearer! The main reason for using ml***
instead of ml_***
is consistency with the naming convention for densities (d***
), distribution functions (p***
), quantile functions (q***
) and random deviate generation functions (r***
). In addition, there are several packages with e***
functions (expectation) and v***
functions (variance).
- Any interest in creating a delta AIC function? I.e. take the
data.frame
made bystats::AIC()
, find the minimum AIC value and calculate the delta of each model's AIC from that? Not sure if this already exists elsewhere, but I assume not since you custom-wrote a few lines in your vignettes. But it could be a nice addition to the "Data analysis aids" set of functions that basically supersedesstats::AIC()
by giving the AICs and also deltas (and maybe AICcs if the user prefers?)
This is probably a good idea but I would like to keep the package as lightweight as I can. Another idea in the same vein is to implement the whole "choose best AIC" model as in the copula vignette, but I feel this is a little beyond the core features of the package. (In a sense the package doesn't "support" AIC either, it just exports objects a logLik
generic.)
- I also have a suggestion that I'll leave entirely optional at this point: You might give some thought to the 'ecosystem' approach of integrating
univariateML
with your previouskdensity
package. Can objects made from one package be used by the other in a convenient way? If so, it might be nice to have a 4th vignette demonstrating this and/or some language in the JOSS paper.
Yep, and well observed! The plan is to integrate kdensity
with univariateML
once univariateML
is on CRAN. kdensity
depends on the density functions d***
and estimator functions such as ml***
, and will work together with univariateML without any problems! (One can hope...)
Some relatively minor spelling/grammatical/formatting things:
- Within the Overview vignette, the first line does not resolve into a bulleted list (at least on the rendered Netlify site)
- Also in the Overview, section "Comparing Many Models with AIC": please revise the line "The generic in
R
can take multiple models, and the lower the the better" so that it's clearer to your reader that the second half of the sentence refers specifically to AIC values.- Also in the Overview vignette, section "Confidence Intervals", sentence 2: Change "Do do" to "To do"
- In the "Copula Modeling" vignette: "...the copula is know,..." should be "...the copula is known,..."
spelling::spell_check_package()
yields mostly false positives, but there are a couple items (e.g. "similarily" inProbabilityPlots.Rd:68
) that merit correction
Thanks! These are fixed in several commits.
Thanks for the detailed response @JonasMoss!
@MaaniBeigy & @vbaliga - when you get a moment, could you please review @JonasMoss' response and update your review checklists accordingly?
Sure, ASAP.
On Tue, 3 Dec 2019, 20:17 Arfon Smith, notifications@github.com wrote:
Thanks for the detailed response @JonasMoss https://github.com/JonasMoss !
@MaaniBeigy https://github.com/MaaniBeigy & @vbaliga https://github.com/vbaliga - when you get a moment, could you please review @JonasMoss https://github.com/JonasMoss' response and update your review checklists accordingly?
β You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/openjournals/joss-reviews/issues/1863?email_source=notifications&email_token=AI6QPJTLTRZWSZI2KM77LKDQW2ETZA5CNFSM4JIHZKB2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEF2BEEQ#issuecomment-561254930, or unsubscribe https://github.com/notifications/unsubscribe-auth/AI6QPJVF5XHVH4KEIMB5W3DQW2ETZANCNFSM4JIHZKBQ .
Hi @JonasMoss and @arfon,
I took a look through all the changes and Jonas' responses -- very happy with what I see! I just checked off all the items on the review list to reflect that I'm happy with the current state and have no further feedback. I did leave my original review comments in; please let me know if they should be removed.
Congrats, Jonas β¨:tada: -- it's all nicely put together and like I said before, I can see myself using it.
Cheers, Vikram π’
Hello everybody, The current state is fulfilling. Congratulations, @JonasMoss :+1: Best wishes, Maani
Excellent, thanks for confirming @MaaniBeigy & @vbaliga.
@JonasMoss - At this point could you make a new release of this software that includes the changes that have resulted from this review. Then, please make an archive of the software in Zenodo/figshare/other service and update this thread with the DOI of the archive? For the Zenodo/figshare archive, please make sure that:
I can then move forward with accepting the submission.
@arfon Great!
DOI: 10.5281/zenodo.3562385 The archive is https://zenodo.org/record/3562385
@whedon set 10.5281/zenodo.3562385 as archive
OK. 10.5281/zenodo.3562385 is the archive.
@whedon accept
Attempting dry run of processing paper acceptance...
OK DOIs
- 10.1214/aos/1176324627 is OK
- 10.1111/sjos.12387 is OK
- 10.21105/joss.01566 is OK
- 10.2307/1403464 is OK
- 10.2307/2331493 is OK
MISSING DOIs
- https://doi.org/10.1007/978-1-4612-0919-5_38 may be missing for title: Information theory and an extension of the maximum likelihood principle
INVALID DOIs
- None
Check final proof :point_right: https://github.com/openjournals/joss-papers/pull/1154
If the paper PDF and Crossref deposit XML look good in https://github.com/openjournals/joss-papers/pull/1154, then you can now move forward with accepting the submission by compiling again with the flag deposit=true
e.g.
@whedon accept deposit=true
@whedon accept deposit=true
Doing it live! Attempting automated processing of paper acceptance...
π¦π¦π¦ π Tweet for this paper π π¦π¦π¦
π¨π¨π¨ THIS IS NOT A DRILL, YOU HAVE JUST ACCEPTED A PAPER INTO JOSS! π¨π¨π¨
Here's what you must now do:
Party like you just published a paper! πππ¦ππ»π€
Any issues? notify your editorial technical team...
@MaaniBeigy, @vbaliga - many thank for your reviews here β¨
@JonasMoss - your paper is now accepted into JOSS :zap::rocket::boom:
:tada::tada::tada: Congratulations on your paper acceptance! :tada::tada::tada:
If you would like to include a link to your paper from your README use the following code snippets:
Markdown:
[![DOI](https://joss.theoj.org/papers/10.21105/joss.01863/status.svg)](https://doi.org/10.21105/joss.01863)
HTML:
<a style="border-width:0" href="https://doi.org/10.21105/joss.01863">
<img src="https://joss.theoj.org/papers/10.21105/joss.01863/status.svg" alt="DOI badge" >
</a>
reStructuredText:
.. image:: https://joss.theoj.org/papers/10.21105/joss.01863/status.svg
:target: https://doi.org/10.21105/joss.01863
This is how it will look in your documentation:
We need your help!
Journal of Open Source Software is a community-run journal and relies upon volunteer effort. If you'd like to support us please consider doing either one (or both) of the the following:
Submitting author: @JonasMoss (Jonas Moss) Repository: https://github.com/JonasMoss/univariateML Version: v1.0.0 Editor: @arfon Reviewer: @MaaniBeigy, @vbaliga Archive: 10.5281/zenodo.3562385
Status
Status badge code:
Reviewers and authors:
Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)
Reviewer instructions & questions
@MaaniBeigy & @vbaliga, please carry out your review in this issue by updating the checklist below. If you cannot edit the checklist please:
The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @arfon know.
β¨ Please try and complete your review in the next two weeks β¨
Review checklist for @MaaniBeigy
Conflict of interest
Code of Conduct
General checks
Functionality
Documentation
Software paper
Review checklist for @vbaliga
Conflict of interest
Code of Conduct
General checks
I see there's been some discussion about the License formatting but that @arfon has instructed to keep things as they are. So I will just check this off.
Functionality
Yes, although I note that using either
devools::check()
ordevtools::install_github("JonasMoss/univariateML", build_vignettes = TRUE)
fails when attempting to create vignettes. This is likely because of howcopula
is needed but not installed & loaded by the "Copula Modeling" vignette. I suggest revising howcopula
is handled within this vignette just so that users who want local vignettes don't have a hard time.Language such as "quick, easy, and free of bugs" is used in a few places, but I don't see any specifics on how the custom-made optimizers perform. I am familiar enough with many of them that I can attest to their veracity in fitting parameters (and they are well-covered by the package's
testthat
), but speed or efficiency haven't been documented. I do agree thatunivariateML
provides a lot of convenience. But if language like "quick" is going to be used, it should be backed up with evidence.Documentation
I'm happy enough with the writing to check off the box here, but please note that other items in this review do relate to this topic.
[x] Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
[x] Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).
[x] Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
This is generally covered well, but I'll make a few suggestions to enhance clarity and/or make life easier for people who are encountering the package for the first time.
Please link to your documentation site more directly within the README of the github repository. I realize there's a Documentation header inside the readme, but I'd also link to the main site (https://univariateml.netlify.com/index.html) at the top of the github repo (i.e. above the Topics) as well as within the Overview section.
Within the documentation site, please re-order the Articles dropdown menu so that the vignettes are listed in an order that's friendly to new users (i.e. put the Overview vignette first). This can be done by further editing the
_pkgdown.yml
in the root directory.Within the Distributions vignette, I'd make it a little more clear that the Estimator column refers to functions within your
univariateML
package. I think this can be done via renaming the column and reordering the column names to: Name, univariateML Function, Parameters, Support, Package.In the Copula vignette, the last line of the first section notes that the package can be used to "speed up task 1" (fitting marginal distributions), but the specifics of how speedy this is (e.g. with run times) or how it compares to approaches via other packages isn't shown. Again, I think that if speed or efficiency are invoked, there should be evidence shown. Or the language could just be revised.
I'm happy with what I see here! Very nicely thought out. I did not catch any errors.
I like the update you made following the other reviewer's comments. Looks good!
Software paper
I'm happy enough with the language to check this off, but I'll also note that the first sentence of the paper could be tweaked to be a bit more descriptive. I like the language used in both the README and the Overview vignette -- it's nice to have a lead statement that gives me a full sense of what the package does in a punchy way. But this is a minor comment and not one that I'm looking to have enforced.
I'd recommend expanding this further. Right now, the paper mentions how this package compares to
stats4::mle()
but does not really go into enough detail about the specific benefits of usingunivariateML
's functions.Within the paper, can you give at least one example of a custom-made optimizer within
univariateML
and show how this optimizer's performance fares against what is built into base R or another package? How have other functions let us down?I suspect that the ability to fit each distribution is already available through other R packages (and this seems to be somewhat confirmed by the table in the Distributions vignette). If there are any distributions that are not covered by other packages, please make this explicit in the paper by specifying the name of the
univariateML
estimator function.I also think there is merit to incorporating more of a "one stop shop" argument in here. It's convenient to have all of these functions within the same package, especially as they feed nicely into the "Data analysis aids" functions, which I'd argue are are the most important parts of the package. So I recommend that the language of the paper be further adjusted to tell the reader why it's good to use this package instead of relying on what is already out there.