openjournals / joss-reviews

Reviews for the Journal of Open Source Software
Creative Commons Zero v1.0 Universal
721 stars 38 forks source link

[REVIEW]: univariateML: An R package for maximum likelihood estimation of univariate densities #1863

Closed whedon closed 4 years ago

whedon commented 5 years ago

Submitting author: @JonasMoss (Jonas Moss) Repository: https://github.com/JonasMoss/univariateML Version: v1.0.0 Editor: @arfon Reviewer: @MaaniBeigy, @vbaliga Archive: 10.5281/zenodo.3562385

Status

status

Status badge code:

HTML: <a href="https://joss.theoj.org/papers/f4a10f75f7c09ce88c554dd36b427387"><img src="https://joss.theoj.org/papers/f4a10f75f7c09ce88c554dd36b427387/status.svg"></a>
Markdown: [![status](https://joss.theoj.org/papers/f4a10f75f7c09ce88c554dd36b427387/status.svg)](https://joss.theoj.org/papers/f4a10f75f7c09ce88c554dd36b427387)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@MaaniBeigy & @vbaliga, please carry out your review in this issue by updating the checklist below. If you cannot edit the checklist please:

  1. Make sure you're logged in to your GitHub account
  2. Be sure to accept the invite at this URL: https://github.com/openjournals/joss-reviews/invitations

The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @arfon know.

✨ Please try and complete your review in the next two weeks ✨

Review checklist for @MaaniBeigy

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

Review checklist for @vbaliga

Conflict of interest

Code of Conduct

General checks

I see there's been some discussion about the License formatting but that @arfon has instructed to keep things as they are. So I will just check this off.

Functionality

Yes, although I note that using either devools::check() or devtools::install_github("JonasMoss/univariateML", build_vignettes = TRUE) fails when attempting to create vignettes. This is likely because of how copula is needed but not installed & loaded by the "Copula Modeling" vignette. I suggest revising how copula is handled within this vignette just so that users who want local vignettes don't have a hard time.

Language such as "quick, easy, and free of bugs" is used in a few places, but I don't see any specifics on how the custom-made optimizers perform. I am familiar enough with many of them that I can attest to their veracity in fitting parameters (and they are well-covered by the package's testthat), but speed or efficiency haven't been documented. I do agree that univariateML provides a lot of convenience. But if language like "quick" is going to be used, it should be backed up with evidence.

Documentation

I'm happy enough with the writing to check off the box here, but please note that other items in this review do relate to this topic.

This is generally covered well, but I'll make a few suggestions to enhance clarity and/or make life easier for people who are encountering the package for the first time.

Please link to your documentation site more directly within the README of the github repository. I realize there's a Documentation header inside the readme, but I'd also link to the main site (https://univariateml.netlify.com/index.html) at the top of the github repo (i.e. above the Topics) as well as within the Overview section.

Within the documentation site, please re-order the Articles dropdown menu so that the vignettes are listed in an order that's friendly to new users (i.e. put the Overview vignette first). This can be done by further editing the _pkgdown.yml in the root directory.

Within the Distributions vignette, I'd make it a little more clear that the Estimator column refers to functions within your univariateML package. I think this can be done via renaming the column and reordering the column names to: Name, univariateML Function, Parameters, Support, Package.

In the Copula vignette, the last line of the first section notes that the package can be used to "speed up task 1" (fitting marginal distributions), but the specifics of how speedy this is (e.g. with run times) or how it compares to approaches via other packages isn't shown. Again, I think that if speed or efficiency are invoked, there should be evidence shown. Or the language could just be revised.

I'm happy with what I see here! Very nicely thought out. I did not catch any errors.

I like the update you made following the other reviewer's comments. Looks good!

Software paper

I'm happy enough with the language to check this off, but I'll also note that the first sentence of the paper could be tweaked to be a bit more descriptive. I like the language used in both the README and the Overview vignette -- it's nice to have a lead statement that gives me a full sense of what the package does in a punchy way. But this is a minor comment and not one that I'm looking to have enforced.

I'd recommend expanding this further. Right now, the paper mentions how this package compares to stats4::mle() but does not really go into enough detail about the specific benefits of using univariateML's functions.

Within the paper, can you give at least one example of a custom-made optimizer within univariateML and show how this optimizer's performance fares against what is built into base R or another package? How have other functions let us down?

I suspect that the ability to fit each distribution is already available through other R packages (and this seems to be somewhat confirmed by the table in the Distributions vignette). If there are any distributions that are not covered by other packages, please make this explicit in the paper by specifying the name of the univariateML estimator function.

I also think there is merit to incorporating more of a "one stop shop" argument in here. It's convenient to have all of these functions within the same package, especially as they feed nicely into the "Data analysis aids" functions, which I'd argue are are the most important parts of the package. So I recommend that the language of the paper be further adjusted to tell the reader why it's good to use this package instead of relying on what is already out there.

whedon commented 5 years ago

Hello human, I'm @whedon, a robot that can help you with some common editorial tasks. @MaaniBeigy, @vbaliga it looks like you're currently assigned to review this paper :tada:.

:star: Important :star:

If you haven't already, you should seriously consider unsubscribing from GitHub notifications for this (https://github.com/openjournals/joss-reviews) repository. As a reviewer, you're probably currently watching this repository which means for GitHub's default behaviour you will receive notifications (emails) for all reviews 😿

To fix this do the following two things:

  1. Set yourself as 'Not watching' https://github.com/openjournals/joss-reviews:

watching

  1. You may also like to change your default settings for this watching repositories in your GitHub profile here: https://github.com/settings/notifications

notifications

For a list of things I can do to help you, just type:

@whedon commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@whedon generate pdf
whedon commented 5 years ago
Attempting PDF compilation. Reticulating splines etc...
whedon commented 5 years ago

:point_right: Check article proof :page_facing_up: :point_left:

arfon commented 5 years ago

@MaaniBeigy, @vbaliga - please carry out your review in this issue by updating the checklist above and giving feedback in this issue. The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html

Any questions/concerns please let me know.

arfon commented 5 years ago

Adding a note here for myself that @vbaliga won't be able to complete their review until mid-late November.

vbaliga commented 5 years ago

πŸ‘Œ Got it - will do my best to not leave you hanging!

MaaniBeigy commented 4 years ago

Dear @arfon, Greetings,

I reviewed the univariateML package and its software paper. It is an important contribution to the field of statistical programming and has a high quality. My comments are as follow:

  1. Although the author has described an MIT license and provided an MIT LICENSE.md file, the LICENSE plain-text file is not correctly an MIT license. I believe, the author should correct the plain-text LICENSE file according to OSI and Rbuildignore it (i.e., usethis::use_build_ignore("LICENSE") or adding ^LICENSE$ to the .Rbuildignore file).

  2. The Date field in the DESCRIPTION file is missing. It causes the following warning when a user calls citation("univariateML"):

Warning message: In citation("univariateML") : no date field in DESCRIPTION file of package β€˜univariateML’

  1. Although there are clear community guidelines in README, labeled as How to Contribute or Get Help and in the Adding-New-Densities wiki, I recommend using a CONTRIBUTING.md.

Please let me know if there are any questions/concerns, Best regards, Maani

JonasMoss commented 4 years ago

Thanks @MaaniBeigy

  1. The way the license is done right now is similar to e.g. dplyr. I'm not sure I can change it since CRAN has asked me to do it in this way (i.e. two lines on LICENSE and a LICENSE.md) before.

  2. Thanks. I think it should be left as it is though, see https://github.com/r-lib/devtools/issues/1327 for an explanation for the warning (which only happens with devtools::install_github, not install.packages) and https://github.com/MangoTheCat/goodpractice for why it is considered best practice not to have a Date field in the description. I'll make sure to make an inst/CITATION file to this paper once it has a DOI which would hopefully make the issue disappear anyway.

  3. I'll add a CONTRIBUTING.md based on the one in your cvcqv if that's alright. But first I will make the source code tidyvers style compliant.

MaaniBeigy commented 4 years ago

Hello @JonasMoss , Thank you,

  1. Yes, it is true and personally acceptable to me. Just, I don't know whether this format is compatible with the criteria of JOSS, "Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?". The two-line format which you have used (like dplyr) seems not to be an OSI approved license format. I believe @arfon may help us with that.

  2. It is great :raised_hands: . I did not know the date field may also be problematic :anguished: .

  3. That's alright :+1: . I think having a CONTRIBUTING.md is more stylish :sunglasses: , and also helpful for packages which may encourage many to contribute. I believe univariateML will draw the attention of statistical programmers to add more distributions, so a well-written easy guide like CONTRIBUTING.md facilitates such contributions. However, it is only a suggestion.

Good luck, Regards, Maani

arfon commented 4 years ago

@JonasMoss - please move the contents of the License.md file to LICENSE too.

The current LICENSE file format is somewhat confusing and potentially clashes with the License.md contents (so let's just make them the same).

See tidyverse as an example of what to do here.

JonasMoss commented 4 years ago

@arfon @MaaniBeigy I have no problem changing the content of LICENSE for the paper release (https://github.com/JonasMoss/univariateML/commit/9f74ba2c64f1a5e3e3d6e57fdbb40b051d148cb1). But I will have to change it back when submitting the package to CRAN, as the CRAN policy demands LICENSE only contains two lines when the MIT license is specified in DESCRIPTION.

The CRAN policy is here: https://cran.r-project.org/web/licenses/MIT

I guess the point is that when you visit the CRAN site for a package such as dplyr the usual MIT license is already hosted by CRAN and the LICENSE file shouldn't overlap with it.

Their format is different for the GPL (used in e.g. tidyverse) for some reason.

This policy is enforced: 14 days ago CRAN asked me to change the licensing format when I submitted a package with the format you request:

License: MIT + file LICENSE

The MIT license file should only contain the two lines:

YEAR: COPYRIGHT HOLDER:

Please fix and resubmit.

Please understand that I do not want to make a big deal out this and have no stake in it. (If anything I don't understand why CRAN insists the LICENSE file should contain only these two lines and it's definitely confusing.) But there seems to be a conflict between CRAN's policies and JOSS' policies here.

JonasMoss commented 4 years ago

@MaaniBeigy Hi again! I lightly modified your CONTRIBUTING.md in the CONTRIBUTING.md for univariateML. I hope this is okay!

MaaniBeigy commented 4 years ago

Hi @JonasMoss @arfon Great job, Thank you for your explanations.

Your CONTRIBUTING.md sounds great :+1: . You can also add the contents of Adding New Densities there. It may be a bit difficult for the future contributors to find them in the wiki. Again, it is just a suggestion.

Regarding the issue of the plain-text LICENSE file, one may also ask the R-pkg-devel mail group.

Regards, Maani

arfon commented 4 years ago

Please understand that I do not want to make a big deal out this and have no stake in it. (If anything I don't understand why CRAN insists the LICENSE file should contain only these two lines and it's definitely confusing.) But there seems to be a conflict between CRAN's policies and JOSS' policies here.

OK thanks @JonasMoss. This has sort of come up before but I feel like this might have been for a GPL-licensed piece of software and the requirements were (as you note) somewhat different.

Let's just stick with what you have now.

vbaliga commented 4 years ago

Hi @JonasMoss and @arfon,

Thanks for giving me extra time to review this. I'm generally very impressed with univariateML and I can see myself using it. I'd say the vast majority of my comments are geared towards enhancing clarity of presentation, especially for anyone new to the package. Checklist above has been updated, with some in-line feedback.

I have a few more comments that don't exactly fit into the JOSS review checklist, so I will paste them here.

Best regards, Vikram 🐒

Additional comments:

Some relatively minor spelling/grammatical/formatting things:

JonasMoss commented 4 years ago

@MaaniBegi

Your CONTRIBUTING.md sounds great πŸ‘ . You can also add the contents of Adding New Densities there. It may be a bit difficult for the future contributors to find them in the wiki. Again, it is just a suggestion.

Sounds reasonable. I've changed CONTRIBUTING.md accordingly.

JonasMoss commented 4 years ago

Hi @vbaliga and thanks for your comments!

I've responded to your suggestions below, implementing all except the two under the additional comments header. I hope the paper is satisfactory too.

Checklist

Installation:

Yes, although I note that using either devools::check() or devtools::install_github("JonasMoss/univariateML", build_vignettes = TRUE) fails when attempting to create vignettes. This is likely because of how copula is needed but not installed & loaded by the "Copula Modeling" vignette. I suggest revising how copula is handled within this vignette just so that users who want local vignettes don't have a hard time.

Thanks! I had no idea this could be a problem. I copied your solution in https://github.com/JonasMoss/univariateML/commit/0c993bd7293c5e961c0de6b336f615e59ed18ec0

Performance:

Language such as "quick, easy, and free of bugs" is used in a few places, but I don't see any specifics on how the custom-made optimizers perform. I am familiar enough with many of them that I can attest to their veracity in fitting parameters (and they are well-covered by the package's testthat), but speed or efficiency haven't been documented. I do agree that univariateML provides a lot of convenience. But if language like "quick" is going to be used, it should be backed up with evidence.

I added an example with the beta distribution to the readme. Anyway, I've tried to remove all wording about the speed of the implementations, as that is not the focus of this package: The focus of the package is to lessen the time spent coding the implementations, making plots, etc.

(In my experience the usual way to do univariate maximum likelihood is to do an nlm or optim variant such as in the readme, maybe using the mle wrapper of stats4. The analytic solutions will always be faster than this and the custom made solvers in this package can be demonstrated to be faster on a case by case basis.)

Removed the reference to "free of bugs.".

Functionality documentation: Please link to your documentation site more directly within the README of the github repository. I realize there's a Documentation header inside the readme, but I'd also link to the main site (https://univariateml.netlify.com/index.html) at the top of the github repo (i.e. above the Topics) as well as within the Overview section.

Good idea, added link to the Github header and the overview section as requested.

Within the documentation site, please re-order the Articles dropdown menu so that the vignettes are listed in an order that's friendly to new users (i.e. put the Overview vignette first). This can be done by further editing the _pkgdown.yml in the root directory.

Thanks for the tip! Done it and uploaded it to Netlify.

Within the Distributions vignette, I'd make it a little more clear that the Estimator column refers to functions within your univariateML package. I think this can be done via renaming the column and reordering the column names to: Name, univariateML Function, Parameters, Support, Package.

Okay! Done it and uploaded it to Netlify.

In the Copula vignette, the last line of the first section notes that the package can be used to "speed up task 1" (fitting marginal distributions), but the specifics of how speedy this is (e.g. with run times) or how it compares to approaches via other packages isn't shown. Again, I think that if speed or efficiency are invoked, there should be evidence shown. Or the language could just be revised.

Thanks; I just revised the language.

State of the field: I'd recommend expanding this further. Right now, the paper mentions how this package compares to stats4::mle() but does not really go into enough detail about the specific benefits of using univariateML's functions.

Within the paper, can you give at least one example of a custom-made optimizer within univariateML and show how this optimizer's performance fares against what is built into base R or another package? How have other functions let us down?

Thanks! I forgot to check this. Updated the paper accordingly. I've removed claims about speed from the paper.

I suspect that the ability to fit each distribution is already available through other R packages (and this seems to be somewhat confirmed by the table in the Distributions vignette). If there are any distributions that are not covered by other packages, please make this explicit in the paper by specifying the name of the univariateML estimator function.

nlm and optim will work with any d*** function pretty easily; so in a sense all of the estimators are available in the packages implementing the densities themselves.

I also think there is merit to incorporating more of a "one stop shop" argument in here. It's convenient to have all of these functions within the same package, especially as they feed nicely into the "Data analysis aids" functions, which I'd argue are are the most important parts of the package. So I recommend that the language of the paper be further adjusted to tell the reader why it's good to use this package instead of relying on what is already out there.

I agree that's one of main benefits of the package, if not the main purpose. I have tried to make this clearer in the paper.

Additional comments:

  • I suggest inserting a _ within the names of the maximum likelihood estimator functions. I had some trouble distinguishing function names like mllnorm() from mlnorm() as I tested things out. Having the _ could help here, e.g. ml_lnorm() vs ml_norm() is easier for me to distinguish.

I agree it would be clearer! The main reason for using ml*** instead of ml_*** is consistency with the naming convention for densities (d***), distribution functions (p***), quantile functions (q***) and random deviate generation functions (r***). In addition, there are several packages with e*** functions (expectation) and v*** functions (variance).

  • Any interest in creating a delta AIC function? I.e. take the data.frame made by stats::AIC(), find the minimum AIC value and calculate the delta of each model's AIC from that? Not sure if this already exists elsewhere, but I assume not since you custom-wrote a few lines in your vignettes. But it could be a nice addition to the "Data analysis aids" set of functions that basically supersedes stats::AIC() by giving the AICs and also deltas (and maybe AICcs if the user prefers?)

This is probably a good idea but I would like to keep the package as lightweight as I can. Another idea in the same vein is to implement the whole "choose best AIC" model as in the copula vignette, but I feel this is a little beyond the core features of the package. (In a sense the package doesn't "support" AIC either, it just exports objects a logLik generic.)

  • I also have a suggestion that I'll leave entirely optional at this point: You might give some thought to the 'ecosystem' approach of integrating univariateML with your previous kdensity package. Can objects made from one package be used by the other in a convenient way? If so, it might be nice to have a 4th vignette demonstrating this and/or some language in the JOSS paper.

Yep, and well observed! The plan is to integrate kdensity with univariateML once univariateML is on CRAN. kdensity depends on the density functions d*** and estimator functions such as ml***, and will work together with univariateML without any problems! (One can hope...)

Some relatively minor spelling/grammatical/formatting things:

  • Within the Overview vignette, the first line does not resolve into a bulleted list (at least on the rendered Netlify site)
  • Also in the Overview, section "Comparing Many Models with AIC": please revise the line "The generic in R can take multiple models, and the lower the the better" so that it's clearer to your reader that the second half of the sentence refers specifically to AIC values.
  • Also in the Overview vignette, section "Confidence Intervals", sentence 2: Change "Do do" to "To do"
  • In the "Copula Modeling" vignette: "...the copula is know,..." should be "...the copula is known,..."
  • spelling::spell_check_package() yields mostly false positives, but there are a couple items (e.g. "similarily" in ProbabilityPlots.Rd:68) that merit correction

Thanks! These are fixed in several commits.

arfon commented 4 years ago

Thanks for the detailed response @JonasMoss!

@MaaniBeigy & @vbaliga - when you get a moment, could you please review @JonasMoss' response and update your review checklists accordingly?

MaaniBeigy commented 4 years ago

Sure, ASAP.

On Tue, 3 Dec 2019, 20:17 Arfon Smith, notifications@github.com wrote:

Thanks for the detailed response @JonasMoss https://github.com/JonasMoss !

@MaaniBeigy https://github.com/MaaniBeigy & @vbaliga https://github.com/vbaliga - when you get a moment, could you please review @JonasMoss https://github.com/JonasMoss' response and update your review checklists accordingly?

β€” You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/openjournals/joss-reviews/issues/1863?email_source=notifications&email_token=AI6QPJTLTRZWSZI2KM77LKDQW2ETZA5CNFSM4JIHZKB2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEF2BEEQ#issuecomment-561254930, or unsubscribe https://github.com/notifications/unsubscribe-auth/AI6QPJVF5XHVH4KEIMB5W3DQW2ETZANCNFSM4JIHZKBQ .

vbaliga commented 4 years ago

Hi @JonasMoss and @arfon,

I took a look through all the changes and Jonas' responses -- very happy with what I see! I just checked off all the items on the review list to reflect that I'm happy with the current state and have no further feedback. I did leave my original review comments in; please let me know if they should be removed.

Congrats, Jonas ✨:tada: -- it's all nicely put together and like I said before, I can see myself using it.

Cheers, Vikram 🐒

MaaniBeigy commented 4 years ago

Hello everybody, The current state is fulfilling. Congratulations, @JonasMoss :+1: Best wishes, Maani

arfon commented 4 years ago

Excellent, thanks for confirming @MaaniBeigy & @vbaliga.

@JonasMoss - At this point could you make a new release of this software that includes the changes that have resulted from this review. Then, please make an archive of the software in Zenodo/figshare/other service and update this thread with the DOI of the archive? For the Zenodo/figshare archive, please make sure that:

I can then move forward with accepting the submission.

JonasMoss commented 4 years ago

@arfon Great!

DOI: 10.5281/zenodo.3562385 The archive is https://zenodo.org/record/3562385

arfon commented 4 years ago

@whedon set 10.5281/zenodo.3562385 as archive

whedon commented 4 years ago

OK. 10.5281/zenodo.3562385 is the archive.

arfon commented 4 years ago

@whedon accept

whedon commented 4 years ago
Attempting dry run of processing paper acceptance...
whedon commented 4 years ago

OK DOIs

- 10.1214/aos/1176324627 is OK
- 10.1111/sjos.12387 is OK
- 10.21105/joss.01566 is OK
- 10.2307/1403464 is OK
- 10.2307/2331493 is OK

MISSING DOIs

- https://doi.org/10.1007/978-1-4612-0919-5_38 may be missing for title: Information theory and an extension of the maximum likelihood principle

INVALID DOIs

- None
whedon commented 4 years ago

Check final proof :point_right: https://github.com/openjournals/joss-papers/pull/1154

If the paper PDF and Crossref deposit XML look good in https://github.com/openjournals/joss-papers/pull/1154, then you can now move forward with accepting the submission by compiling again with the flag deposit=true e.g.

@whedon accept deposit=true
arfon commented 4 years ago

@whedon accept deposit=true

whedon commented 4 years ago
Doing it live! Attempting automated processing of paper acceptance...
whedon commented 4 years ago

🐦🐦🐦 πŸ‘‰ Tweet for this paper πŸ‘ˆ 🐦🐦🐦

whedon commented 4 years ago

🚨🚨🚨 THIS IS NOT A DRILL, YOU HAVE JUST ACCEPTED A PAPER INTO JOSS! 🚨🚨🚨

Here's what you must now do:

  1. Check final PDF and Crossref metadata that was deposited :point_right: https://github.com/openjournals/joss-papers/pull/1155
  2. Wait a couple of minutes to verify that the paper DOI resolves https://doi.org/10.21105/joss.01863
  3. If everything looks good, then close this review issue.
  4. Party like you just published a paper! πŸŽ‰πŸŒˆπŸ¦„πŸ’ƒπŸ‘»πŸ€˜

    Any issues? notify your editorial technical team...

arfon commented 4 years ago

@MaaniBeigy, @vbaliga - many thank for your reviews here ✨

@JonasMoss - your paper is now accepted into JOSS :zap::rocket::boom:

whedon commented 4 years ago

:tada::tada::tada: Congratulations on your paper acceptance! :tada::tada::tada:

If you would like to include a link to your paper from your README use the following code snippets:

Markdown:
[![DOI](https://joss.theoj.org/papers/10.21105/joss.01863/status.svg)](https://doi.org/10.21105/joss.01863)

HTML:
<a style="border-width:0" href="https://doi.org/10.21105/joss.01863">
  <img src="https://joss.theoj.org/papers/10.21105/joss.01863/status.svg" alt="DOI badge" >
</a>

reStructuredText:
.. image:: https://joss.theoj.org/papers/10.21105/joss.01863/status.svg
   :target: https://doi.org/10.21105/joss.01863

This is how it will look in your documentation:

DOI

We need your help!

Journal of Open Source Software is a community-run journal and relies upon volunteer effort. If you'd like to support us please consider doing either one (or both) of the the following: