openjournals / joss-reviews

Reviews for the Journal of Open Source Software
Creative Commons Zero v1.0 Universal

[REVIEW]: simstudy: Illuminating research methods through data generation #2763

Closed whedon closed 4 years ago

whedon commented 4 years ago

Submitting author: @assignUser (Jacob Wujciak-Jens) Repository: https://github.com/kgoldfeld/simstudy/ Version: v0.2.2 Editor: @mikldk Reviewer: @gagolews, @brunaw Archive: 10.5281/zenodo.4134675

:warning: JOSS reduced service mode :warning:

Due to the challenges of the COVID-19 pandemic, JOSS is currently operating in a "reduced service mode". You can read more about what that means in our blog post.

Status

Status badge code:

HTML: <a href="https://joss.theoj.org/papers/640fd4333948933b2817343e86df3424"><img src="https://joss.theoj.org/papers/640fd4333948933b2817343e86df3424/status.svg"></a>
Markdown: [![status](https://joss.theoj.org/papers/640fd4333948933b2817343e86df3424/status.svg)](https://joss.theoj.org/papers/640fd4333948933b2817343e86df3424)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@gagolews & @brunaw, please carry out your review in this issue by updating the checklist below. If you cannot edit the checklist please:

  1. Make sure you're logged in to your GitHub account
  2. Be sure to accept the invite at this URL: https://github.com/openjournals/joss-reviews/invitations

The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @mikldk know.

✨ Please start on your review when you are able, and be sure to complete your review in the next six weeks, at the very latest ✨

Review checklist for @gagolews

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

Review checklist for @brunaw

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

whedon commented 4 years ago

Hello human, I'm @whedon, a robot that can help you with some common editorial tasks. @gagolews, @brunaw it looks like you're currently assigned to review this paper :tada:.

:star: Important :star:

If you haven't already, you should seriously consider unsubscribing from GitHub notifications for this (https://github.com/openjournals/joss-reviews) repository. As a reviewer, you're probably currently watching this repository which means for GitHub's default behaviour you will receive notifications (emails) for all reviews 😿

To fix this do the following two things:

  1. Set yourself as 'Not watching' https://github.com/openjournals/joss-reviews:

  2. You may also like to change your default settings for watching repositories in your GitHub profile here: https://github.com/settings/notifications

For a list of things I can do to help you, just type:

@whedon commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@whedon generate pdf
whedon commented 4 years ago

PDF failed to compile for issue #2763 with the following error:

Can't find any papers to compile :-(

mikldk commented 4 years ago

@whedon generate pdf from branch joss-submission

whedon commented 4 years ago
Attempting PDF compilation from custom branch joss-submission. Reticulating splines etc...
whedon commented 4 years ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

mikldk commented 4 years ago

@whedon check references from branch joss-submission

whedon commented 4 years ago
Attempting to check references... from custom branch joss-submission
whedon commented 4 years ago
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.18637/jss.v037.i03 is OK
- 10.1016/j.jbusvent.2019.02.001 is OK
- 10.20982/tqmp.16.4.p248 is OK
- 10.1007/s40273-020-00946-y is OK
- 10.1080/00031305.1991.10475828 is OK
- 10.1111/dmcn.14552 is OK
- 10.18637/jss.v069.i04 is OK
- 10.31234/osf.io/59uaq is OK
- 10.1002/sim.8452 is OK
- 10.1097/MLR.0000000000001063 is OK
- 10.1080/03610918.2012.718841 is OK
- 10.1101/215889 is OK
- 10.1007/s10463-020-00761-4 is OK
- 10.1186/s13063-019-3364-x is OK

MISSING DOIs

- None

INVALID DOIs

- None
mikldk commented 4 years ago

@gagolews, @brunaw: Thanks for agreeing to review. Please carry out your review in this issue by updating the checklist above and giving feedback in this issue. The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. If possible create issues (and cross-reference) in the submission's repository to avoid too specific discussions in this review thread.

If you have any questions or concerns please let me know.

brunaw commented 4 years ago

I personally appreciate the creation of this package, since I have had to come up with my own data simulation procedures several times, for many different contexts. I think this package should be accepted because it will help lots of researchers with the same issues. Comments and suggestions about the code & content/documentation of the package are below.

Package content/documentation

Package code

assignUser commented 4 years ago

@brunaw Thank you for your review and your positive recommendation! I will try to address each point you have brought up

brunaw commented 4 years ago

@assignUser Thank you for the quick reply, and I apologize if some of my comments weren't clear.

assignUser commented 4 years ago

@brunaw LaTeX: Ah I see. As the symbols are rendered correctly on the pkgdown page and any workarounds for GitHub might interfere with that, I think we will keep it as is (sadly, I would like it to be rendered too, maybe GitHub will add it at some point). @kgoldfeld fixed it.

Arguments: I feel that with all of the vignettes and targeted documentation we have this covered as a specific example for normal would not cover all use cases, unlike the vignettes/?distributions. Please let me know if this is still a sticking point for you.

Install: I just found the solution for this issue. Until recently, devtools/remotes assumed the default branch to be "master". This was fixed by https://github.com/r-lib/remotes/pull/510, but the fix is not on CRAN yet: https://remotes.r-lib.org/news/index.html. You should be able to install with devtools::install_github("kgoldfeld/simstudy", ref = "HEAD")

assignUser commented 4 years ago

@whedon generate pdf

whedon commented 4 years ago

PDF failed to compile for issue #2763 with the following error:

Can't find any papers to compile :-(

assignUser commented 4 years ago

@whedon generate pdf from branch joss-submission

whedon commented 4 years ago
Attempting PDF compilation from custom branch joss-submission. Reticulating splines etc...
whedon commented 4 years ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

kgoldfeld commented 4 years ago

@brunaw I was able to fix the LaTeX issue without affecting the pkgdown site. Thanks for that suggestion - it looks much better - it always bothered me.

As for the grammar of the vignettes - were there any that were particularly egregious? I will certainly go through all of them - as you see there are quite a few - but it would help if you found that particular ones need special attention.

kgoldfeld commented 4 years ago

@brunaw I was able to go through the vignettes - there were some really bad typos throughout. Thanks so much for taking the time to catch all that. I have added seeds as well throughout where there were none. Thanks again.

gagolews commented 4 years ago

The package may be useful to some researchers and students, in cases where generation of data following some typical models is required. Overall, it is quite well written and well documented. The package's API can be considered user friendly. Yet, of course, more exotic scenarios will require its users to implement the missing functionality by hand anyway (and learn how to implement the models included in simstudy anyway). I recommend that the paper be accepted provided that the authors address what follows.

Paper — remarks:

  1. Neither the title nor the summary informs the reader that it is an R package

  2. Mention its availability on CRAN, add a link to the CRAN entry, and info on how to install the package

  3. The package is documented quite well (vignettes), the link to https://kgoldfeld.github.io/simstudy/articles/simstudy.html should therefore be emphasised in the main text, as in "for more details on the package, use cases, etc. (...), see (...)". Also, add this information to the README file.

  4. $log(\mu)$ → $\log(\mu)$

Vignettes — remarks:

  1. https://kgoldfeld.github.io/simstudy/articles/simstudy.html

    a. $log(\mu)$ → $\log(\mu)$

    b. "please refer to other package vignettes" – you mean other vignettes included in this very package?

    c. "One option is to to use"

    d. "has the following fields: varname, formula, variance, dist, and link" — consider using code formatting for varname, formula, etc.

    e. e.g., in defData(def, varname = "female", dist = "binary", formula = "-2 + age * 0.1", link = "logit") — I guess a more R-way (which is a matter of taste) would be to specify formula = -2+age*0.1, i.e., as an R expression, see ?deparse and ?substitute. The same with varname and dist. See base R functions transform() and subset() for inspiration.

    f. $uniform$ → uniform

    g. $uniforminteger$ → uniforminteger

  2. https://kgoldfeld.github.io/simstudy/articles/correlated.html

    a. side note: copulas (copulae) are nice tools for modelling of dependencies between random variables
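
Remark 1e above points to ?deparse and ?substitute. As a rough illustration of that idea, here is a minimal base R sketch of capturing an unquoted expression and turning it back into the string form that defData() currently takes. The helper name capture_formula is hypothetical and is not part of simstudy.

```r
# Hypothetical helper (not part of simstudy): capture an unquoted R
# expression and return it as a character string.
capture_formula <- function(expr) {
  # substitute() returns the unevaluated expression supplied by the caller;
  # deparse() converts it back to text (possibly several strings if long),
  # so the pieces are pasted together into one string.
  paste(deparse(substitute(expr)), collapse = " ")
}

# `age` does not need to exist: the expression is never evaluated here.
formula_string <- capture_formula(-2 + age * 0.1)
print(formula_string)
```

Under this assumption, a wrapper could accept formula = -2 + age * 0.1 and forward the resulting string unchanged, so existing string-based definitions would keep working.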

gagolews commented 4 years ago

@brunaw re: simulating correlated data - see https://en.wikipedia.org/wiki/Copula_(probability_theory) and https://cran.r-project.org/web/packages/copula/index.html

kgoldfeld commented 4 years ago

@gagolews Thanks for your feedback - will make the editing changes. I agree with you regarding 1e - we may make that change in a future iteration, though there are some "formulas" where there is no standard R formulation. The following (clearly nonsensical) snippet shows three distributions where the standard formulas don't really apply:

library(simstudy)

d <- defData(varname = "x", formula = "0;1", dist = "uniform")
d <- defData(d, varname ="y", formula = "-2+x;-1 + 0.5*x", 
             dist = "categorical", link = "logit")
d <- defData(d, varname = "z", formula = "x|0.5 + y|0.5", dist = "mixture")

set.seed(5)
genData(10, d)
#>     id         x y         z
#>  1:  1 0.2002145 3 0.2002145
#>  2:  2 0.6852186 3 3.0000000
#>  3:  3 0.9168758 3 0.9168758
#>  4:  4 0.2843995 2 0.2843995
#>  5:  5 0.1046501 2 0.1046501
#>  6:  6 0.7010575 1 0.7010575
#>  7:  7 0.5279600 3 0.5279600
#>  8:  8 0.8079352 3 0.8079352
#>  9:  9 0.9565001 3 3.0000000
#> 10: 10 0.1104530 1 1.0000000

With respect to 2a, I agree that copulas are useful in generating correlated data. Indeed, that is what we are doing in simstudy to generate correlated data (for all distributions other than the normal distribution).
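
For readers unfamiliar with the copula approach mentioned here, the following is a minimal base R sketch of a Gaussian copula. It is not simstudy's implementation; the correlation of 0.6 and the Poisson and gamma margins are arbitrary choices made only for illustration.

```r
set.seed(1)
n <- 1000
rho <- 0.6  # assumed correlation on the latent normal scale

# Step 1: correlated standard normals via the Cholesky factor of Sigma.
Sigma <- matrix(c(1, rho, rho, 1), nrow = 2)
z <- matrix(rnorm(n * 2), ncol = 2) %*% chol(Sigma)

# Step 2: map each column to Uniform(0, 1) with the normal CDF.
u <- pnorm(z)

# Step 3: apply the quantile function of each target marginal.
counts <- qpois(u[, 1], lambda = 3)            # Poisson margin
waits  <- qgamma(u[, 2], shape = 2, rate = 1)  # gamma margin

# Rank correlation is largely preserved by the monotone transforms.
spearman_rho <- cor(counts, waits, method = "spearman")
print(spearman_rho)
```

The correlation specified on the latent normal scale is generally not reproduced exactly on the scale of the transformed variables, which is part of the difficulty raised later in this thread.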

gagolews commented 4 years ago

Yeah, but this way you end up inventing a totally new syntax, like a "language within a language", making the API more difficult to learn (and these skills are "not transferable" to other packages, as I understand the package aims at less advanced users?)

Actually x|0.5 + y|0.5 is a valid R expression although it means x|(0.5+y)|0.5 and some users might read it exactly like this? How about simply 0.5*x + 0.5*y for mixtures? By the way in lm.formula() we also use the I(...) function to transform the variables
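
The precedence point can be checked in base R by parsing the expression without evaluating it. This is only a sketch of how R itself reads the string as code, not of how simstudy interprets its mixture formulas.

```r
# Parse the mixture string as ordinary R code, without evaluating it.
e <- quote(x | 0.5 + y | 0.5)

# The outermost call is `|`, with the constant 0.5 as its right operand.
outer_op <- as.character(e[[1]])
outer_rhs <- e[[3]]

# Its left operand is another `|` call whose right operand is `0.5 + y`,
# because `+` binds more tightly than `|`.
inner <- e[[2]]
inner_rhs <- deparse(inner[[3]])

print(outer_op)
print(deparse(inner))
print(inner_rhs)
```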

"0;1" → c(0,1) ? "-2+x;-1 + 0.5*x" → c(...) ?

I reckon these are/were all pretty difficult API design questions. If I were you, I'd opt for maximum compatibility with other R functions, especially those from base R, stats, etc. But that's my 99 cents, it's a matter of taste.

kgoldfeld commented 4 years ago

I agree - and it was definitely something I struggled with. There are some things I wish I had done differently, like naming formula and variance rather than something like param1 and param2. We may move to that at some point, and could think about the specification as well.

As for who is using it, I think it is a mix of experienced and less experienced folks. I think the experienced folks appreciate the ease with which you can generate complex study designs without a ton of coding.

I am hesitant to make major changes without some serious thought, because there have been 30K+ downloads, which probably means a few hundred or so serious users. I don't want to mess them up. But I hear what you are saying.

kgoldfeld commented 4 years ago

I made all the editing changes to the vignette - my collaborator will fix the paper issues tomorrow at some point.

assignUser commented 4 years ago

@whedon generate pdf from branch joss-submission

whedon commented 4 years ago
Attempting PDF compilation from custom branch joss-submission. Reticulating splines etc...
whedon commented 4 years ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

assignUser commented 4 years ago

@gagolews Thank you for your review, positive recommendation and valuable insights. I have opened an issue regarding 1.e kgoldfeld/simstudy/issues/75 to keep track of it when we are considering breaking changes.

I have made the changes to the paper and readme.

brunaw commented 4 years ago

@assignUser and @kgoldfeld Thank you for your answers. The package correctly installs from GitHub now, and as for the grammar, I didn't mean anything specific but a few typos here and there that could be fixed (which you already did). In these cases, I always use Grammarly because I'm sure I'll miss something. My other issue was mostly regarding this line of code

def <- defData(varname = "age", dist = "normal", formula = 10, 
               variance = 2)
genData(5, def)

where you're simulating from a N(10, 2) distribution. When I read the documentation I think I missed the part where you say that the formula argument represents the mean of the distribution, so this was actually my mistake, sorry about that! No need to correct anything there.
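
To make the parameterisation concrete, here is a base R sketch of the same distribution, assuming (as stated above) that formula gives the mean and variance gives the variance. Note that rnorm() takes a standard deviation, so the variance is square-rooted; the sample size is an arbitrary choice for the check.

```r
set.seed(1)

# Base R analogue of dist = "normal", formula = 10, variance = 2.
# rnorm() is parameterised by the standard deviation, hence sqrt(2).
age <- rnorm(100000, mean = 10, sd = sqrt(2))

sample_mean <- mean(age)
sample_var <- var(age)
print(c(mean = sample_mean, variance = sample_var))
```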

@gagolews Copulas are the most popular way of simulating correlated data, but they're also limited because they depend on assumptions about the underlying multivariate distribution and about the marginal distributions. For Binomial cases, for instance, this is not as well defined in the literature as it is for a Normal distribution (e.g. https://brunaw.com/slides/conferences/EMR2019_poster.pdf). As a result, you have a set of possible methods to simulate non-Gaussian correlated data, but most of the time they present issues such as not allowing full flexibility of the covariance matrix. That's why this is not a trivial thing to deal with, and why I suggested that some references be given in the vignettes.

assignUser commented 4 years ago

@brunaw Thanks again :) If you are satisfied with the new "Contributing & Support" section of the readme pointing to CONTRIBUTING.md could you check your last box? :)

brunaw commented 4 years ago

@assignUser Done!

assignUser commented 4 years ago

@whedon commands

whedon commented 4 years ago

Here are some things you can ask me to do:

# List Whedon's capabilities
@whedon commands

# List of editor GitHub usernames
@whedon list editors

# List of reviewers together with programming language preferences and domain expertise
@whedon list reviewers

EDITORIAL TASKS

# Compile the paper
@whedon generate pdf

# Compile the paper from alternative branch
@whedon generate pdf from branch custom-branch-name

# Ask Whedon to check the references for missing DOIs
@whedon check references

# Ask Whedon to check repository statistics for the submitted software
@whedon check repository
assignUser commented 4 years ago

@mikldk Everything should be in order or do we need to do something else to proceed?

kgoldfeld commented 4 years ago

@assignUser @brunaw I just updated the correlation vignette to include the two links that Bruna recommended.

Thanks again for all the comments and suggestions.

mikldk commented 4 years ago

@gagolews, @brunaw: Can you confirm that you have finished the review and recommend that this paper is now published?

@assignUser:

brunaw commented 4 years ago

@mikldk Yes

assignUser commented 4 years ago

@whedon generate pdf from branch joss-submission

whedon commented 4 years ago
Attempting PDF compilation from custom branch joss-submission. Reticulating splines etc...
whedon commented 4 years ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

assignUser commented 4 years ago

@whedon generate pdf from branch joss-submission

whedon commented 4 years ago
Attempting PDF compilation from custom branch joss-submission. Reticulating splines etc...
whedon commented 4 years ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

assignUser commented 4 years ago

@mikldk @kgoldfeld The tagged release is https://github.com/kgoldfeld/simstudy/releases/tag/v0.2.2, archived on Zenodo with the DOI https://doi.org/10.5281/zenodo.4134675. The metadata should be correct. (The paper lives in main now.)

assignUser commented 4 years ago

@whedon generate pdf

assignUser commented 4 years ago

@whedon check references

whedon commented 4 years ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

whedon commented 4 years ago
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.18637/jss.v037.i03 is OK
- 10.1016/j.jbusvent.2019.02.001 is OK
- 10.20982/tqmp.16.4.p248 is OK
- 10.1007/s40273-020-00946-y is OK
- 10.1080/00031305.1991.10475828 is OK
- 10.1111/dmcn.14552 is OK
- 10.18637/jss.v069.i04 is OK
- 10.31234/osf.io/59uaq is OK
- 10.1002/sim.8452 is OK
- 10.1097/MLR.0000000000001063 is OK
- 10.1080/03610918.2012.718841 is OK
- 10.1101/215889 is OK
- 10.1007/s10463-020-00761-4 is OK
- 10.1186/s13063-019-3364-x is OK

MISSING DOIs

- None

INVALID DOIs

- None