Hello human, I'm @whedon, a robot that can help you with some common editorial tasks. @gagolews, @brunaw it looks like you're currently assigned to review this paper :tada:.
:warning: JOSS reduced service mode :warning:
Due to the challenges of the COVID-19 pandemic, JOSS is currently operating in a "reduced service mode". You can read more about what that means in our blog post.
:star: Important :star:
If you haven't already, you should seriously consider unsubscribing from GitHub notifications for this (https://github.com/openjournals/joss-reviews) repository. As a reviewer, you're probably currently watching this repository which means for GitHub's default behaviour you will receive notifications (emails) for all reviews 😿
To fix this do the following two things:
For a list of things I can do to help you, just type:
@whedon commands
For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:
@whedon generate pdf
PDF failed to compile for issue #2763 with the following error:
Can't find any papers to compile :-(
@whedon generate pdf from branch joss-submission
Attempting PDF compilation from custom branch joss-submission. Reticulating splines etc...
:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:
@whedon check references from branch joss-submission
Attempting to check references... from custom branch joss-submission
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):
OK DOIs
- 10.18637/jss.v037.i03 is OK
- 10.1016/j.jbusvent.2019.02.001 is OK
- 10.20982/tqmp.16.4.p248 is OK
- 10.1007/s40273-020-00946-y is OK
- 10.1080/00031305.1991.10475828 is OK
- 10.1111/dmcn.14552 is OK
- 10.18637/jss.v069.i04 is OK
- 10.31234/osf.io/59uaq is OK
- 10.1002/sim.8452 is OK
- 10.1097/MLR.0000000000001063 is OK
- 10.1080/03610918.2012.718841 is OK
- 10.1101/215889 is OK
- 10.1007/s10463-020-00761-4 is OK
- 10.1186/s13063-019-3364-x is OK
MISSING DOIs
- None
INVALID DOIs
- None
@gagolews, @brunaw: Thanks for agreeing to review. Please carry out your review in this issue by updating the checklist above and giving feedback in this issue. The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. If possible create issues (and cross-reference) in the submission's repository to avoid too specific discussions in this review thread.
If you have any questions or concerns please let me know.
I personally appreciate the creation of this package, since I have had to come up with my own data simulation procedures several times, for many different contexts. I think this package should be accepted because it will help lots of researchers with the same issues. Comments and suggestions about the code & content/documentation of the package are below.
Add LaTeX to the README for clarity
There are a few grammatical errors in the vignettes that could be fixed; I suggest using Grammarly for a text review.
I don't think the description of the `formula` and `variance` arguments is clear yet. For instance, what does it mean to say that `formula = 10`? Likewise, what is the `variance` argument related to? I think a good way of solving this would be a concrete, written-up example with a Normal distribution, showing that the `variance` argument is the variance of the Normal distribution you're simulating from, and what the `formula` argument represents in that case.
It might be useful to set a simulation seed (`set.seed()`) throughout the vignettes, so people will know that their code is working as it should when replicating it. For some vignettes, simply running the code leads to quite different results from what is seen on the package website (since it's all random).
I think it would be nice to have your references listed in the vignettes. For example, simulating correlated (multivariate) data is not a trivial thing, since we don't have as much theory available for multivariate distributions beyond the Normal case. I had to go check the reference you cite in the JOSS paper for the binary case, since I hadn't come across that method before. Other users might be interested in knowing exactly what is going on in the code, especially if they need to explain that later (in a paper, for example).
You state the existence of the Contributor Code of Conduct, but actual contributor guidelines, or instructions on how to report a problem, are not provided. I think it would be good to write a few lines in the README addressing that.
The package installs locally (from a cloned repository and from CRAN), but it wouldn't install from GitHub; the following URL was not found:
https://api.github.com/repos/kgoldfeld/simstudy/tarball/master
The package builds okay after installing the packages used in the vignettes
Tests are provided and also run correctly
I haven't found any other issues with the code itself
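The point about seeds above can be illustrated with a minimal base R sketch. It does not use simstudy itself; the seed value and the use of `rnorm()` are illustrative assumptions, chosen only to show why a fixed seed makes vignette output reproducible.

```r
# Minimal sketch (base R only): a fixed seed reproduces the same random draws.
set.seed(2763)                      # arbitrary seed, chosen for illustration
first <- rnorm(3, mean = 10, sd = 1)

set.seed(2763)                      # resetting the seed restarts the stream
second <- rnorm(3, mean = 10, sd = 1)

third <- rnorm(3, mean = 10, sd = 1)  # no reset: the stream simply continues

identical(first, second)  # same seed, same draws
identical(first, third)   # different position in the stream, different draws
```

A reader who runs a vignette chunk that begins with `set.seed()` can then compare their output with the published one directly.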
@brunaw Thank you for your review and your positive recommendation! I will try to address each point you have brought up.
[…] `?distributions` and in the overview vignette. Is this in line with what you had in mind?
[…] `.github/CODE_OF_CONDUCT.md` and `.github/CONTRIBUTING.md` respectively, so they should be picked up by GitHub for new users; furthermore, they are also prominently linked in the pkgdown site. Maybe we could add a note to the README about contributions/error reports @kgoldfeld?
[…] `master` branch, which we renamed to `main`. I tested `devtools::install_github("kgoldfeld/simstudy")` and that works as intended. I just noticed and fixed that the codemeta (generated with `codemetar`) was referring to the master branch for news as well.
@assignUser Thank you for the quick reply, and I apologize if some of my comments weren't clear.
[…] `devtools::install_github("kgoldfeld/simstudy")` and get the error. I thought it was an internet connection problem, but I tried it on a different network (just now) and the same error happens. The code doesn't suggest anything about it either, so I don't know what causes it.
@brunaw
LaTeX: Ah, I see. As the symbols are rendered correctly on the pkgdown page, and any workarounds for GitHub might interfere with that, I think we will keep it as is (sadly; I would like it to be rendered too, maybe GitHub will add it at some point). @kgoldfeld fixed it.
Arguments: I feel that we have this covered with all of the vignettes and targeted documentation, as a specific example for `normal` would not cover all use cases, unlike the vignettes/`?distributions`. Please let me know if this is still a sticking point for you.
Install: I just found the solution for this issue. This is a recently solved problem where devtools/remotes was assuming the default branch to be "master", which was fixed by https://github.com/r-lib/remotes/pull/510 but is not on CRAN yet: https://remotes.r-lib.org/news/index.html
You should be able to install with `devtools::install_github("kgoldfeld/simstudy", ref = "HEAD")`.
@whedon generate pdf
PDF failed to compile for issue #2763 with the following error:
Can't find any papers to compile :-(
@whedon generate pdf from branch joss-submission
Attempting PDF compilation from custom branch joss-submission. Reticulating splines etc...
:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:
@brunaw I was able to fix the LaTeX issue without affecting the pkgdown site. Thanks for that suggestion - it looks much better - it always bothered me.
As for the grammar of the vignettes - were there any that were particularly egregious? I will certainly go through all of them - as you see there are quite a few - but it would help if you found that particular ones need special attention.
@brunaw I was able to go through the vignettes - there were some really bad typos throughout. Thanks so much for taking the time to catch all that. I have added seeds as well throughout where there were none. Thanks again.
The package may be useful to some researchers and students in cases where data following some typical models need to be generated. Overall, it is quite well written and well documented. The package's API can be considered user-friendly. Yet, of course, more exotic scenarios will require its users to implement the missing functionality by hand (and to learn how to implement the models included in `simstudy` anyway). I recommend that the paper be accepted provided that the authors address what follows.
Neither the title nor the summary informs the reader that it is an R package
Mention its availability on CRAN, add a link to the CRAN entry, and add info on how to install the package
The package is documented quite well (vignettes), the link to https://kgoldfeld.github.io/simstudy/articles/simstudy.html should therefore be emphasised in the main text, as in "for more details on the package, use cases, etc. (...), see (...)". Also, add this information to the README file.
$log(\mu)$ → $\log(\mu)$
1. https://kgoldfeld.github.io/simstudy/articles/simstudy.html
   a. $log(\mu)$ → $\log(\mu)$
   b. "please refer to other package vignettes" – you mean other vignettes included in this very package?
   c. "One option is to to use"
   d. "has the following fields: varname, formula, variance, dist, and link": consider using `varname`, `formula` (code)
   e. e.g., in `defData(def, varname = "female", dist = "binary", formula = "-2 + age * 0.1", link = "logit")`: I guess a more R-way (which is a matter of taste) would be to specify `formula = -2+age*0.1`, i.e., as an R expression, see `?deparse` and `?substitute`. The same with `varname` and `dist`. See base R functions `transform()` and `subset()` for inspiration.
   f. $uniform$ → `uniform`
   g. $uniforminteger$ → `uniforminteger`
2. https://kgoldfeld.github.io/simstudy/articles/correlated.html
   a. side note: copulas (copulae) are nice tools for modelling of dependencies between random variables
@brunaw re: simulating correlated data - see https://en.wikipedia.org/wiki/Copula_(probability_theory) and https://cran.r-project.org/web/packages/copula/index.html
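The suggestion in item 1e, passing the formula as an unquoted R expression and recovering its text with `substitute()` and `deparse()`, could look roughly like the sketch below. The helper `capture_formula()` is hypothetical and is not part of simstudy; it only shows the base R mechanism the reviewer points to.

```r
# Hypothetical helper (NOT part of simstudy): capture an unquoted expression
# and turn it into the string form that a string-based API would expect.
capture_formula <- function(formula) {
  # substitute() returns the unevaluated argument; deparse() converts it to text
  paste(deparse(substitute(formula)), collapse = " ")
}

quoted   <- "-2 + age * 0.1"                  # string-style specification
unquoted <- capture_formula(-2 + age * 0.1)   # expression-style specification

identical(quoted, unquoted)
```

Note that `age` does not need to exist when `capture_formula()` is called, because the expression is never evaluated. This is the same non-standard evaluation that `transform()` and `subset()` rely on.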
@gagolews Thanks for your feedback - will make the editing changes. I agree with you regarding 1e - we may make that change in a future iteration, though there are some "formulas" where there is no standard R formulation. The following (clearly nonsensical) snippet shows three distributions where the standard formulas don't really apply:
library(simstudy)
d <- defData(varname = "x", formula = "0;1", dist = "uniform")
d <- defData(d, varname ="y", formula = "-2+x;-1 + 0.5*x",
dist = "categorical", link = "logit")
d <- defData(d, varname = "z", formula = "x|0.5 + y|0.5", dist = "mixture")
set.seed(5)
genData(10, d)
#> id x y z
#> 1: 1 0.2002145 3 0.2002145
#> 2: 2 0.6852186 3 3.0000000
#> 3: 3 0.9168758 3 0.9168758
#> 4: 4 0.2843995 2 0.2843995
#> 5: 5 0.1046501 2 0.1046501
#> 6: 6 0.7010575 1 0.7010575
#> 7: 7 0.5279600 3 0.5279600
#> 8: 8 0.8079352 3 0.8079352
#> 9: 9 0.9565001 3 3.0000000
#> 10: 10 0.1104530 1 1.0000000
With respect to 2a, I agree that copulas are useful in generating correlated data. Indeed, that is what we are doing in `simstudy` to generate correlated data (for all distributions other than the normal distribution).
Yeah, but this way you end up inventing a totally new syntax, like a "language within a language", making the API more difficult to learn (and these skills are not transferable to other packages; as I understand it, the package aims at less advanced users?).
Actually, `x|0.5 + y|0.5` is a valid R expression, although it means `x|(0.5+y)|0.5`, and some users might read it exactly like this?
How about simply `0.5*x + 0.5*y` for mixtures? By the way, in `lm.formula()` we also use the `I(...)` function to transform the variables.
`"0;1"` → `c(0,1)`?
`"-2+x;-1 + 0.5*x"` → `c(...)`?
I reckon these are/were all pretty difficult API design questions. If I were you, I'd opt for maximum compatibility with other R functions, especially those from base R, `stats`, etc. But that's my 99 cents, it's a matter of taste.
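The remark about how R itself reads `x|0.5 + y|0.5` can be checked by inspecting the parse tree in base R. This is only a sketch of the precedence argument; it says nothing about how simstudy parses its own formula strings.

```r
# Inspect how R parses the mixture-style string when treated as an expression.
e <- quote(x | 0.5 + y | 0.5)

# `+` binds tighter than `|`, and `|` groups left to right, so the tree is
# (x | (0.5 + y)) | 0.5
top_is_or    <- identical(e[[1]], as.name("|"))      # outermost call is `|`
right_is_num <- identical(e[[3]], 0.5)               # its right operand is 0.5
inner        <- e[[2]]                               # x | 0.5 + y
sum_grouped  <- identical(inner[[3]], quote(0.5 + y))

c(top_is_or, right_is_num, sum_grouped)
```

All three checks agree with the reading `x|(0.5+y)|0.5` given above, which is why a user familiar with R operators could misread the mixture syntax.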
I agree - and it was definitely something I struggled with. There are some things I wish I had done differently, like naming the arguments `formula` and `variance` rather than something like `param1` and `param2`. We may move to that at some point, and could think about the specification as well.
As for who is using it, I think it is a mix of experienced and less experienced folks. I think the experienced folks appreciate the ease with which you can generate complex study designs without a ton of coding.
I am hesitant to make major changes without some serious thought, because there have been 30K+ downloads, which probably means a few hundred or so serious users. I don't want to mess them up. But I hear what you are saying.
I made all the editing changes to the vignette - my collaborator will fix the paper issues tomorrow at some point.
@whedon generate pdf from branch joss-submission
Attempting PDF compilation from custom branch joss-submission. Reticulating splines etc...
:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:
@gagolews Thank you for your review, positive recommendation and valuable insights. I have opened an issue regarding 1.e (kgoldfeld/simstudy/issues/75) to keep track of it when we are considering breaking changes.
I have made the changes to the paper and readme.
@assignUser and @kgoldfeld Thank you for your answers. The package correctly installs from GitHub now, and as for the grammar, I didn't mean anything specific, just a few typos here and there that could be fixed (which you already did). In these cases, I always use Grammarly because I'm sure I'll miss something. My other issue was mostly regarding these lines of code
def <- defData(varname = "age", dist = "normal", formula = 10,
variance = 2)
genData(5, def)
where you're simulating from a N(10, 2) distribution. When I read the documentation I think I missed the part where you say that the formula argument represents the mean of the distribution, so this was actually my mistake, sorry about that! No need to correct anything there.
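The reading confirmed here, that `formula` gives the mean and `variance` gives the variance of the Normal distribution, can be sketched in base R. The block below does not call simstudy; `rnorm()` stands in for the definition above, under the assumption that the N(10, 2) interpretation stated in this comment is correct. Note that `rnorm()` takes a standard deviation, so the variance has to be converted.

```r
# Base R stand-in for a N(10, 2) definition (mean 10, VARIANCE 2).
set.seed(10)            # arbitrary seed for reproducibility
n      <- 1e5           # large sample so the moments are estimated precisely
mu     <- 10            # plays the role of `formula = 10`
sigma2 <- 2             # plays the role of `variance = 2`

age <- rnorm(n, mean = mu, sd = sqrt(sigma2))  # sd is the square root of the variance

sample_mean <- mean(age)
sample_var  <- var(age)
c(mean = sample_mean, variance = sample_var)
```

With a sample this large, the sample mean and variance should land close to 10 and 2 respectively.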
@gagolews Copulas are the most popular way of simulating correlated data, but they're also limited because they depend on assumptions about the underlying multivariate distribution and about the marginal distributions. For Binomial cases, for instance, this is not as well defined in the literature as it is for a Normal distribution (e.g. https://brunaw.com/slides/conferences/EMR2019_poster.pdf). As a result, you have a set of possible methods to simulate non-Gaussian correlated data, but most of the time they present issues such as not allowing full flexibility of the covariance matrix. That's why this is not a trivial thing to deal with, and why I suggested that some references be given in the vignettes.
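A minimal base R sketch of a Gaussian copula for two correlated binary variables may help make this limitation concrete. It is an illustration only and is not necessarily how simstudy implements correlated data; the latent correlation `rho` and the marginal probabilities `p` are assumed values.

```r
# Gaussian copula sketch: correlated binary variables from correlated normals.
set.seed(2020)                       # arbitrary seed
n   <- 1e5
rho <- 0.5                           # correlation on the latent normal scale (assumed)
p   <- c(0.3, 0.6)                   # target marginal probabilities (assumed)

Sigma <- matrix(c(1, rho, rho, 1), nrow = 2)
z <- matrix(rnorm(n * 2), ncol = 2) %*% chol(Sigma)  # correlated standard normals
u <- pnorm(z)                                        # uniform margins (the copula)

x1 <- qbinom(u[, 1], size = 1, prob = p[1])          # inverse-CDF transform
x2 <- qbinom(u[, 2], size = 1, prob = p[2])

marginals  <- c(mean(x1), mean(x2))
binary_cor <- cor(x1, x2)
list(marginals = marginals, binary_cor = binary_cor)
```

The marginal probabilities are recovered, but the correlation between the binary variables is smaller than the latent `rho`. The correlation one specifies is therefore not the correlation one obtains, which is one form of the flexibility issue described above.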
@brunaw Thanks again :) If you are satisfied with the new "Contributing & Support" section of the readme pointing to CONTRIBUTING.md could you check your last box? :)
@assignUser Done!
@whedon commands
Here are some things you can ask me to do:
# List Whedon's capabilities
@whedon commands
# List of editor GitHub usernames
@whedon list editors
# List of reviewers together with programming language preferences and domain expertise
@whedon list reviewers
EDITORIAL TASKS
# Compile the paper
@whedon generate pdf
# Compile the paper from alternative branch
@whedon generate pdf from branch custom-branch-name
# Ask Whedon to check the references for missing DOIs
@whedon check references
# Ask Whedon to check repository statistics for the submitted software
@whedon check repository
@mikldk Everything should be in order, or do we need to do something else to proceed?
@assignUser @brunaw I just updated the correlation vignette to include the two links that Bruna recommended.
Thanks again for all the comments and suggestions.
@gagolews, @brunaw: Can you confirm that you have finished the review and recommend that this paper is now published?
@assignUser:
@whedon generate pdf
@mikldk Yes
@whedon generate pdf from branch joss-submission
Attempting PDF compilation from custom branch joss-submission. Reticulating splines etc...
:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:
@mikldk @kgoldfeld The tagged release is https://github.com/kgoldfeld/simstudy/releases/tag/v0.2.2, archived on Zenodo with the DOI https://doi.org/10.5281/zenodo.4134675
The metadata should be correct (the paper lives in `main` now).
@whedon generate pdf
@whedon check references
:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):
OK DOIs
- 10.18637/jss.v037.i03 is OK
- 10.1016/j.jbusvent.2019.02.001 is OK
- 10.20982/tqmp.16.4.p248 is OK
- 10.1007/s40273-020-00946-y is OK
- 10.1080/00031305.1991.10475828 is OK
- 10.1111/dmcn.14552 is OK
- 10.18637/jss.v069.i04 is OK
- 10.31234/osf.io/59uaq is OK
- 10.1002/sim.8452 is OK
- 10.1097/MLR.0000000000001063 is OK
- 10.1080/03610918.2012.718841 is OK
- 10.1101/215889 is OK
- 10.1007/s10463-020-00761-4 is OK
- 10.1186/s13063-019-3364-x is OK
MISSING DOIs
- None
INVALID DOIs
- None
Submitting author: @assignUser (Jacob Wujciak-Jens)
Repository: https://github.com/kgoldfeld/simstudy/
Version: v0.2.2
Editor: @mikldk
Reviewer: @gagolews, @brunaw
Archive: 10.5281/zenodo.4134675
Reviewers and authors:
Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)
Reviewer instructions & questions
@gagolews & @brunaw, please carry out your review in this issue by updating the checklist below. If you cannot edit the checklist please:
The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @mikldk know.
✨ Please start on your review when you are able, and be sure to complete your review in the next six weeks, at the very latest ✨
Review checklist for @gagolews
Conflict of interest
Code of Conduct
General checks
Functionality
Documentation
Software paper
Review checklist for @brunaw
Conflict of interest
Code of Conduct
General checks
Functionality
Documentation
Software paper