kgoldfeld / simstudy

simstudy: Illuminating research methods through data generation
https://kgoldfeld.github.io/simstudy/
GNU General Public License v3.0

Functions used in formula= argument in defData not available for functions defined outside the global namespace #233

Closed kgoldfeld closed 1 month ago

kgoldfeld commented 2 months ago

@assignUser Everything seems to be good, except there is this very strange issue with one of the vignettes (double_dot_extension.Rmd) that fails to build during R CMD check. The message is kind of odd, because there is no times argument anywhere to be found:

Quitting from lines 157-159 [unnamed-chunk-11] (double_dot_extension.Rmd)
   Error: processing vignette 'double_dot_extension.Rmd' failed with diagnostics:
   invalid 'times' argument

When I knit the Rmd file directly, everything is fine.

And I can run the code inside a function, so clearly the environment issue has been fixed here:

library(simstudy)

environment()
#> <environment: R_GlobalEnv>

test <- function() {

  print(environment())

  mult <- function(x) {
    2*x
  }

  defblk <- defData(varname = "blksize", 
                    formula = "mult(..sizes[1]) | .5 + mult(..sizes[2]) | .5", dist = "mixture")

  sizes <- c(2, 4)
  genData(1000, defblk)  
}

set.seed(123)
test()
#> <environment: 0x13b8f30e8>
#> Key: <id>
#>          id blksize
#>       <int>   <num>
#>    1:     1       4
#>    2:     2       8
#>    3:     3       4
#>    4:     4       8
#>    5:     5       8
#>   ---              
#>  996:   996       8
#>  997:   997       8
#>  998:   998       4
#>  999:   999       8
#> 1000:  1000       4

But everything I have read indicates that this is likely an environment issue with the devtools::check() environment. That makes sense, since my changes have all been environment-related. But I can't really make sense of the specific error message, nor have I been able to pin down where the environment is (or isn't) getting passed properly.

I'll keep looking, but this is the last remaining piece.

github-actions[bot] commented 2 months ago

This is how benchmark results would change (along with a 95% confidence interval in relative change) if 75053fdba56144c5c66930e6f64e7e2d1f3f5db8 is merged into main:

github-actions[bot] commented 2 months ago

This is how benchmark results would change (along with a 95% confidence interval in relative change) if f3a06bb0c8a3e1d7135e118474be30848c85b06c is merged into main:

kgoldfeld commented 2 months ago

I think I found the problem - there was a conflict with using the variable name n in .evalWith, since we were using list2env. There must be a variable with that name in the devtools::check() environment. In any case, things checked OK on my laptop - hoping the tests here show the same thing. If they do, feel free to take a look and approve if it seems OK. Thank goodness for all the tests.
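A minimal reproduction of the *class* of bug described above (this is a sketch, not simstudy's actual .evalWith internals): if an evaluation environment is built with list2env() and its parent chain reaches the calling frame, a stray variable such as n in that frame can be picked up by an internal expression that expects a count - producing exactly the reported "invalid 'times' argument" error.

```r
# A stray variable like this could be lurking in the check environment:
n <- "oops"

# Build an eval environment the way list2env() allows, with the caller
# as (indirect) parent, so lookups fall through to the stray `n`:
e <- list2env(list(x = 5), parent = environment())

# An internal expression that expects `n` to be a count finds the string:
res <- try(eval(quote(rep(x, times = n)), envir = e), silent = TRUE)
cat(attr(res, "condition")$message, "\n")
# invalid 'times' argument
```

Because nothing named n exists in e itself, R keeps searching up the parent chain and lands on the caller's string, which rep() then rejects.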

kgoldfeld commented 1 month ago

Yes - I'll add those tests.

And also, I realize I am not totally comfortable with pulling in the entire environment - I would rather pull in just the functions. That way, if someone happens to have a variable with the same name as one used internally, there is no conflict (as I experienced with the check environment). I am working on that, and will update .evalWith.
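The idea of pulling in only the functions can be sketched like this (keep_functions is a hypothetical helper for illustration, not the actual .evalWith change): copy just the function objects from the caller's environment, so user-defined functions like mult remain callable while stray data variables never enter the evaluation environment.

```r
# Hypothetical sketch: keep only the *functions* from an environment,
# dropping plain variables that could collide with internal names.
keep_functions <- function(env) {
  objs <- mget(ls(env), envir = env)
  list2env(Filter(is.function, objs), parent = parent.env(env))
}

caller <- new.env()
caller$mult <- function(x) 2 * x   # a user-defined function we want
caller$n <- "oops"                 # a stray variable we do NOT want

safe <- keep_functions(caller)
exists("mult", envir = safe, inherits = FALSE)  # TRUE
exists("n", envir = safe, inherits = FALSE)     # FALSE
```

With this filter in place, a formula like "mult(..sizes[1])" still resolves mult, but internal variables such as n can no longer be shadowed by the caller.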

github-actions[bot] commented 1 month ago

This is how benchmark results would change (along with a 95% confidence interval in relative change) if c69acece89fb08eba9e5eb4ecdb70afa3060dfc1 is merged into main:

kgoldfeld commented 1 month ago

@assignUser A few things:

assignUser commented 1 month ago

Which I guess highlights the downsides of using other packages that are evolving over time.

Yep, it is always good to think about dependencies but you seem to have a good hand for that ;)

This led me to wonder how we determined which servers to check on the GitHub workflow

I looked at the recommendations from the r-lib people: https://github.com/r-lib/actions/blob/v2-branch/examples/check-standard.yaml and modified those a bit.

Testing all flavors is not really worth it for a package like simstudy without any complex build dependencies, as it can produce frequent errors that just vanish after a few days (for devel) and are a waste to investigate. In addition, there is no real way to accurately reproduce the CRAN environments: they don't provide any form of reproducible script, and a lot of things are manually tweaked, so even if you try to test on them there is no guarantee that the results will accurately reflect what you get on CRAN (yes, this is from painful experience...).

And even if some real issue pops up on CRAN you are not constrained by any release process (compared to arrow for example) that slows you down in deploying a fix.

Looking at the current error that happens on multiple versions and flavors, I have no idea - I haven't seen that one before. Looks like something broken with knitr? As it wasn't the case when initially submitted, I am guessing this is on CRAN's end.

kgoldfeld commented 1 month ago

Looking at the current error that happens on multiple versions and flavors, I have no idea - I haven't seen that one before. Looks like something broken with knitr? As it wasn't the case when initially submitted, I am guessing this is on CRAN's end.

I am quite sure that the error is related to the changes in the gtsummary package - and it should go away, since I have modified the Suggests field as they recommended:

I am preparing the submission of gtsummary v2.0 to CRAN, and I wanted to alert you to a change you may need to incorporate in simstudy.

The gtsummary package utilizes the broom.helpers package to prepare regression model results. In the updated version of the package, the broom.helpers package has moved from Imports to Suggests. As a result, you may also need to add broom.helpers to the Suggests field so the package is available when your vignettes are created.
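Concretely, the recommended change is a one-line addition to the Suggests field in DESCRIPTION. The surrounding entries below are illustrative of a typical vignette-building setup, not simstudy's exact dependency list:

```
Suggests:
    gtsummary,
    broom.helpers,
    knitr,
    rmarkdown
```

Since broom.helpers is only needed when the vignettes are built, Suggests (rather than Imports) is the appropriate field.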

github-actions[bot] commented 1 month ago

This is how benchmark results would change (along with a 95% confidence interval in relative change) if cc418dd35101cec5324e254cc38132fa05e81737 is merged into main: