216 nonrandom distribution returns a single value when repeated values are expected

kgoldfeld commented 6 months ago

@assignUser I tried to create a test for the new custom functionality, and of course, I need to create a function as part of the test. However, the test doesn't run, because it can't see the function for some reason. Here is my test - any thoughts on how I can make it work?

test_that("custom data is generated as expected.", {
  skip_on_cran()

  trunc_norm <- function(n, lower, upper, mu = 0, s = 1.5) {

    F.a <- pnorm(lower, mean = mu, sd = s)
    F.b <- pnorm(upper, mean = mu, sd = s)

    u <- runif(n, min = F.a, max = F.b)
    qnorm(u, mean = mu, sd = s)

  }

  def <-
    defData(varname = "x", formula = 5, dist = "poisson") |>
    defData(varname = "y", formula = ".trunc_norm",
            variance = "s = 100, lower = x - 1, upper = x + 1",
            dist = "custom"
    )

  dd <- genData(10000, def)

  expect_true( dd[, min(y)] > dd[, min(x-1)])
  expect_true( dd[, max(y)] < dd[, max(x+1)])

})

Here is the error message:

dim = c(10000L, 
1L), dimnames = list(NULL, "x+1")))`: could not find function "trunc_norm"
Backtrace:
    ▆
 1. └─simstudy::genData(10000, def) at test-generate_dist.R:195:3
 2.   └─simstudy:::.generate(...) at simstudy/R/generate_data.R:86:7
 3.     └─simstudy:::.gencustom(...) at simstudy/R/generate_dist.R:16:3
 4.       └─base::do.call(fn, arg_list) at simstudy/R/generate_dist.R:371:3
[ FAIL 1 | WARN 1 | SKIP 0 | PASS 28 ]

Test complete

Of course, the code works if I just run it outside the context of a test.

github-actions[bot] commented 6 months ago

This is how benchmark results would change (along with a 95% confidence interval in relative change) if b6919ccd23a88714dd5a9aa2903283b37d006563 is merged into main:

:heavy_check_mark:define_data: 17.6ms -> 17.4ms [-3.51%, +1.93%]
:heavy_check_mark:dist_beta: 211ms -> 212ms [-0.91%, +1.19%]
:heavy_check_mark:dist_binary: 8.73ms -> 8.71ms [-1.31%, +0.93%]
:heavy_check_mark:dist_binomial: 13ms -> 13ms [-1.44%, +2.46%]
:heavy_check_mark:dist_categorical: 57.7ms -> 58.1ms [-0.18%, +1.45%]
:heavy_check_mark:dist_exponential: 8.84ms -> 8.79ms [-2.01%, +0.78%]
:heavy_check_mark:dist_gamma: 16.3ms -> 16.2ms [-1.5%, +0.09%]
:heavy_check_mark:dist_mixture: 227ms -> 225ms [-1.91%, +0.56%]
:heavy_check_mark:dist_normal: 11.2ms -> 10.6ms [-15.87%, +6.34%]
:heavy_check_mark:gen_all_dists: 50.3ms -> 49.1ms [-6.26%, +1.57%] Further explanation regarding interpretation and methodology can be found in the documentation.

kgoldfeld commented 6 months ago

I figured out the test issue - needed to change the environment when in the do.call(). It has the added benefit of working if the custom function is defined within another function.

I've created a new vignette - that should give you a better idea of how it is all working.

Let me know if you think I can merge this.

github-actions[bot] commented 6 months ago

This is how benchmark results would change (along with a 95% confidence interval in relative change) if 6c10b214cfcb782166ec814fa17933580fe24181 is merged into main:

:exclamation::snail:define_data: 17.2ms -> 17.6ms [+0.67%, +3.25%]
:heavy_check_mark:dist_beta: 212ms -> 213ms [-0.29%, +1.8%]
:heavy_check_mark:dist_binary: 9.01ms -> 9.18ms [-4.33%, +8.28%]
:heavy_check_mark:dist_binomial: 13.1ms -> 13ms [-2.07%, +1.1%]
:heavy_check_mark:dist_categorical: 58.5ms -> 58.2ms [-1.78%, +0.69%]
:heavy_check_mark:dist_exponential: 8.85ms -> 8.91ms [-1.18%, +2.67%]
:heavy_check_mark:dist_gamma: 16.4ms -> 16.5ms [-1.81%, +3.67%]
:heavy_check_mark:dist_mixture: 226ms -> 226ms [-1.47%, +1.2%]
:heavy_check_mark:dist_normal: 10.6ms -> 11.2ms [-6.62%, +17.84%]
:heavy_check_mark:gen_all_dists: 48.9ms -> 48.8ms [-1.7%, +1.34%] Further explanation regarding interpretation and methodology can be found in the documentation.

github-actions[bot] commented 5 months ago

This is how benchmark results would change (along with a 95% confidence interval in relative change) if 4bf87a0bd7a02503b9441961a10e2e1ef8dd720b is merged into main:

:heavy_check_mark:define_data: 17.8ms -> 17.6ms [-4.01%, +1.5%]
:heavy_check_mark:dist_beta: 215ms -> 214ms [-2.25%, +1.53%]
:heavy_check_mark:dist_binary: 8.69ms -> 8.97ms [-3.29%, +9.82%]
:heavy_check_mark:dist_binomial: 13.2ms -> 13.3ms [-3.49%, +4.89%]
:heavy_check_mark:dist_categorical: 58.4ms -> 58.9ms [-2.22%, +3.73%]
:heavy_check_mark:dist_exponential: 9.18ms -> 9.17ms [-1.9%, +1.83%]
:heavy_check_mark:dist_gamma: 16.5ms -> 16.4ms [-2.06%, +0.58%]
:heavy_check_mark:dist_mixture: 226ms -> 225ms [-2.12%, +1.86%]
:heavy_check_mark:dist_normal: 10.6ms -> 10.8ms [-1.39%, +3.67%]
:heavy_check_mark:gen_all_dists: 49.4ms -> 49.5ms [-0.9%, +1.48%] Further explanation regarding interpretation and methodology can be found in the documentation.

kgoldfeld commented 5 months ago

Wow, very cool, nice job! Neat how the systems come together and something is easier than expected 😁

I have just two small question:
* this will at the moment only work with arguments fully specified as `name = vale||var` (just from looking at the code) If I am not mistaken I think that should be mentioned in the documentation to avoid confusion. (or the code changed to accept a mix as well but I think it's totally fine keeping it as is)

Yes - I will make it clear that we need that format "name = value or formula"

* double-dot variables should also work right? that might also be worth a mention.

Yes - double dot works here as well - that was the beauty of the system you set up - I was able to take advantage here.

kgoldfeld commented 5 months ago

I have updated the help documentation in simstudy.Rmd.

One thought I had. I now have the function name in the defData formula field, and the formula string in the variance field, becuase I thought it was logical to have the function name first. However, it might be less confusing to switch them so that the formula string is in the defData formula field, and the function name in the defData variance field. Do you think the change would be useful?

github-actions[bot] commented 5 months ago

This is how benchmark results would change (along with a 95% confidence interval in relative change) if f2970243e400a84d072794a45ff483618b7f6524 is merged into main:

:heavy_check_mark:define_data: 17.2ms -> 17.3ms [-0.96%, +2.47%]
:heavy_check_mark:dist_beta: 211ms -> 224ms [-1.31%, +13.24%]
:exclamation::snail:dist_binary: 8.65ms -> 8.81ms [+0.2%, +3.63%]
:heavy_check_mark:dist_binomial: 13ms -> 13ms [-1.91%, +1.55%]
:heavy_check_mark:dist_categorical: 59ms -> 57.8ms [-4.51%, +0.58%]
:heavy_check_mark:dist_exponential: 8.82ms -> 8.89ms [-0.22%, +1.75%]
:heavy_check_mark:dist_gamma: 16.5ms -> 16.5ms [-1.44%, +1.99%]
:heavy_check_mark:dist_mixture: 222ms -> 224ms [-0.66%, +2.39%]
:heavy_check_mark:dist_normal: 10.7ms -> 10.7ms [-1.79%, +1.62%]
:heavy_check_mark:gen_all_dists: 48.7ms -> 48.9ms [-0.02%, +0.88%] Further explanation regarding interpretation and methodology can be found in the documentation.

github-actions[bot] commented 5 months ago

This is how benchmark results would change (along with a 95% confidence interval in relative change) if 57a593963159e63a936628491d2353b373b65870 is merged into main:

:heavy_check_mark:define_data: 17.3ms -> 17.4ms [-0.59%, +1.45%]
:heavy_check_mark:dist_beta: 215ms -> 216ms [-1.29%, +2.35%]
:heavy_check_mark:dist_binary: 8.86ms -> 8.98ms [-1.96%, +4.75%]
:heavy_check_mark:dist_binomial: 13.3ms -> 13.2ms [-2.9%, +2.03%]
:heavy_check_mark:dist_categorical: 58.7ms -> 59ms [-0.22%, +1.21%]
:heavy_check_mark:dist_exponential: 9.12ms -> 9.16ms [-2.12%, +3.02%]
:heavy_check_mark:dist_gamma: 16.8ms -> 16.5ms [-2.96%, +0.22%]
:heavy_check_mark:dist_mixture: 231ms -> 230ms [-2.72%, +1.46%]
:heavy_check_mark:dist_normal: 10.8ms -> 10.8ms [-3.17%, +1.83%]
:heavy_check_mark:gen_all_dists: 49.5ms -> 50ms [-0.97%, +2.75%] Further explanation regarding interpretation and methodology can be found in the documentation.

assignUser commented 5 months ago

Do you think the change would be useful?

I think either way works but I would also tend to put the function name first, it just feels 'natural' even if the field name doesn't fit ^^

kgoldfeld commented 5 months ago

I guess I landed in the same place as you. This one is in the books - since it has been merged. I will likely be submitting the current development version as simstudy 0.8.0 in the next few weeks.

kgoldfeld / simstudy

216 nonrandom distribution returns a single value when repeated values are expected #225