Closed moosterwegel closed 7 months ago
Thanks for reaching out. Yes - you've found a bug that we need to fix related to the double-dot notation. Currently, that variable only gets evaulated if it exists in the "root" environment. So, when you create beta_x_m inside the function, the double dot functionality does not find it - it is looking for it in the calling environment. So, a workaround until we fix this is to define beta_x_m just before calling f2(), like this:
library(simstudy)
set.seed(2024)
# p(x) p(m|x) p(y|x,m)
f2 <- function() {
n <- 2000
prevalence <- 0.1
coefs_y <- c(log(3), log(2))
defs <- defData(
defData(varname = "x", formula = 0, variance = 1, dist = "normal"),
varname = "m",
formula = "x * ..beta_x_m", # only difference with f1
dist = "normal", variance = 1)
intercept <- logisticCoefs(defCovar = defs, coefs = coefs_y, popPrev = prevalence)[['B0']]
defs <- defData(
defs,
varname = "y",
formula = "t(c(..intercept, ..coefs_y)) %*% c(1, x, m)",
dist = "binary", link = "logit")
df <- genData(n, defs)
glm(y ~ 1 + x + m, family = binomial(link = "logit"), data = df) |> coef() |> exp()
}
beta_x_m <- 2
f2()
#> (Intercept) x m
#> 0.01568302 5.22871219 1.60440634
Created on 2024-03-23 with reprex v2.1.0
You can do this with replications. Here is an example with 3 replications where beta_x_m is set to 1.5:
beta_x_m <- 1.5
lapply(1:3, function(x) f2())
#> [[1]]
#> (Intercept) x m
#> 0.02491957 3.83257775 1.63143803
#>
#> [[2]]
#> (Intercept) x m
#> 0.02581665 2.26940339 2.24345931
#>
#> [[3]]
#> (Intercept) x m
#> 0.0171123 3.6076806 1.9789129
Created on 2024-03-23 with reprex v2.1.0
I'll let you know when we've fixed the bigger issue.
Just to follow up - this bug is related specifically to function logisticCoefs
. In general one can pass arguments to functions and refer to them in double dot notation:
library(simstudy)
fa <- function(a) {
def <-
defData(varname = "x", formula = 4, variance = 2) |>
defData(varname = "z", formula = "x*..a", variance = 3)
genData(1000, def)[, mean(z)]
}
set.seed(123123)
fa(5)
#> [1] 19.96077
fa(20)
#> [1] 80.67964
Created on 2024-03-23 with reprex v2.1.0
@assignUser I see the problem - in the logisticCoefs
, the algorithm requires that a dataset using the specified data definition is generated. Of course, the double dot variable has not been passed as an argument in the call to logisticCoefs
, resulting in the error message. The question is, is there a way to pass those variables (in this case beta_x_m) without the user explicitly specifying it. I'm guessing not - so the solution would require the user to attach the reference variables as an argument.
@assignUser I think I see a possible solution: the get
function allows us to pull variables from the parent environment. So, if I can extract all the double dot references from the data definition, I can probably pull those variables into the environment.
Yeah that sounds like it would work, I can look over the issue in a bit too!
I've pretty much got it working - will incorporate into logisticCoefs
soon.
Oh great, feel free to ping me for a review. I had to reset my dev environment (thanks windows...)
This has been fixed - install from development. (No need for workaround provided above.)
That's quick, superb. Thanks!
First of all, thank you for providing an R package that makes it trivial to simulate some data with a certain marginal probability. Much appreciated.
I'm running into an issue with
logisticCoefs
when simulating p(x) p(m|x) p(y|x,m). I want to define the strength of the effect of x on m on the fly, but I can only get it to work outside of a function call. The variable that sets the strength of the effect is not recognized inside the function call. See the reprex below.Any pointers to what's going wrong?
Thank you.