Closed lf-araujo closed 2 months ago
Hi @lf-araujo, and thanks as ever for your involvement! As it says at the very bottom of the defn examples for umxPath,
Definition variable example.Not much use at present, = as def vars are not readily used in RAM models... Working on something rational and intuitive.
So, let's :-)
I re-wrote the code above just to how I am used to it for ease of use. Made a working model with no def vars, and used a dataframe so we know what is going on. And called the definition variable "groupID" just to make i clear and not use def in the name.
What is needed now is
library(umx); libs("MASS") # for mvrnorm()
# ==================
# = 1. Create Data =
# ==================
set.seed(200) # to make the simulation repeatable
N = 500 # sample size, per group
Sigma = matrix(c(1,.5,.5,1),2,2)
group1 = mvrnorm(N, c(1,2), Sigma) # Use mvrnorm from MASS package
group2 = mvrnorm(N, c(0,0), Sigma)
groupID = rep(c(1,0), each = N) # Create a definition variable for group status
# rbind the groups and name the columns x and y
xy = rbind(group1,group2) # Bind groups together by rows
dimnames(xy)[2]= list(c("x","y")) # Add names
df = data.frame(xy, groupID = groupID)
dfRaw = mxData(df, type="raw")
# ================
# = 2. The Model =
# ================
# list variables to be analyzed (selVars)
selVars = c("x","y")
m1 = umxRAM("regular model", data = df,
umxPath(var = selVars, labels=c("VarX","VarY") , values = 1), # variances
umxPath(from="y", with="x", labels=c("CovXY"), value = 0.1 ), # covariances
umxPath(means = selVars, labels=c("meanX","meanY") ) # means
)
plot(m1)
TODO: Add umxAlgebra()
m1 = umxRAM("Definition Means", data = df,
umxPath(var = selVars, labels = c("VarX","VarY") , values = 1), # variances
umxPath(from = "y", with = "x", labels = c("CovXY"), value = 0.1 ), # covariances
umxPath(means = selVars, labels = c("meanX", "meanY") ) # means
umxPath(defn = "groupID") # should create a var labelled "data.groupID"
# here you would need an algebra to compute the mean as `b0*data.groupID`
)
So the umxPath(defn = myDef, to = c("meanX", "meanY"))
sounds like it would be what most umxRAM users would desire.
It's messy (hence me not using it enough to improve it), as the user needs to know what those labels would do, but sure: Go ahead and try?
We should keep in mind that definition variables aren't solely for regressing covariates out of ordinal variables. They can be used to moderate regressions and covariances, and quite a lot of this can be done in RAM models, although doing so requires adding 'phantom' latent variables that have no variance. So fine to make it easier to add covariates to ordinal models, but let's make sure we don't break (or make more difficult to use) other applications.
Agreed, for all intent and purposes it seems the corrected version I have here is doing all that, the optional to=
would cover the uses for controlling with covariates only. That would simplify its use for 99% of users.
The phantom variable is looking sane in my tests here. Will report back soon.
So that gets defVars
into the manifests when 'umxRAM' is making a dataset.
I guess the next step is to update the umxRAM
example to a working ordinal means model using mxAlgebras
?
I'm assuming umxPath(defn = "defVarName")
already created a latent var with zero mean/variance and label "data.defVarName"? (feels like a 7 year old memory :-) )
The rest of getting a working model would be very direct if the long-standing request for labels for algebras of expected size was implemented...
Then the user just has to give the means a regular label like "meanX", the user just has to say:
umxPath(means = selVars, labels = c("meanX", "meanY") ),
umxMatrix("meanBetas", 2, 2, labels = c("Xb0", "Xb1", "Yb0", "Yb1")) ),
mxAlgebra(meanBetas.Xb0 + (meanBetas.Xb1 * data.groupID), label = "meanX"),
mxAlgebra(meanBetas.Yb0 + (meanBetas.Yb1 * data.groupID), label = "meanY")
Not sure if I am following, sure you can extend the functionality with mxAlgebra. However, if you point the dummy latent to the observed, OpenMx will do the definition variable calculations for the mean (I think?). From the documentation:
dataRaw <- mxData( observed=data.frame(xy,def), type="raw" )
# variances
variances <- mxPath( from=c("x","y"), arrows=2,
free=TRUE, values=1, labels=c("Varx","Vary") )
# covariances
covariances <- mxPath( from="x", to="y", arrows=2,
free=TRUE, values=.1, labels=c("Covxy") )
# means
means <- mxPath( from="one", to=c("x","y"), arrows=1,
free=TRUE, values=1, labels=c("meanx","meany") )
# definition value
defValues <- mxPath( from="one", to="DefDummy", arrows=1,
free=FALSE, values=1, labels="data.def" )
# beta weights
betaWeights <- mxPath( from="DefDummy", to=c("x","y"), arrows=1,
free=TRUE, values=1, labels=c("beta_1","beta_2") )
defMeansModel <- mxModel("Definition Means Path Specification", type="RAM",
manifestVars=selVars, latentVars="DefDummy",
dataRaw, variances, covariances, means,
defValues, betaWeights)
If I am correct, with current implementation, something like:
umxPath(defn="ADHD_meds_lifetime"),
umxPath("def_ADHD_meds_lifetime", to = c("x", "y")),
would work. Maybe this would also work if you point to latent variables to =c("intercept", "linear", "quadratic")
too.
I'm not sure that the majority of users only want definition variables to take covariates out of means of ordinal variables. There are a lot of Purcell modelers out there.
Is it really that hard to put a definition variable on a path in a RAM model? It's just a matter of adding def. to the beginning of the path label.
Personally I would not add specialist syntax tricks to umxPath. I'd put in a new function useCovariates(covList) where covList has a c() of names of the covariates in the dataset. This would be more intuitive to the users who wouldn't have to go down a rabbit hole in the umxPath description. Of course, making it easy to use covariates is of doubtful value since all too often confounders are used, which results in biased estimates. Yes, it's not our fault if we give them an axe with which they amputate their toes, but it's still a bloody mess.
I live and learn @lf-araujo :-) Did not consider that simply connecting a latent with data on its means to a variable would do the trick! (mindset was stuck in the matrix where an algebra would sum the parts of the mean), but the RAM model happily just makes this up from two arrow inputs! nice.
I'll add this example to the Rd:
m1 = umxRAM("Def Means Path Spec", data = df,
umxPath(v.m. = c("x","y")),
umxPath("x", with = "y"),
umxPath(defn="groupID"), # creates a unit latent called "def_groupID" with data "data.groupID"
umxPath("def_groupID", to = c("x", "y"))
)
plot(m1)
Currently, defn seems to be broken, the equivalent example to the manual:
fails with:
The issue is that the definition variable is being passed to mxData and to manifest vars, when it should only be passed to the former. So the algebras have a column for the definition data "observed" and another for the dummy. I have corrected this and it seems to be working now, but since changes to umxRAM can be potentially catastrophic I am running tests in my own scripts (it passes umx tests). Now,
mxPath(defn=)
having an optionalto =
argument? This way one can create the betas in a one-liner.Happy to assign myself to the task.