tbates / umx

Making Structural Equation Modeling (SEM) in R quick & powerful
https://tbates.github.io/
44 stars 17 forks source link

Discussing implementation design for defn #242

Closed lf-araujo closed 2 months ago

lf-araujo commented 3 months ago

Currently, defn seems to be broken, the equivalent example to the manual:


rm(list=ls())
library(umx)
library(MASS)    # to get hold of mvrnorm function
set.seed(200)    # to make the simulation repeatable
N              <- 500    # sample size, per group
Sigma          <- matrix(c(1,.5,.5,1),2,2)
group1         <- mvrnorm(N, c(1,2), Sigma) # Use mvrnorm from MASS package
group2         <- mvrnorm(N, c(0,0), Sigma)
# Put the two groups together, create a definition variable,
# and make a list of which variables are to be analyzed (selVars)
xy             <- rbind(group1,group2)      # Bind groups together by rows
dimnames(xy)[2]<- list(c("x","y"))          # Add names
def            <- rep(c(1,0),each=N);       # Add def var [2n] for group status
selVars        <- c("x","y")                # Make selection variables object
dataRaw      <- mxData( observed=data.frame(xy,def), type="raw" )

# variances
variances <- umxPath(var = c("x","y"),  labels=c("Varx","Vary") , values = 1)
# covariances
covariances  <- umxPath( with="x", from="y",labels=c("Covxy"), value = 0.1 )
# means
means        <- umxPath(means = c("x","y"), labels=c("meanx","meany") )

# definition value
defValues    <- umxPath( defn = "def" )

defMeansModel <- umxRAM("Definition Means Path Specification",
                         data = data.frame(xy,def), covariances, means,variances,
                          defValues) 

fails with:

> > > > > > > > > + + > A latent variable 'def_def' was created. 
Running Definition Means Path Specification with 5 parameters
Error incurred trying to run model: model = mxTryHard(model) might help?
The job for model 'Definition Means Path Specification' exited abnormally with the error message: fit is not finite (The continuous part of the model implied covariance (loc2) is not positive definite in data 'Definition Means Path Specification.data' row 685. Detail:
covariance =  matrix(c(    # 3x3
   1, 0.1,   0
, 0.1,   1,   0
,   0,   0,   0), byrow=TRUE, nrow=3, ncol=3)
)Warning message:
In model 'Definition Means Path Specification' Optimizer returned a non-zero status code 10. Starting values are not feasible. Consider mxTryHard() 
> [1] "org_babel_R_eoe"

The issue is that the definition variable is being passed to mxData and to manifest vars, when it should only be passed to the former. So the algebras have a column for the definition data "observed" and another for the dummy. I have corrected this and it seems to be working now, but since changes to umxRAM can be potentially catastrophic I am running tests in my own scripts (it passes umx tests). Now,

Happy to assign myself to the task.

tbates commented 3 months ago

Hi @lf-araujo, and thanks as ever for your involvement! As it says at the very bottom of the defn examples for umxPath,

Definition variable example.Not much use at present, = as def vars are not readily used in RAM models... Working on something rational and intuitive.

So, let's :-)

I re-wrote the code above just to how I am used to it for ease of use. Made a working model with no def vars, and used a dataframe so we know what is going on. And called the definition variable "groupID" just to make i clear and not use def in the name.

What is needed now is

library(umx); libs("MASS") # for mvrnorm()

# ==================
# = 1. Create Data =
# ==================

set.seed(200)    # to make the simulation repeatable
N      = 500    # sample size, per group
Sigma  = matrix(c(1,.5,.5,1),2,2)

group1 = mvrnorm(N, c(1,2), Sigma) # Use mvrnorm from MASS package
group2 = mvrnorm(N, c(0,0), Sigma)
groupID = rep(c(1,0), each = N) # Create a definition variable for group status

# rbind the groups and name the columns x and y
xy = rbind(group1,group2)      # Bind groups together by rows
dimnames(xy)[2]= list(c("x","y"))          # Add names
df = data.frame(xy, groupID = groupID)
dfRaw = mxData(df, type="raw")

# ================
# = 2. The Model =
# ================
# list variables to be analyzed (selVars)
selVars = c("x","y")
m1 = umxRAM("regular model", data = df,
    umxPath(var = selVars, labels=c("VarX","VarY") , values = 1), # variances   
    umxPath(from="y", with="x", labels=c("CovXY"), value = 0.1 ), # covariances
    umxPath(means = selVars, labels=c("meanX","meanY") ) # means
) 
plot(m1)

TODO: Add umxAlgebra()
m1 = umxRAM("Definition Means", data = df,
    umxPath(var = selVars, labels = c("VarX","VarY") , values = 1), # variances 
    umxPath(from = "y", with = "x", labels = c("CovXY"), value = 0.1 ), # covariances
    umxPath(means = selVars, labels = c("meanX", "meanY") ) # means
    umxPath(defn = "groupID") # should create a var labelled "data.groupID"
    # here you would need an algebra to compute the mean as `b0*data.groupID`
) 
tbates commented 3 months ago

So the umxPath(defn = myDef, to = c("meanX", "meanY")) sounds like it would be what most umxRAM users would desire. It's messy (hence me not using it enough to improve it), as the user needs to know what those labels would do, but sure: Go ahead and try?

mcneale commented 3 months ago

We should keep in mind that definition variables aren't solely for regressing covariates out of ordinal variables. They can be used to moderate regressions and covariances, and quite a lot of this can be done in RAM models, although doing so requires adding 'phantom' latent variables that have no variance. So fine to make it easier to add covariates to ordinal models, but let's make sure we don't break (or make more difficult to use) other applications.

lf-araujo commented 3 months ago

Agreed, for all intent and purposes it seems the corrected version I have here is doing all that, the optional to= would cover the uses for controlling with covariates only. That would simplify its use for 99% of users.

The phantom variable is looking sane in my tests here. Will report back soon.

tbates commented 2 months ago

So that gets defVars into the manifests when 'umxRAM' is making a dataset.

I guess the next step is to update the umxRAM example to a working ordinal means model using mxAlgebras?

I'm assuming umxPath(defn = "defVarName") already created a latent var with zero mean/variance and label "data.defVarName"? (feels like a 7 year old memory :-) )

The rest of getting a working model would be very direct if the long-standing request for labels for algebras of expected size was implemented...

Then the user just has to give the means a regular label like "meanX", the user just has to say:

umxPath(means = selVars, labels = c("meanX", "meanY") ),
umxMatrix("meanBetas", 2, 2, labels = c("Xb0", "Xb1", "Yb0", "Yb1")) ),
mxAlgebra(meanBetas.Xb0 +  (meanBetas.Xb1 * data.groupID),  label = "meanX"),  
mxAlgebra(meanBetas.Yb0 +  (meanBetas.Yb1 * data.groupID),  label = "meanY")  
lf-araujo commented 2 months ago

Not sure if I am following, sure you can extend the functionality with mxAlgebra. However, if you point the dummy latent to the observed, OpenMx will do the definition variable calculations for the mean (I think?). From the documentation:

dataRaw      <- mxData( observed=data.frame(xy,def), type="raw" )
# variances
variances    <- mxPath( from=c("x","y"), arrows=2,
                        free=TRUE, values=1, labels=c("Varx","Vary") )
# covariances
covariances  <- mxPath( from="x", to="y", arrows=2,
                        free=TRUE, values=.1, labels=c("Covxy") )
# means
means        <- mxPath( from="one", to=c("x","y"), arrows=1,
                        free=TRUE, values=1, labels=c("meanx","meany") )
# definition value
defValues    <- mxPath( from="one", to="DefDummy", arrows=1,
                        free=FALSE, values=1, labels="data.def" )
# beta weights
betaWeights  <- mxPath( from="DefDummy", to=c("x","y"), arrows=1,
                        free=TRUE, values=1, labels=c("beta_1","beta_2") )

defMeansModel <- mxModel("Definition Means Path Specification", type="RAM",
                         manifestVars=selVars, latentVars="DefDummy",
                         dataRaw, variances, covariances, means,
                         defValues, betaWeights)

If I am correct, with current implementation, something like:

  umxPath(defn="ADHD_meds_lifetime"),
  umxPath("def_ADHD_meds_lifetime", to = c("x", "y")),

would work. Maybe this would also work if you point to latent variables to =c("intercept", "linear", "quadratic") too.

mcneale commented 2 months ago

I'm not sure that the majority of users only want definition variables to take covariates out of means of ordinal variables. There are a lot of Purcell modelers out there.

Is it really that hard to put a definition variable on a path in a RAM model? It's just a matter of adding def. to the beginning of the path label.

Personally I would not add specialist syntax tricks to umxPath. I'd put in a new function useCovariates(covList) where covList has a c() of names of the covariates in the dataset. This would be more intuitive to the users who wouldn't have to go down a rabbit hole in the umxPath description. Of course, making it easy to use covariates is of doubtful value since all too often confounders are used, which results in biased estimates. Yes, it's not our fault if we give them an axe with which they amputate their toes, but it's still a bloody mess.

tbates commented 2 months ago

I live and learn @lf-araujo :-) Did not consider that simply connecting a latent with data on its means to a variable would do the trick! (mindset was stuck in the matrix where an algebra would sum the parts of the mean), but the RAM model happily just makes this up from two arrow inputs! nice.

I'll add this example to the Rd:

m1 = umxRAM("Def Means Path Spec", data = df, 
    umxPath(v.m. = c("x","y")),
    umxPath("x", with = "y"),
    umxPath(defn="groupID"), # creates a unit latent called "def_groupID" with data "data.groupID"
    umxPath("def_groupID", to = c("x", "y"))
)
plot(m1)
image