Open cbratt opened 3 years ago
It also seems that 1.0 requires (1) variable names to be declared and (2) missing value to be declared, at least for one model I tested.
That seems to pose a backward-compatibility problem: the original model (developed with MplusAutomation 0.7-3 or the development version of 1.0) did not run in the 1.0 release now available; I needed to declare
names = ... ;
and
missing = . ;
Returning to 0.7-3 resulted in Mplus complaining because of double names. I'm not submitting a separate case on this, as it seems the two apparent problems are related: problems with reading data.
If backward compatibility is at all possible, then it would of course be great. Many prefer reproducible research, and currently, we may have a problem across versions of MplusAutomation.
Case solved. I believe this particular problem came from my habit of declaring an mplusObject() without data, then calling update() on the model with specific data. That worked like a charm in 0.7-3 (and in the development version of 1.0), but no longer does in 1.0.
Time for me to update/adapt my workflow. For anyone experiencing the same problem: I believe you can now (Version 1.0) declare the data when first defining an mplusObject and then, if necessary, change the specific data set as part of an update() command:
old_model <- mplusObject("...", data = old_data)
new_model <- update(old_model, data = new_data)
@cbratt please open this issue again! This needs to be fixed. Can you share some reproducible syntax describing what you're trying to do?
@cjvanlissa, you can see the original case here: https://github.com/michaelhallquist/MplusAutomation/issues/130, with a reproducible example for 0.8. @JWiley stated that update() wasn't meant to be used this way, but still considered it a bug in 0.8.
I consider it solved now: update() with a new data frame works when the original mplusObject had data. See the first example for update() in MplusAutomation 1.0. The mplusObject has data (mtcars), then updates with new data (iris).
example1 <- mplusObject(MODEL = "mpg ON wt;",
usevariables = c("mpg", "hp"), rdata = mtcars)
x <- ~ "ESTIMATOR = ML;"
str(update(example1, rdata = iris))
@cbratt although not exactly intended, update() should be able to add a dataset even if it was not originally included.
With that said, it seems to work for me, even when the original object does not have data (as shown below). Can you test and let me know?
library(MplusAutomation)
example1 <- mplusObject(
MODEL = "mpg ON wt;",
usevariables = c("mpg", "wt"))
example1b <- update(example1, rdata = mtcars)
fit1b <- mplusModeler(example1b,
modelout = "exampleb.inp", run=TRUE)
Thanks, @JWiley. Here is the result - no error detected.
> example1 <- mplusObject(
+ MODEL = "mpg ON wt;",
+ usevariables = c("mpg", "wt"))
> example1b <- update(example1, rdata = mtcars)
> fit1b <- mplusModeler(example1b,
+ modelout = "exampleb.inp", run=TRUE)
> summary(fit1b)
Estimated using ML
Number of obs: 32, number of (free) parameters: 3
Model: Chi2(df = 0) = 0, p = 0
Baseline model: Chi2(df = 1) = 44.726, p = 0
Fit Indices:
CFI = 1, TLI = 1, SRMR = 0
RMSEA = 0, 90% CI [0, 0], p < .05 = 0
AIC = 166.029, BIC = 170.427
Hmm okay @cbratt, do you know what was going on with your initial report, then, where there were new errors after using group_split()? I think that a workflow where you create an mplusObject() and only later add data should work, and if it doesn't, that is an issue I can fix.
Do you have a shareable example from your original post showing where it breaks, so I can fix it?
Here is what should be a reproducible example of a problem with MplusAutomation and split data:
library(tidyverse)
library(MplusAutomation)
data_split <- mtcars %>%
group_by(gear) %>%
group_split
# The data are now in a list
class(data_split)
# Isolating a single data frame within that list (i.e. within the split data) shows:
data_split[[1]] # The data frame is a tibble.
# Declaring a mplusObject, but not including data
mymodel <- mplusObject(
VARIABLE = "
usevariables = mpg cyl;",
MODEL = "
mpg ON cyl;
",
rdata = NULL) # no data supplied at this point
# update(), using data_split[[1]] as data
mymodel <- update(mymodel, rdata = data_split[[1]])
# Running the model
mplusModeler(mymodel, "mplusdata.dat", hashfilename = F,
modelout = "mymodel.inp", run = 1L)
MplusAutomation prints a warning, and here is what the Mplus output says:
Mplus VERSION 8.6 (Mac)
MUTHEN & MUTHEN
07/11/2021 11:31 AM
INPUT INSTRUCTIONS
DATA:
FILE = "mplusdata.dat";
VARIABLE:
usevariables = mpg cyl;
MODEL:
mpg ON cyl;
*** ERROR in VARIABLE command
NAMES option is required. Specify the variables in the data file using
the NAMES option.
Just to be clear: There is a workaround for this issue: Include preliminary data before updating with a data subset.
# Declaring a model, AND INCLUDE PRELIMINARY DATA
mymodel <- mplusObject(
VARIABLE = "
usevariables = mpg cyl;",
MODEL = "
mpg ON cyl;
",
rdata = mtcars)
# update(), using data_split[[1]] as data
mymodel <- update(mymodel, rdata = data_split[[1]])
# Running the model
mplusModeler(mymodel, "mplusdata.dat", hashfilename = F,
modelout = "mymodel.inp", run = 1L)
The result:
When hashfilename = FALSE, writeData cannot be 'ifmissing', setting to 'always'
The file(s)
‘mplusdata.dat’
currently exist(s) and will be overwritten
Estimated using ML
Number of obs: 15, number of (free) parameters: 3
Model: Chi2(df = 0) = 0, p = 0
Baseline model: Chi2(df = 1) = 8.069, p = 0.0045
Fit Indices:
CFI = 1, TLI = 1, SRMR = 0
RMSEA = 0, 90% CI [0, 0], p < .05 = 0
AIC = 75.926, BIC = 78.05
NULL
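For the split-data workflow that started this thread, the workaround above can be applied to every subset in one pass. This is a minimal sketch, assuming MplusAutomation is installed; it uses base R's split() in place of group_split(), and the names base_model and models are illustrative only. It builds the model objects but does not run Mplus.

```r
library(MplusAutomation)

# Split mtcars by gear into a named list of data frames ("3", "4", "5")
data_split <- split(mtcars, mtcars$gear)

# Workaround from above: declare the model WITH preliminary data
base_model <- mplusObject(
  VARIABLE = "usevariables = mpg cyl;",
  MODEL = "mpg ON cyl;",
  rdata = mtcars)

# Swap in each subset via update(); one model object per gear level
models <- lapply(data_split, function(d) update(base_model, rdata = d))
```

Each element of models can then be passed to mplusModeler() with its own modelout file name.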
Thanks, this is helpful.
The issue is that the R-side usevariables argument needs to be specified. When data are provided, mplusObject() tries to guess the necessary names from the dataset, but this does not happen without a dataset, and when update() later adds a dataset, it does not then try to detect the needed variables.
I think I can fix the update() function so that when autov = TRUE (the default) in the mplusObject AND usevariables is NULL AND the original mplus object does not have a dataset, but a dataset IS being added in the update, it will also attempt to detect and add the needed variable names during the update. Should be a fairly easy fix: just adding some logical conditions to call the same code as mplusObject() in update() if needed.
Thanks for the reproducible example. It is easy, clear, and will let me test whether my planned fix works. The workaround should not be needed for much longer; hopefully I can fix this within the week.
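The conditions described for the planned fix can be written out as a single logical test. The sketch below is illustrative base R only, not MplusAutomation's actual code: needs_variable_detection() is a hypothetical helper, and the model object is stood in for by a plain list assumed to carry autov, usevariables, and rdata elements.

```r
# Hypothetical helper, NOT part of MplusAutomation: TRUE when update() would
# need to detect variable names under the conditions described above.
needs_variable_detection <- function(object, new_data) {
  isTRUE(object$autov) &&            # automatic detection is switched on
    is.null(object$usevariables) &&  # no R-side usevariables were given
    is.null(object$rdata) &&         # the original object carried no dataset
    !is.null(new_data)               # a dataset IS being added in the update
}

# Plain-list stand-ins for model objects (assumed structure)
no_data   <- list(autov = TRUE, usevariables = NULL, rdata = NULL)
with_data <- list(autov = TRUE, usevariables = NULL, rdata = mtcars)

needs_variable_detection(no_data, mtcars)    # TRUE: detection is needed
needs_variable_detection(with_data, mtcars)  # FALSE: object already had data
```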
@JWiley, it would also be great if a new model with added variable(s) in MODEL could be declared simply by using update(), without manipulating VARIABLE in the mplusObject. (My experience is that it can be difficult to manipulate 'names' in VARIABLE by hand, and I believe it's not intended that the user should do that.)
Here's an example.
> library(MplusAutomation)
> # Define models --------------------------------------------------------------
> # Model 1 (with data)
> model_1 <- mplusObject(
+ VARIABLE = "
+ usevariables = mpg cyl;",
+ MODEL = "
+ mpg ON cyl;
+ ",
+ rdata = mtcars)
> # Model 2: update() adds another variable to model_1
> # (but the code does not modify VARIABLES)
> model_2 <- update(model_1,
+ MODEL = ~.+ "mpg ON am;",
+ rdata = mtcars)
> # Running model_1 ----------------------------------------------------------
> mplusModeler(model_1, "mplusdata.dat", hashfilename = F,
+ modelout = "mymodel.inp", run = 1L)
When hashfilename = FALSE, writeData cannot be 'ifmissing', setting to 'always'
The file(s)
‘mplusdata.dat’
currently exist(s) and will be overwritten
Estimated using ML
Number of obs: 32, number of (free) parameters: 3
Model: Chi2(df = 0) = 0, p = 0
Baseline model: Chi2(df = 1) = 41.449, p = 0
Fit Indices:
CFI = 1, TLI = 1, SRMR = 0
RMSEA = 0, 90% CI [0, 0], p < .05 = 0
AIC = 169.306, BIC = 173.704
NULL
> # Running model_2 ----------------------------------------------------------
> mplusModeler(model_2, "mplusdata.dat", hashfilename = F,
+ modelout = "mymodel.inp", run = 1L)
When hashfilename = FALSE, writeData cannot be 'ifmissing', setting to 'always'
The file(s)
‘mplusdata.dat’
currently exist(s) and will be overwritten
Fit Indices:
CFI = NA, TLI = NA, SRMR = NA
RMSEA = NA, 90% CI [NA, NA], p < .05 = NA
AIC = NA, BIC = NA
NULL
Warning message:
In runModels(target = modelout, Mplus_command = Mplus_command, killOnFail = killOnFail, :
Mplus returned error code: 1, for model: mymodel.inp
Mplus reports for model_2:
INPUT INSTRUCTIONS
DATA:
FILE = "mplusdata.dat";
VARIABLE:
NAMES = mpg cyl;
MISSING=.;
usevariables = mpg cyl;
MODEL:
mpg ON cyl;
mpg ON am;
*** ERROR in MODEL command
Unknown variable(s) in an ON statement: AM
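The error arises because NAMES still lists only the variables found when model_1 was created, so am never reaches the data file. A rough base R illustration of detecting variables from model text follows; vars_in_model() is a hypothetical helper, not MplusAutomation's implementation, and it assumes names are matched case-insensitively as whole words.

```r
# Hypothetical sketch (not MplusAutomation's implementation): return the data
# columns whose names appear as whole words in the MODEL text.
vars_in_model <- function(model_text, data) {
  # Break the lower-cased model text into word tokens
  tokens <- unlist(strsplit(tolower(model_text), "[^a-z0-9_]+"))
  # Keep the columns that occur among the tokens, in column order
  names(data)[tolower(names(data)) %in% tokens]
}

vars_model_1 <- vars_in_model("mpg ON cyl;", mtcars)
vars_model_2 <- vars_in_model("mpg ON cyl;\nmpg ON am;", mtcars)

# Variables the updated model needs that model_1 did not detect
setdiff(vars_model_2, vars_model_1)
```

Re-running a detection step like this after update() changes MODEL would pick up am in addition to mpg and cyl.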
With version 1.0, the following R code makes Mplus report errors in the data (detected because Mplus complained that a categorical variable had too many values).
Downgrading to 0.7-3 solved the problem.