Problem with data after group_split()

cbratt commented 3 years ago

With version 1.0, the following R code produces data that cause errors in Mplus (detected because Mplus complained that a categorical variable had too many values).

data_split <- Data |>
  remove_all_labels() |>
  group_by(group) |>
  group_split()

Downgrading to 0.7-3 solved the problem.

cbratt commented 3 years ago

It also seems that 1.0 requires (1) variable names to be declared and (2) missing value to be declared, at least for one model I tested.

That seems to pose a backward-compatibility problem: the original model (developed with MplusAutomation 0.7-3 or the development version of 1.0) did not run in the released 1.0; I needed to declare

names = ...  ;

and

missing = . ;

After returning to 0.7-3, Mplus then complained about duplicate names. I'm not submitting a separate issue for this, as the two apparent problems seem related: both concern reading data.

If backward compatibility is at all possible, then it would of course be great. Many prefer reproducible research, and currently, we may have a problem across versions of MplusAutomation.

cbratt commented 3 years ago

Case solved. I believe the cause was my habit of declaring an mplusObject() without data and then using update() to add specific data. That worked like a charm in 0.7-3 (and in the development version of 1.0), but no longer works in 1.0.

Time for me to update/adapt my workflow. For anyone experiencing the same problem: I believe you can now (Version 1.0) declare the data when first defining an mplusObject and then, if necessary, change the specific data set as part of an update() command:

old_model <- mplusObject("...", data = old_data)
new_model <- update(old_model, data = new_data)

cjvanlissa commented 3 years ago

@cbratt please open this issue again! This needs to be fixed. Can you share some reproducible syntax describing what you're trying to do?

cbratt commented 3 years ago

@cjvanlissa, you can see the original case here: https://github.com/michaelhallquist/MplusAutomation/issues/130, with a reproducible example for 0.8. @JWiley stated that update() wasn't meant to be used this way, but still considered it a bug in 0.8.

I consider it solved now: update() with a new data frame works when the original mplusObject had data. See the first example for update() in MplusAutomation 1.0. The mplusObject has data (mtcars), then updates with new data (iris).

example1 <- mplusObject(MODEL = "mpg ON wt;", 
  usevariables = c("mpg", "hp"), rdata = mtcars)
x <- ~ "ESTIMATOR = ML;"
str(update(example1, rdata = iris))

JWiley commented 3 years ago

@cbratt although not exactly intended, update() should be able to add a dataset even if it was not originally included. With that said, it seems to work for me, even when the original object does not have data (as shown below). Can you test and let me know?

library(MplusAutomation)

example1 <- mplusObject(
  MODEL = "mpg ON wt;", 
  usevariables = c("mpg", "wt"))

example1b <- update(example1, rdata = mtcars)

fit1b <- mplusModeler(example1b,
                      modelout = "exampleb.inp", run=TRUE)

cbratt commented 3 years ago

Thanks, @JWiley. Here is the result - no error detected.

> example1 <- mplusObject(
+   MODEL = "mpg ON wt;", 
+   usevariables = c("mpg", "wt"))
> example1b <- update(example1, rdata = mtcars)
> fit1b <- mplusModeler(example1b,
+                       modelout = "exampleb.inp", run=TRUE)
> summary(fit1b)
Estimated using ML 
Number of obs: 32, number of (free) parameters: 3 

Model: Chi2(df = 0) = 0, p = 0 
Baseline model: Chi2(df = 1) = 44.726, p = 0 

Fit Indices: 

CFI = 1, TLI = 1, SRMR = 0 
RMSEA = 0, 90% CI [0, 0], p < .05 = 0 
AIC = 166.029, BIC = 170.427

JWiley commented 3 years ago

Hmm, okay. @cbratt, do you know what was going on in your initial report, where there were new errors after using group_split()? A workflow where you create an mplusObject() and only later add data should work; if it doesn't, that is an issue I can fix. Do you have a shareable example from your original post showing where it breaks?

cbratt commented 3 years ago

Here is what should be a reproducible example of a problem with MplusAutomation and split data:

library(tidyverse)
library(MplusAutomation)

data_split <- mtcars %>%
  group_by(gear) %>%
  group_split()

# The data are now in a list
class(data_split)

# Isolating a single data frame within that list (i.e. within the split data) shows:
data_split[[1]] # The data frame is a tibble.

# Declaring an mplusObject without data (rdata = NULL is the default)
mymodel <- mplusObject(
  VARIABLE = "
    usevariables = mpg cyl;",
  MODEL = "
    mpg ON cyl;
    ",
  rdata = NULL)

# update(), using data_split[[1]] as data
mymodel <- update(mymodel, rdata = data_split[[1]])

# Running the model
mplusModeler(mymodel, "mplusdata.dat", hashfilename = FALSE,
             modelout = "mymodel.inp", run = 1L)

MplusAutomation prints a warning, and here is what the Mplus output shows:

Mplus VERSION 8.6 (Mac)
MUTHEN & MUTHEN
07/11/2021  11:31 AM

INPUT INSTRUCTIONS

DATA:
  FILE = "mplusdata.dat";

VARIABLE:

  usevariables = mpg cyl;
MODEL:

  mpg ON cyl;

*** ERROR in VARIABLE command
NAMES option is required.  Specify the variables in the data file using
the NAMES option.

cbratt commented 3 years ago

Just to be clear: There is a workaround for this issue: Include preliminary data before updating with a data subset.

    # Declaring a model, AND INCLUDE PRELIMINARY DATA
    mymodel <- mplusObject(
        VARIABLE = "
          usevariables = mpg cyl;",
        MODEL = "
          mpg ON cyl;
          ",
        rdata = mtcars)

    # update(), using data_split[[1]] as data
    mymodel <- update(mymodel, rdata = data_split[[1]])

    # Running the model
    mplusModeler(mymodel, "mplusdata.dat", hashfilename = FALSE,
                 modelout = "mymodel.inp", run = 1L)

The result:

    When hashfilename = FALSE, writeData cannot be 'ifmissing', setting to 'always'
    The file(s)
     ‘mplusdata.dat’ 
    currently exist(s) and will be overwritten
    Estimated using ML 
    Number of obs: 15, number of (free) parameters: 3 

    Model: Chi2(df = 0) = 0, p = 0 
    Baseline model: Chi2(df = 1) = 8.069, p = 0.0045 

    Fit Indices: 

    CFI = 1, TLI = 1, SRMR = 0 
    RMSEA = 0, 90% CI [0, 0], p < .05 = 0 
    AIC = 75.926, BIC = 78.05 
    NULL

JWiley commented 3 years ago

Thanks, this is helpful. The issue is that the R-side usevariables argument needs to be specified. When data are provided, mplusObject() tries to guess the necessary names from the dataset. This does not happen without a dataset, and when update() later adds a dataset, it does not try to detect the needed variables.
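
Applied to the reproducible example, a minimal sketch of what this implies: name the variables through the R-side usevariables argument when the object is declared without data, then add the data with update(). The gear == 3 subset is a base R stand-in for data_split[[1]], an assumption made only to keep the block self-contained. Running the model with mplusModeler() would additionally require an Mplus installation.

```r
library(MplusAutomation)

# Declare the model without data, but name the needed variables on the R side
mymodel <- mplusObject(
  MODEL = "mpg ON cyl;",
  usevariables = c("mpg", "cyl"))

# Add the data later; mtcars[mtcars$gear == 3, ] stands in for data_split[[1]]
mymodel <- update(mymodel, rdata = mtcars[mtcars$gear == 3, ])
```

This mirrors the earlier example that ran without error, where usevariables was given explicitly and the data were added afterwards.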

I think I can fix update() so that it also attempts to detect and add the needed variable names during the update when all of the following hold: autov = TRUE (the default) in the mplusObject, usevariables is NULL, the original mplusObject does not have a dataset, and a dataset IS being added in the update. It should be a fairly easy fix: just adding some logical conditions so that update() calls the same code as mplusObject() when needed.
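
The conditions can be written out as a small base R sketch. The helper name needs_variable_detection() and its arguments are hypothetical and are not part of MplusAutomation; the sketch only illustrates the logic described here, not the actual fix.

```r
# Hypothetical helper, not part of MplusAutomation: returns TRUE when an
# update() call would need to detect variable names under the stated conditions.
needs_variable_detection <- function(autov, usevariables, old_rdata, new_rdata) {
  isTRUE(autov) &&            # automatic variable detection is switched on
    is.null(usevariables) &&  # no R-side usevariables were given
    is.null(old_rdata) &&     # the original object had no dataset
    !is.null(new_rdata)       # a dataset is being added in the update
}

# Data added to an object that had neither data nor usevariables
needs_variable_detection(TRUE, NULL, NULL, mtcars)

# Variable names were already given, so no detection is needed
needs_variable_detection(TRUE, c("mpg", "cyl"), NULL, mtcars)
```

The first call covers the case from the reproducible example; the second covers the case where usevariables was supplied up front.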

Thanks for the reproducible example. It is easy and clear, and it will let me test whether my planned fix works. The workaround should not be needed for much longer; hopefully I can fix this within the week.

cbratt commented 3 years ago

@JWiley, it would also be great if a new model with added variable(s) in MODEL could be declared simply by using update(), without manipulating VARIABLE in the mplusObject. (My experience is that it can be difficult to manipulate 'names' in VARIABLE by hand, and I believe the user is not meant to do that.)

Here's an example.

    > library(MplusAutomation)

    > # Define models --------------------------------------------------------------

    > # Model 1 (with data)
    > model_1 <- mplusObject(
    +   VARIABLE = "
    +         usevariables = mpg cyl;",
    +   MODEL = "
    +         mpg ON cyl;
    +         ",
    +   rdata = mtcars)

    > # Model 2: update() adds another variable to model_1 
    > # (but the code does not modify VARIABLE)
    > model_2 <- update(model_1, 
    +               MODEL = ~.+ "mpg ON am;",
    +               rdata = mtcars)

    > # Running model_1 ----------------------------------------------------------

    > mplusModeler(model_1, "mplusdata.dat", hashfilename = F, 
    +                   modelout = "mymodel.inp", run = 1L)
    When hashfilename = FALSE, writeData cannot be 'ifmissing', setting to 'always'
    The file(s)
     ‘mplusdata.dat’ 
    currently exist(s) and will be overwritten
    Estimated using ML 
    Number of obs: 32, number of (free) parameters: 3 

    Model: Chi2(df = 0) = 0, p = 0 
    Baseline model: Chi2(df = 1) = 41.449, p = 0 

    Fit Indices: 

    CFI = 1, TLI = 1, SRMR = 0 
    RMSEA = 0, 90% CI [0, 0], p < .05 = 0 
    AIC = 169.306, BIC = 173.704 
    NULL

    > # Running model_2 ----------------------------------------------------------

    > mplusModeler(model_2, "mplusdata.dat", hashfilename = F, 
    +                        modelout = "mymodel.inp", run = 1L)
    When hashfilename = FALSE, writeData cannot be 'ifmissing', setting to 'always'
    The file(s)
     ‘mplusdata.dat’ 
    currently exist(s) and will be overwritten
    Fit Indices: 

    CFI = NA, TLI = NA, SRMR = NA 
    RMSEA = NA, 90% CI [NA, NA], p < .05 = NA 
    AIC = NA, BIC = NA 
    NULL
    Warning message:
    In runModels(target = modelout, Mplus_command = Mplus_command, killOnFail = killOnFail,  :
      Mplus returned error code: 1, for model: mymodel.inp

Mplus reports for model_2:

INPUT INSTRUCTIONS

  DATA:
  FILE = "mplusdata.dat";

  VARIABLE:
  NAMES = mpg cyl;
   MISSING=.;

          usevariables = mpg cyl;
  MODEL:

          mpg ON cyl;

   mpg ON am;

*** ERROR in MODEL command
  Unknown variable(s) in an ON statement:  AM

michaelhallquist / MplusAutomation

Problem with data after group_split() #149