lavPredictY not outputting meaningful results for newdata

jebyrnes commented 1 year ago

Hello! I'm delighted that lavPredictY exists! (and maybe a residuals with raw data is soon to come?) I'm trying to futz with it to look at newdata (and see if I can monkey with it enough to get SEs or otherwise. But, I'm having trouble understanding how it works with newdata. Consider the following

library(lavaan)
library(piecewiseSEM)

model <- "
rich ~ firesev
firesev ~ age"

fit <- sem(model, data = keeley)

This works great with just lavPredictY(fit) but once I enter newdata....

 lavPredictY(fit, 
             newdata = data.frame(age = 1),
             ynames = c("firesev", "rich"),
             xnames = "age")

gives

Error in lav_data_full(data = data, group = group, cluster = cluster,  : 
  lavaan ERROR: some (observed) variables specified in the model are not found in the dataset: rich firesev

OK, not what I expected, so I filled in NAs for the two downstream variables.

lavPredictY(fit, 
            newdata = data.frame(age = 1, 
                                firesev = NA,
                                rich = NA),
            ynames = c("firesev", "rich"),
            xnames = "age")

But all I get is an empty data frame

     firesv rich

I thought maybe there needed to be some number there? So...

lavPredictY(fit, 
            newdata = data.frame(age = 1, 
                                firesev = 1,
                                rich = 1),
            ynames = c("firesev", "rich"),
            xnames = "age")

Hrmph. Again, a weird answer - just the newdata input

     firesv rich
[1,]      1    1

Note, I'm using 0.6-13 on CRAN, so, maybe this has been fixed here? Thoughts?

Would love to work with this function to fold it into broom::augment()!

yrosseel commented 1 year ago

As a quick fix, use meanstructure = TRUE when fitting the model (if you plan to use newdata=). Otherwise, the 'observed' mean vector is recomputed based on the newdata, which is not what you want. I will check what happens when NA values are used for the ynames. (Update: this is just the effect of missing = "listwise")

yrosseel commented 1 year ago

Ok, the github version of lavaan now contains two changes: 1) meanstructure = TRUE is always enforced, and 2) missing values are allowed for y-variables. The newdata= data.frame still needs to include the y-variables (for now), although their values are ignored.

jebyrnes commented 1 year ago

Thx - still one more problem - the order of the output columns is reversed?


lavPredictY(fit, 
            newdata = data.frame(age = 1, 
                                 firesev = 1,
                                 rich = 1),
            ynames = c("firesev", "rich"),
            xnames = "age") 

coef(fit)
#firesev
0.060 * 1 + 3.039 

#rich
-3.363 * 3.099 +  64.583

lavPredictY gives

     firesv  rich
[1,] 54.163 3.099

But according to the coefficients

> #firesev
> 0.060 * 1 + 3.039 
[1] 3.099

> #rich
> -3.363 * 3.099 +  64.583  
[1] 54.16106

jebyrnes commented 1 year ago

Note, I also tried reversing the order of where they are in ynames, and same result - flipped.

yrosseel commented 1 year ago

Indeed, thanks! This should now be fixed in the github version.

yrosseel / lavaan

lavPredictY not outputting meaningful results for newdata #259