mlcollyer / RRPP

RRPP: An R package for fitting linear models to high-dimensional data using residual randomization
https://mlcollyer.github.io/RRPP/

Issues with predict() in mixed effects model #7

Closed jmlayton closed 1 year ago

jmlayton commented 1 year ago

I am using a full rank dataframe that looks like: https://user-images.githubusercontent.com/91229141/219503467-0279daab-335b-424a-9e5e-851775253698.png

All predictors are of class "factor". The model is:

test <- RRPP::lm.rrpp(Simpson ~ DSF_groups + date_col_grps + parity_groups + gender + Litter, data=SAM_NA.rm, SS.type="III", print.progress = FALSE, iter=1000)

which when run prints: https://user-images.githubusercontent.com/91229141/219503752-74fb9ff8-1bb8-43e5-8ce5-beadd2e0aa6b.png

I want to obtain the LSMs of the litter factor in my model. So, to build my "newdata" dataframe I run:

Litdf <- data.frame(Litter = as.character(sort(unique(SAM_NA.rm$Litter))))

rownames(Litdf) <- Litdf$Litter

Which makes a data.frame that looks like: https://user-images.githubusercontent.com/91229141/219504076-f3e0be2e-899d-46ea-a380-66982fcd3761.png

My predict() function is coded as: predict(test, Litdf)

And it returns the following error: https://user-images.githubusercontent.com/91229141/219504435-4212d335-8c90-436c-b9be-89405ff1f450.png

It's also interesting that the same script runs with a mixed model that is just Simpson ~ gender + Litter, with no other changes.

Is there anything obvious I have missed in my code or model?

mlcollyer commented 1 year ago

This problem is caused by the resulting, and sometimes illogical, model matrix column names after QR decomposition. You lost six columns because of redundancies in the model matrix. That by itself is not a problem, but if the surviving column names omit any of the column names in a new model matrix built from the Litter factor levels, the attempt to match parameters fails. This is not something you can control; it is an operational issue that I have to fix. The issue seems to have emerged recently, so I believe it has something to do with how R labels columns in its qr function.
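The mechanism can be seen outside RRPP. Here is a minimal numpy sketch (an analogy to, not a copy of, RRPP's internals; all names are illustrative) of how redundant dummy columns force a rank-deficient design to lose a column:

```python
import numpy as np

# Intercept plus a full set of dummy columns for a two-level factor:
# the two dummies sum to the intercept, so one column is redundant.
intercept = np.ones(6)
level_a = np.array([1., 1., 1., 0., 0., 0.])
level_b = np.array([0., 0., 0., 1., 1., 1.])  # level_a + level_b == intercept
X = np.column_stack([intercept, level_a, level_b])

# The matrix has 3 columns but rank 2, so any full-rank reduction
# (such as the pivoted QR used by R's qr()) must discard one column.
print(np.linalg.matrix_rank(X))  # 2
```

Which column gets discarded depends on the pivoting order, not on anything the user chooses, and its name disappears with it.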

I will try to find a new solution but it might take some work. Please be patient.

Mike


mlcollyer commented 1 year ago

I attempted to fix the issue but I cannot guarantee it provides logical results. You can install it from GitHub. The way LS means are calculated is to find the mean of every parameter in the linear model design matrix, X, and obtain new parameters only for the data frame specified as newdata in predict.lm.rrpp. The function replaces the parameters based on name. This works, generally, unless parameters are lost to QR decomposition.

Here is how arbitrary this can be, using an example I built to replicate the error:

lm.rrpp(y ~ subject + group/subject, data = df)$LM$X

Warning: This is not an error! It is a friendly warning.

Because variables in the linear model are redundant, the linear model design has been truncated (via QR decomposition).
Original X columns: 12
Final X columns (rank): 6
Check coefficients or degrees of freedom in ANOVA to see changes.

Use options(warn = -1) to turn off these warnings.

   (Intercept) subject2 subject3 subject4 subject5 subject6
1            1        0        0        0        0        0
[rows 2-10 identical]
11           1        1        0        0        0        0
[rows 12-20 identical]
21           1        0        1        0        0        0
[rows 22-30 identical]
31           1        0        0        1        0        0
[rows 32-40 identical]
41           1        0        0        0        1        0
[rows 42-50 identical]
51           1        0        0        0        0        1
[rows 52-60 identical]

lm.rrpp(y ~ group/subject + subject, data = df)$LM$X

Warning: This is not an error! It is a friendly warning.

Because variables in the linear model are redundant, the linear model design has been truncated (via QR decomposition).
Original X columns: 12
Final X columns (rank): 6
Check coefficients or degrees of freedom in ANOVA to see changes.

Use options(warn = -1) to turn off these warnings.

   (Intercept) group2 subject2 subject3 subject4 subject5
1            1      0        0        0        0        0
[rows 2-10 identical]
11           1      0        1        0        0        0
[rows 12-20 identical]
21           1      0        0        1        0        0
[rows 22-30 identical]
31           1      1        0        0        1        0
[rows 32-40 identical]
41           1      1        0        0        0        1
[rows 42-50 identical]
51           1      1        0        0        0        0
[rows 52-60 identical]

These matrices look different but estimate the same thing. The factor "group" is used, but nesting subjects within it does not make sense, as the subjects are unique to groups. The order of factors made a difference: in the first case, group2 was removed and in the second case, subject6 was removed, due to redundancies. For ANOVA, this does not matter. For prediction, it does.
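The order effect is general linear algebra rather than anything RRPP-specific. A hedged numpy sketch (simulated data, invented names) of the same situation: a group factor implied by nested subjects is redundant, and either column ordering yields identical fitted values even though a column-dropping routine would keep different columns:

```python
import numpy as np

rng = np.random.default_rng(0)
subject = np.repeat(np.arange(4), 3)      # 4 subjects, 3 observations each
group = (subject >= 2).astype(float)      # subjects nested within 2 groups
n = subject.size

S = np.eye(4)[subject][:, 1:]             # subject2..subject4 dummy columns
X1 = np.column_stack([np.ones(n), S, group])   # subject terms first
X2 = np.column_stack([np.ones(n), group, S])   # group term first
y = rng.normal(size=n)

# group is a sum of subject dummies, so both designs have rank 4 < 5
# columns; least squares still gives identical fitted values either way.
fit1 = X1 @ np.linalg.lstsq(X1, y, rcond=None)[0]
fit2 = X2 @ np.linalg.lstsq(X2, y, rcond=None)[0]
print(np.allclose(fit1, fit2))  # True
```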

Please download the updated version of RRPP from GitHub and let me know if you still get an error. If you get no error but strange results, the removed parameters will be hard to work around. Manipulating the variable order might trick R into dropping different parameters, but that is about the best one can do.

Good luck!

Mike


mlcollyer commented 1 year ago

Additionally, I did not catch this the first time, but in your formula,

Simpson ~ DSF_groups + date_col_grps + parity_groups + gender + Litter

if you change the order to

Simpson ~ Litter + DSF_groups + date_col_grps + parity_groups + gender

and, using type III SS, your ANOVA results would be the same, but you might be able to trick R into dropping redundant parameters from one of the variables other than Litter (the parameter names for Litter would all be intact, since that variable is introduced first).

Cheers! Mike


jmlayton commented 1 year ago

Thanks Mike! That was a massive help. I discovered nesting issues with my data set after digging into it a bit. I have a new model that works with predict. Are you able to confirm that the predict() function will provide the least squares means of "newdata", without any additional steps? I have seen it suggested in your documentation, but a confirmation would be wonderful.

mlcollyer commented 1 year ago

I’ll try to answer this delicately. Simple answer: “Yes!” The error was caused by an attempt to match parameter names; because of an unanticipated loss of certain parameter names, there was a mismatch error. This was easy to overcome once it became obvious that a parameter name could exist in the model matrix of a new data frame but not in the model fit. So, that solution should be okay.

But a broader question is: what are LS means and what do they mean? The term was made famous by SAS software and is known elsewhere, perhaps with better terminology, as “marginal means”. Marginal means differ from, say, “group means” because they are based on responses conditioned on the other variables in the model. For example, as an exercise, here is the RRPP pupfish example from lm.rrpp:

data(Pupfish)
fit <- lm.rrpp(coords ~ log(CS) + Sex*Pop, SS.type = "I",
               data = Pupfish, print.progress = FALSE, iter = 999)

That is a slightly complicated model. Here are the first few values for model parameters:

fit$LM$X[1:3, ]
  (Intercept)  log(CS) SexM PopSinkhole SexM:PopSinkhole
1           1 4.154712    0           0                0
2           1 4.189849    0           0                0
3           1 4.178185    0           0                0

When we estimate LS means, we start with a basic principle: if we average all of the parameters, the model should estimate the mean. Here is an example to show this is true:

x <- colMeans(fit$LM$X) # average of all model parameters
B <- coef(fit)          # model coefficients
x %*% B                 # prediction

            [,1]        [,2]        [,3]        [,4]       [,5]
[1,] -0.03947932 -0.07058758 -0.04146458 -0.03712504 0.09922309
[output truncated; the full 1 x 112 row of predicted values continues]

colMeans(Pupfish$coords) # finding the mean vector
[1] -0.0394793211 -0.0705875784 -0.0414645817 -0.0371250391
[5]  0.0992230875 -0.0704998342 -0.1740485656 -0.0029577128
[output truncated; all 112 values equal the predictions above]
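The equality is not a coincidence: for any least-squares fit that includes an intercept, the first normal equation forces the residuals to sum to zero, so the prediction at the parameter means equals the mean response. A quick numpy check with simulated data (not the pupfish set; all names here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 30
# Design matrix: intercept, a continuous covariate, and a dummy variable.
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.integers(0, 2, n)])
Y = rng.normal(size=(n, 2))                  # a 2-D (multivariate) response

B = np.linalg.lstsq(X, Y, rcond=None)[0]     # coefficients, one column per trait
x_bar = X.mean(axis=0)                       # mean of every model parameter

# Prediction at the parameter means equals the response mean vector,
# because residuals of an intercept model sum to zero in each trait.
print(np.allclose(x_bar @ B, Y.mean(axis=0)))  # True
```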

OK, so what does predict.lm.rrpp do to find LS means? It finds the means of the parameters, as we did above (x), stacks a copy of that row for each level of prediction, and then leaves every parameter at its mean except those used for prediction. For example, to find LS means for Sex, since there are two levels, we start with:

rbind(x, x)
  (Intercept)  log(CS)      SexM PopSinkhole SexM:PopSinkhole
x           1 4.284101 0.4814815    0.462963        0.2407407
x           1 4.284101 0.4814815    0.462963        0.2407407

(which incidentally tells us that 48.15% of the sample comprises males) and we update that matrix to look like this:

  (Intercept)  log(CS) SexM PopSinkhole SexM:PopSinkhole
x           1 4.284101    0    0.462963        0.2407407
x           1 4.284101    1    0.462963        0.2407407

So male and female LS means would be estimated holding all other parameters constant at their means.
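The two-step recipe above (average every parameter, then overwrite only the focal factor's column with its level codes) can be sketched in numpy; the simulated data and variable names here are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 40
sex = rng.integers(0, 2, n).astype(float)        # 0 = F, 1 = M
log_cs = rng.normal(4.3, 0.1, n)                 # a covariate, like log(CS)
X = np.column_stack([np.ones(n), log_cs, sex])   # (Intercept), log(CS), SexM
y = 1.0 + 0.5 * log_cs + 0.8 * sex + rng.normal(0, 0.1, n)

b = np.linalg.lstsq(X, y, rcond=None)[0]

# Step 1: the row of parameter means, repeated once per factor level.
x_bar = X.mean(axis=0)
X_new = np.vstack([x_bar, x_bar])
# Step 2: overwrite only the SexM column with its level codes.
X_new[:, 2] = [0.0, 1.0]

ls_means = X_new @ b   # female and male LS means at the mean log(CS)
# Their difference equals the SexM coefficient exactly, since only
# that column differs between the two prediction rows.
print(np.isclose(ls_means[1] - ls_means[0], b[2]))  # True
```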

Thus, to answer your question, the predict function will work as it should provided the parameters in fit$LM$X are what they should be. Let’s say, for example, that some of the populations sampled for the pupfish data had only males or only females. Then having parameters for population, sex, and their interaction would be redundant and, like with your data, some parameters in X would be removed. In the example in the previous email I showed how variable order can make a difference for which parameters are removed.

If you check the parameters in $LM$X from your model fit (the column names) and they make sense with respect to your new data, you can be comfortable with the LS means you estimate. But you might have to consider variable order to see if it makes sense. I do not know of an easy way to program the function to anticipate and fix this. I might one day come up with something better, but for now, this is the best advice I have.

I hope this is helpful. It is an area of statistics that I think most researchers do not fully appreciate, and one they can get away with not worrying about when their research designs fully replicate factor levels.

Best, Mike
