statOmics / msqrob2

Implementation of the MSqRob analysis of differentially expressed proteins using the Features infrastructure

How to use the aggregateFeatures and msqrob functions #38

Open mbonhomme opened 1 year ago

mbonhomme commented 1 year ago

Dear statOmics Team,

Thank you for the great msqrob2 package. msqrob2 is built for proteomics experiments, but it seems to be a powerful tool for analyzing my metabolomics data. I am currently using your code on my metabolomics data and have some questions about how best to adapt it.

I am starting directly with a matrix of raw intensities for multiple features (in rows) across my different samples (in columns). I log-transformed and normalized the data, and I provided information about my experimental conditions through colData. My question is about the "summarization to protein level" step. Unlike in proteomics, I do not have an assay of protein expression values, and I would like to keep working with the same pe object, using my feature intensities to fit a model that matches my design. How should I use the aggregateFeatures function in order to continue the analysis?

I thank you in advance; any help or explanation will be appreciated. I can send more information if needed. I will be very grateful if you could help me with this.

Regards,

lgatto commented 1 year ago
mbonhomme commented 1 year ago

Thank you very much for the very quick help! Could you explain a bit more about the levels? What counts as excessive? Group has 2 levels, time has 2 levels, and patient has 11 levels (one for each patient; the measures are repeated over time, not over group, for a total of 22 samples).

Thank you for helping, it is really appreciated.

Regards,

lgatto commented 1 year ago

x2 below has 3 levels (because it is a subset of x), but only two of the three are actually present in its values. If you referred to x2 in the formula, it would lead to that very same error.

> x <- factor(LETTERS[1:3])
> x
[1] A B C
Levels: A B C
> x2 <- x[1:2]
> x2
[1] A B
Levels: A B C
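
As a follow-up to the example above, and assuming the error does come from unused factor levels, base R's droplevels() removes the levels that no longer occur. This is only a minimal sketch on the toy factor; x2_clean is a name introduced here for illustration.

```r
# Toy factor from the example above.
x <- factor(LETTERS[1:3])
x2 <- x[1:2]

# Subsetting keeps all original levels, including the unused "C".
print(nlevels(x2))

# droplevels() keeps only the levels that still occur in the data.
x2_clean <- droplevels(x2)
print(levels(x2_clean))
```

The same call can be applied to a column of the sample annotation before it is used in a model formula.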

ococrook commented 1 year ago

If your formula is

formula = ~ group*time + patient

You are actually fitting

formula = ~ group + time + group:time + patient
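
The expansion can be checked in base R without any data, since terms() only needs the formula. A minimal sketch; the object names are illustrative.

```r
# Shorthand and expanded versions of the same model formula.
f_short <- ~ group * time + patient
f_long  <- ~ group + time + group:time + patient

# Extract the term labels that R derives from each formula.
labels_short <- attr(terms(f_short), "term.labels")
labels_long  <- attr(terms(f_long), "term.labels")

print(labels_short)

# Both formulas contain exactly the same set of terms.
print(setequal(labels_short, labels_long))
```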

I don't think you have enough samples to fit this model. Are patients nested within groups? You may want one of the following, but I don't know enough about your question, model or data to be more helpful:

formula = ~ group + time + group:time + (group|patient)
formula = ~ group + time + group:time + (time|patient)
formula = ~ group + time + group:time + (1|patient)

mbonhomme commented 1 year ago

Thank you very much for your answers, they are really helpful.

I see the issue now. I have 11 patients divided into 2 unbalanced groups (n=6 and n=5), and they are measured at 2 time points. I am investigating the differential expression between the 2 time points for the patients in group 1 and in group 2. I would like to take into account that the measures are paired over time (but not over group: the patients in Gp1 and Gp2 are different). The appropriate formula seems to be: formula = ~ group + time + group:time + (1|patient). Is that correct? However, msqrob does not seem to accept this formula.

This is how I created my colData

colData(pe)$patient <- factor(c("P1","P1","P2","P2","P4","P4","P7","P7","P8","P8","P9","P9","P10","P10","P11","P11","P3","P3","P5","P5","P6","P6"))  # 11 levels

colData(pe)$time <- factor(c("T0","T2","T0","T2","T0","T2","T0","T2","T0","T2","T0","T2","T0","T2","T0","T2","T0","T2","T0","T2","T0","T2"))  # 2 levels

colData(pe)$group <- factor(c("Gp1","Gp1","Gp1","Gp1","Gp1","Gp1","Gp1","Gp1","Gp1","Gp1","Gp1","Gp1","Gp2","Gp2","Gp2","Gp2","Gp2","Gp2","Gp2","Gp2","Gp2","Gp2"))  # 2 levels
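
A quick way to inspect this design outside msqrob2 is to rebuild the three factors as plain vectors and look at the design matrix of the all-fixed-effects model ~ group*time + patient. This is only a base R sketch on the annotation above, not something msqrob2 does for you; the variable names are illustrative.

```r
# Sample annotation rebuilt as plain factors (22 samples, same order as above).
patient <- factor(rep(c("P1", "P2", "P4", "P7", "P8", "P9",
                        "P10", "P11", "P3", "P5", "P6"), each = 2))
time  <- factor(rep(c("T0", "T2"), times = 11))
group <- factor(rep(c("Gp1", "Gp2"), times = c(12, 10)))

# Cross-tabulate to see whether each patient belongs to a single group.
nesting <- table(patient, group)
patients_per_group <- colSums(nesting > 0)

# Design matrix with patient as a fixed effect.
X <- model.matrix(~ group * time + patient)
n_coef <- ncol(X)
design_rank <- qr(X)$rank

print(patients_per_group)
cat("samples:", nrow(X), "coefficients:", n_coef, "rank:", design_rank, "\n")
```

If the rank is lower than the number of coefficients, at least one coefficient cannot be estimated. Here every patient belongs to exactly one group, so the group column is the sum of the patient columns of that group, which is one reason to model patient as a random effect, (1|patient), rather than as a fixed one.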

Thank you very much, and have a nice day.

Regards,

mbonhomme commented 1 year ago

Hello, I am sorry to come back to this topic, but I am still unable to fit the model I need. Maybe someone here can spot my mistake (details are in my previous reply)? It is now probably more of a statistical question than a question about your package, but my formula "formula = ~ group + time + group:time + (1|patient)" does not work with msqrob.

Thanks in advance,

ococrook commented 1 year ago

@StijnVandenbulcke Do you have time to have a look at this?

StijnVandenbulcke commented 1 year ago

That formula is used for mixed models. To use it with msqrob2, you will have to set ridge = TRUE. Currently you cannot use mixed models without ridge regression, even though this is recommended. We will update the package so that you can use mixed models without ridge regression.

If you plan to use the ridge regression, you should use this branch as this includes an important fix.