statOmics / msqrob2

Implementation of the MSqRob analysis of differentially expressed proteins using the Features infrastructure
9 stars 10 forks source link

msqrob error #33

Open cdlawless opened 2 years ago

cdlawless commented 2 years ago

Hi,

I'm getting the following error when running

se<-msqrob(object=pe, i="proteinRobust", formula= ~pH*matrix + matrix_pH_group, ridge=TRUE)

I get:

Error in if (m == 0) { : missing value where TRUE/FALSE needed

I can get around it somewhat by add the matrix_pH_group as a random effect, as in:

se<-msqrob(object=pe, i="proteinRobust", formula= ~pH*matrix + (1|matrix_pH_group), ridge=TRUE)

But there are n<5 repeats in that grouping factor.

Let me know if you need more information from me to help with this.

lgatto commented 2 years ago

This is indeed a critical bug related to this discussion. The developer must use getWithCols(x, i) instead of x[[i]].

OK, it's not related to this, but I would still suggest to be careful, this will lead to an error for more complex datasets.

lgatto commented 2 years ago

The issue might actually not be related to this (or not only to this) and requires more investigation.

lgatto commented 2 years ago

@cdlawless - could it be that some of the factors you are using in your formula have more levels than you intend to?

Aokht17 commented 10 months ago

Hi,

I have a rather complex experimental design (please find the table with colData attached). I am trying to run

pe <- msqrob(object = pe, i = "proteinRobust", formula = ~batch*infection*virus*time, overwrite=TRUE)

and get this error:

Error in if (m == 0) { : missing value where TRUE/FALSE needed

When I exclude "virus" from the formula, everything works ok (but I cannot do this, obviously). I also set up the levels in the beginning:

colData(pe)$batch <- a$batch %>% factor(levels = c("B12", "B16")) colData(pe)$infection <- a$infection %>% factor(levels = c("non-inf", "infected")) colData(pe)$virus <- a$virus %>% factor(levels = c("mock", "Flu_PR8_wt", "Flu_PR8_delNS1", "Flu_SC35M_wt", "Flu_SC35M_delNS1", "THOV_wt", "THOV_delNS1")) colData(pe)$time <- a$timepoint %>% factor(levels = c("t1", "t2", "t3", "t4", "t5", "t6")) colData(pe)$rep <- a$rep %>% factor()

but this should work fine.

Would really appreciate your opinion on this

coldata_pe.txt

ococrook commented 10 months ago

Hi!

This maybe a statistics problem. Your variable infection, virus and batch are not identifiable. You may need to drop batch and/or infection from your model. Let me know if that doesn't work.

Best wishes, Olly

Aokht17 commented 10 months ago

I don't quite understand what you mean by "not identifiable". I do need them for the contrasts. It will run (somehow) if I remove both "batch" and "infection", but then in the next step getCoef(rowData(pe[["proteinRobust"]])$msqrobModels[[2]]) getCoef(rowData(pe[["proteinRobust"]])$msqrobModels[[1]])
I get

[1] NA

"Virus" works only if it's the only parameter (~virus)

ococrook commented 10 months ago

The design you have doesn't allow you to infer certain groups. You cant differentiate between infected and non-infected at the same time as virus because you don't observe non-infected and virus. You may need to fit seperate regressions if you are interested in those contrasts.

Aokht17 commented 10 months ago

I cannot agree completely, but let's imagine I remove levels of infection and batch. The thing is that even simple ~virus*timedoesn't work (with ridge and without it). With ridge=FALSE it runs and returns NA, and with ridge=TRUE it takes forever to run (3 days and then I just interrupted the process).

ococrook commented 10 months ago

Given that some version of model work and not other, I dont think its a problem with the package exactly though it may be too slow for you data. I suggest you try a few tests to see where the problems are (as it could be in many places:)

From here we can assess whether is one of the following 1) a modelling issues 2) a stats issue 3) a coding issue - otherwise we dont know right now.

Thanks!

also looping in @StijnVandenbulcke

Aokht17 commented 10 months ago

Hello,

lmfit function works well with model.matrix(~virus*time + batch). lm for 1 selected protein also worked without errors rlm gives an error

'x' is singular: singular fits are not implemented in 'rlm'.

Imputation solves this issue

Running msqrob with 1 timepoint still returns NA in getCoef

Do you think this is because of a larger number of missing values for some conditions, or the model is "overwhelmed" with too many comparisons?