Open cdlawless opened 2 years ago
This is indeed a critical bug related to this discussion. The developer must use getWithCols(x, i)
instead of x[[i]]
.
OK, it's not related to this, but I would still suggest to be careful, this will lead to an error for more complex datasets.
The issue might actually not be related to this (or not only to this) and requires more investigation.
@cdlawless - could it be that some of the factors you are using in your formula have more levels than you intend to?
Hi,
I have a rather complex experimental design (please find the table with colData attached). I am trying to run
pe <- msqrob(object = pe, i = "proteinRobust", formula = ~batch*infection*virus*time, overwrite=TRUE)
and get this error:
Error in if (m == 0) { : missing value where TRUE/FALSE needed
When I exclude "virus" from the formula, everything works ok (but I cannot do this, obviously). I also set up the levels in the beginning:
colData(pe)$batch <- a$batch %>% factor(levels = c("B12", "B16"))
colData(pe)$infection <- a$infection %>% factor(levels = c("non-inf", "infected"))
colData(pe)$virus <- a$virus %>% factor(levels = c("mock", "Flu_PR8_wt", "Flu_PR8_delNS1", "Flu_SC35M_wt", "Flu_SC35M_delNS1", "THOV_wt", "THOV_delNS1"))
colData(pe)$time <- a$timepoint %>% factor(levels = c("t1", "t2", "t3", "t4", "t5", "t6"))
colData(pe)$rep <- a$rep %>% factor()
but this should work fine.
Would really appreciate your opinion on this
Hi!
This maybe a statistics problem. Your variable infection, virus and batch are not identifiable. You may need to drop batch and/or infection from your model. Let me know if that doesn't work.
Best wishes, Olly
I don't quite understand what you mean by "not identifiable". I do need them for the contrasts. It will run (somehow) if I remove both "batch" and "infection", but then in the next step
getCoef(rowData(pe[["proteinRobust"]])$msqrobModels[[2]])
getCoef(rowData(pe[["proteinRobust"]])$msqrobModels[[1]])
I get
[1] NA
"Virus" works only if it's the only parameter (~virus)
The design you have doesn't allow you to infer certain groups. You cant differentiate between infected and non-infected at the same time as virus because you don't observe non-infected and virus. You may need to fit seperate regressions if you are interested in those contrasts.
I cannot agree completely, but let's imagine I remove levels of infection and batch. The thing is that even simple ~virus*time
doesn't work (with ridge and without it). With ridge=FALSE
it runs and returns NA, and with ridge=TRUE
it takes forever to run (3 days and then I just interrupted the process).
Given that some version of model work and not other, I dont think its a problem with the package exactly though it may be too slow for you data. I suggest you try a few tests to see where the problems are (as it could be in many places:)
lm
function in R rlm
function from the MASS packageFrom here we can assess whether is one of the following 1) a modelling issues 2) a stats issue 3) a coding issue - otherwise we dont know right now.
Thanks!
also looping in @StijnVandenbulcke
Hello,
lmfit
function works well with model.matrix(~virus*time + batch)
.
lm
for 1 selected protein also worked without errors
rlm
gives an error
'x' is singular: singular fits are not implemented in 'rlm'.
Imputation solves this issue
Running msqrob with 1 timepoint still returns NA in getCoef
Do you think this is because of a larger number of missing values for some conditions, or the model is "overwhelmed" with too many comparisons?
Hi,
I'm getting the following error when running
I get:
I can get around it somewhat by add the matrix_pH_group as a random effect, as in:
But there are n<5 repeats in that grouping factor.
Let me know if you need more information from me to help with this.