Open JobbeG opened 2 years ago
Extra info: apparently it could also be due to problems with dimensions not matching, however, I have 106 samples. I checked the dimensions of my Qfeatures object and it has the right dimensions. Furthermore I've checked under colData, and here I could also find 106 values.
I further checked the problem, and the problem disappears as soon as I drop my random effects, however, I would prefer to keep my random effects. They code badge effects for trypsin (6 levels) and badge effects for MS_batch (7 levels). Since it are many levels, it doesn't seem suited to code it as a bunch of fixed dummy variables. However, it doesn't seem to work with the standard lmer syntax, although it is the implementation within Msqrob if I'm not mistaken?
Hey Jobbe,
I'll try and take a look a this. Any chance you could post a small reproducible example of the error and a copy the output and your sessionInfo?
Do you get errors when you fit a different model?
Hey Oliver,
Well, I'm not allowed to share my dataset, but I can copy the code and I'll make a dummy dataset and annotation table to go with it. Is it okay if I also comment on some subsequent issues here? Or would you rather have that I open a new issue for that? The thing is that issues persist even when I ignore the random variables. Is there a place where I can find more on the prerequisites for running a hurdle model?
I can't seem to add them as files, I'll try copy the code here:
exp_annotation_xlsx <- openxlsx::read.xlsx("C:/Users/jobbe/Documents/Data analyse/Takeda processed files/MSQROB/Annotation_Table2.xlsx") pepHurdle <- MSnbase::grepEcols("C:/Users/jobbe/Documents/Data analyse/Takeda processed files/MSQROB/peptides7.txt", "Intensity ", split = "\t")
PH <- readQFeatures( table = "C:/Users/jobbe/Documents/Data analyse/Takeda processed files/MSQROB/peptides7.txt", fnames = 1, ecol = pepHurdle, name = "peptide", sep="\t")
colData(PH)$Age <- exp_annotation_xlsx$Age colData(PH)$Tissue_age <- exp_annotation_xlsx$Tissue_age colData(PH)$ExpGroup <- as.factor(exp_annotation_xlsx$Expgroup) colData(PH)$ExpGroupR <- as.factor(exp_annotation_xlsx$ExpgroupR) colData(PH)$Gender <- as.character(exp_annotation_xlsx$Gender) colData(PH)$MS_batch <- exp_annotation_xlsx$MS_batch colData(PH)$Trypsin_batch <- exp_annotation_xlsx$Trypsin_batch colData(PH)$Cause_of_death <- exp_annotation_xlsx$Cause_of_death colData(PH)$FullRunName <- exp_annotation_xlsx$FullRunName colData(PH)$order <- exp_annotation_xlsx$order
PH <- aggregateFeatures(PH, i = "peptideLog", fcol = "Proteins", name = "protein")
PH <- msqrobHurdle( PH, i = "protein", ~AgeExpGroupTissue_age + (1|Trypsin_batch)+(1|MS_batch), modelColumnName = "msqrobHurdle", overwrite = FALSE, robust = TRUE, ridge = TRUE, maxitRob = 40, tol = 1e-06, doQR = TRUE, lmerArgs = list(control = lmerControl(calc.derivs = FALSE)), priorCount = 0.1, binomialBound = TRUE)
getCoef(rowData(PH[["protein"]])$msqrobModels[[1]])
getCoef(rowData(PH[["protein"]])$msqrobHurdle[[1]])
L <- makeContrast( c( "ExpGroupPD - ExpGroupHC", ), parameterNames = rowData(PH[["protein"]])$msqrobModels[[2]] %>% getCoef %>% names )
L <- makeContrast( c( "ExpGroupPD - ExpGroupHC", ), parameterNames = rowData(PH[["protein"]])$msqrobHurdle[[2]] %>% getCoef %>% names )
As for the dataset, I made a randomized version of the dataset and the annotation table, which I can send if you want, the annotation table looks something like this:
The peptide files is just standard MaxQuant output and too big to screenshot.
I have done some digging in the functions, here I found that one of the problems that is encountered can be circumvented by changing the underlying linear algebra package to lapack rather than linpack. In this way it can run past the step behind the QR-decomposition. There it now leads to NA results since it declares that the number of parameters for the decomposition does not match the original number of parameters. Lapack does return the correct number of parameters after decomposition, however, the loop where it is parallelized no longer functions. Could it be that the implementation just doesn't work for dataset where a lot of factors are modelled simultaneously?
The problem concerns the msqrobHurdle model, I've been trying to run it for quite a while now and I keep on getting subscript too long errors, which if I'm not mistaken, points towards problems with memory overflow. However, I've got a machine with 16 GB of working memory and 8 cores and do not have access to an even more powerful laptop, is there a workaround?
Additionally I get a few errors every now and then that seem unrelated to this issue and I can't understand the logic behind them but it seems to have something to do with the datatypes of my random factors in my annotation table: This is the error I get when I append them as they were coerced by R, which is numeric: Error in if (m == 0) { : missing value where TRUE/FALSE needed In addition: There were 50 or more warnings (use warnings() to see the first 50)
To circumvent it, I tried forcing them as factors. This make the code run up until the subscript too long error pops up, however, then I do get the warning: '|' is not meaninful for factors (I specified them as random effects with the following syntax: 1|randomfactor1 + 1|randomfactor.
Subsequently I tried forcing them to character strings, but this leads to error again.
Any help or clarification on where it goes wrong would be dearly appreciated.
Best regards
Jobbe