statOmics / msqrob2

Implementation of the MSqRob analysis of differentially expressed proteins using the Features infrastructure
9 stars 10 forks source link

MSqRobHurdle warnings #46

Open JobbeG opened 1 year ago

JobbeG commented 1 year ago

Dear authors

While running the Hurdle model, I get a warning that non-integer counts are used in a glm, are you sure the underlying representation correctly uses counts and does not directly work on the intensity measures? If the latter is not the case, then do you perhaps have any idea why these warnings are produced?

Additionally, when I subsequently do hypothesis testing on the resulting HurdleObjects, degrees of freedom are estimated as being infinite or 99,2678 with no exception. To me, this seems highly implausible since the degree of missingness is highly variable throughout my dataframe.

Lastly, I'm having an interpretational issue. If I understand it correctly, p-value adjustment is only done on the joint p-value. However, in the original publication, it is mentioned that you can differentiate between differentially detected, differentially abundant or both. This is however not obvious from my final object. Additionally, I was wondering if implementing BH-correction for both model components and interpreting them separately would be wrong.

Best regards and thanks in advance for your answers,

Jobbe

ococrook commented 1 year ago

Hi Jobbe,

Thanks for reaching out! The hurdle model uses both intensity and counts. THe second error maybe related to the first one.

There used to be an additional workfow that would help here https://statomics.github.io/msqrob2Examples/. @StijnVandenbulcke - do you know where it went?

Indeed, the documentation at the moment isn't clear. We'll try and improve that.

You implement BH-correction for the model components sepeartely is not wrong if you carefully interpret the null hypothesis or the set of hypothesis you are testing. You only need to do BH-correction for tests you actually do with the same null.

Best wishes, Olly

JobbeG commented 1 year ago

Well, it's not exactly an error, just a warning. I just wonder if the final solution is trusthworthy or that it internally rounded the intensities to integers rather than using actual peptide counts. I especially wonder about that since the imported object doesn't seem to have it. Also the code doesn't seem to refer to this columns in my maxquant file. I've added the code underneath to further clarify.

Thanks for your answer so for and thanks for further clarifying.

Best regards

Jobbe

peptidesFile <- "C:/Users/jobbe/Documents/Data_analyse/Takeda processed files/MSQROBorg/peptides7.txt"

ecols <- MSnbase::grepEcols( peptidesFile, "Intensity ", split = "\t")

pe <- readQFeatures( table = peptidesFile, fnames = 1, ecol = ecols, name = "peptides", sep="\t")