The order of factors/covariates in the design matrix of test_diff affects the final analysis

AlannaGSpiteri commented 2 months ago

Hi,

When running test_diff the order of factors/covariates in the design matrix affects the final analysis.

I have posted this issue on BioStars (https://www.biostars.org/p/9588610/#9602501), where it was suggested that there might be a bug in DEP2 or a problem with my code. However, the same thing happens in TMT analyst, an online proteomics analysis platform (https://analyst-suites.org/apps/tmt-analyst/) which uses DEP.

This is the code I have used:

    design_formula <- formula(~0 + condition + PMI + Sex + Age + Duration)
    contrasts2 <- DEP2::test_diff(data_norm2, type = "manual", test = c("M_vs_C"), fdr.type = "BH", design_formula = formula(design_formula))

I have tested several variants of the design formula. Using the same combination of factors/covariates in a different order changed the number of significantly differentially expressed proteins.
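To see whether term order changes the design matrix itself, here is a minimal base R sketch. The sample table is invented for illustration (it is not my data), and I am only assuming that test_diff builds its design with model.matrix() in a similar way. It shows that swapping covariates only reorders the columns, while moving a second factor ahead of condition changes which columns exist. In plain least squares the M vs C estimate is the same in all three cases, so any difference would have to come from how the columns are picked up afterwards.

```r
# Hypothetical sample annotation; values are made up for illustration only.
col_data <- data.frame(
  condition = factor(c("C", "C", "C", "C", "M", "M", "M", "M")),
  Sex       = factor(c("female", "male", "female", "male",
                       "female", "male", "female", "male")),
  Age       = c(61, 72, 68, 80, 59, 77, 70, 66)
)

# condition first: every condition level gets its own column.
d1 <- model.matrix(~ 0 + condition + Sex + Age, data = col_data)
# Same terms with the covariates swapped: same columns, different order.
d2 <- model.matrix(~ 0 + condition + Age + Sex, data = col_data)
# Sex first: Sex now gets both levels and condition loses its reference level.
d3 <- model.matrix(~ 0 + Sex + condition + Age, data = col_data)

colnames(d1)  # "conditionC" "conditionM" "Sexmale" "Age"
colnames(d2)  # "conditionC" "conditionM" "Age" "Sexmale"
colnames(d3)  # "Sexfemale" "Sexmale" "conditionM" "Age"

# Simulated response for one protein (placeholder data).
set.seed(1)
y <- rnorm(nrow(col_data))

b1 <- lm.fit(d1, y)$coefficients
b2 <- lm.fit(d2, y)$coefficients
b3 <- lm.fit(d3, y)$coefficients

# M vs C, addressed by column name rather than by position.
est1 <- unname(b1["conditionM"] - b1["conditionC"])
est2 <- unname(b2["conditionM"] - b2["conditionC"])
# With Sex first, conditionM is already the M vs C difference.
est3 <- unname(b3["conditionM"])

isTRUE(all.equal(est1, est2))
isTRUE(all.equal(est1, est3))
```

Note that in d3 there is no conditionC column at all, so a contrast written as conditionM - conditionC could not be built from it by name.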

In the image I have a summary table showing the number of significant proteins (proteinnum) and the smallest adjusted p-value in the dataset (BH) for the different combinations. If you look at the second-to-last and third-to-last rows of the table, the same covariates in a different order produce a different result.

Why might this be?

Thanks heaps

AlannaGSpiteri commented 2 months ago

@mildpiggy From previous threads, it appears this issue arises because only a single factor can be accounted for in the design matrix. It would be really great if the full functionality of limma could be implemented in DEP2. Is there an alternative way in DEP to include multiple covariates AND factors, as well as to account for duplicate subjects using duplicateCorrelation() (as with limma)?

mildpiggy commented 1 month ago

Thank you very much for finding and reporting this. DEP2 and DEP were originally designed for single-factor experiments. When multiple factors are involved, we usually run limma through other custom interface functions rather than using test_diff directly. However, we overlooked the fact that the order of factors in the design_formula argument of test_diff can affect the results, which is indeed a mistake. Your suggestion is very valuable, and I am going to add handling of two factors in an update of DEP2.
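Until then, a model with several covariates and repeated subjects can be fitted by calling limma directly. The following is only a rough sketch of that route, not DEP2's own interface: the data are simulated, and the object names (expr, col_data, subject) are hypothetical placeholders for a log2 intensity matrix and its sample annotation. duplicateCorrelation() also needs the statmod package to be installed.

```r
library(limma)

set.seed(42)
# Hypothetical layout: 8 subjects, each measured twice (16 samples).
subject <- factor(rep(paste0("S", 1:8), each = 2))
col_data <- data.frame(
  subject   = subject,
  condition = factor(rep(c("C", "M"), each = 8)),
  Sex       = factor(rep(rep(c("female", "male"), each = 2), times = 4)),
  Age       = rep(c(61, 72, 68, 80, 59, 77, 70, 66), each = 2)
)

# Simulated log2 intensities: 200 proteins x 16 samples, with a
# per-subject offset so that repeated measurements are correlated.
n_prot <- 200
subject_effect <- matrix(rnorm(n_prot * 8), nrow = n_prot)[, as.integer(subject)]
expr <- 20 + subject_effect + matrix(rnorm(n_prot * 16, sd = 0.5), nrow = n_prot)
dimnames(expr) <- list(paste0("prot", seq_len(n_prot)), paste0("sample", 1:16))

# condition is placed first so that each level gets its own column.
design <- model.matrix(~ 0 + condition + Sex + Age, data = col_data)

# Estimate the within-subject correlation, then pass it on to lmFit().
corfit <- duplicateCorrelation(expr, design, block = col_data$subject)
fit <- lmFit(expr, design, block = col_data$subject,
             correlation = corfit$consensus.correlation)

# Contrast built from column names, so column order does not matter.
contrast_matrix <- makeContrasts(M_vs_C = conditionM - conditionC,
                                 levels = design)
fit2 <- eBayes(contrasts.fit(fit, contrast_matrix))

res <- topTable(fit2, coef = "M_vs_C", number = Inf, adjust.method = "BH")
head(res)
```

Keeping condition as the first term after ~0 and naming the contrast by column means the covariates can be listed in any order without changing which coefficients the contrast refers to.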

mildpiggy / DEP2

The order of factors/covariates in the design matrix of test_diff affects the final analysis #20