zqfang / GSEApy

Gene Set Enrichment Analysis in Python
http://gseapy.rtfd.io/
BSD 3-Clause "New" or "Revised" License
548 stars 114 forks

FDR q-val does not preserve ordering of nominal p-values #208

Closed. joshscurll closed this issue 1 year ago

joshscurll commented 1 year ago

I'm wondering how GSEApy computes FDR q-val and FWER p-val in the gsea module. I have found that the FDR q-values do not preserve the ordering of the nominal p-values. The FDR q-values are also larger than the FWER p-values much more often than I would expect, given that controlling the FWER is much more stringent than controlling the FDR. The change in ordering is immediately apparent in the dataframe head in code cell 40 of Section 2.4.1 of the GSEApy documentation ("GSEApy Example"). When I calculate FDR q-values and FWER p-values myself directly from "NOM p-val" using statsmodels.stats.multitest.multipletests, neither issue appears.
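For reference, the expectation described above can be reproduced with a minimal sketch of p-value-based adjustment. It uses plain Python rather than statsmodels, with Benjamini-Hochberg standing in for the FDR and Holm for the FWER. The function names and the p-values are made up for illustration and are not taken from GSEApy output.

```python
def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg q-values in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    qvals = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so q-values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        qvals[i] = running_min
    return qvals


def holm(pvals):
    """Return Holm-adjusted p-values (FWER control) in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    # Walk from the smallest p-value up so adjusted values stay monotone.
    for rank, i in enumerate(order):
        running_max = max(running_max, min(1.0, pvals[i] * (m - rank)))
        adjusted[i] = running_max
    return adjusted


# Hypothetical nominal p-values, for illustration only.
nom_p = [0.001, 0.020, 0.004, 0.300, 0.045]
fdr_q = benjamini_hochberg(nom_p)
fwer_p = holm(nom_p)

for p, q, f in zip(nom_p, fdr_q, fwer_p):
    print(f"p={p:.3f}  BH q={q:.4f}  Holm p={f:.4f}")
```

With adjustments of this kind, a smaller nominal p-value never receives a larger q-value, and the BH q-value never exceeds the Holm-adjusted p-value for the same test.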

I am using GSEApy v1.0.4.

joshscurll commented 1 year ago

Never mind -- I read the methods in the GSEA PNAS paper and realized that the FDR is not computed from the nominal p-values. It is instead estimated directly from the permutation null distribution of normalized enrichment scores. I assume the same is true in GSEApy.
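To illustrate why the ordering can differ, here is a simplified sketch of the procedure as described in the GSEA PNAS paper. It is restricted to positive enrichment scores and is not GSEApy's actual code. The function name, the gene set names, and all numbers are hypothetical. Each nominal p-value uses only that gene set's own null distribution, while the FDR compares the normalized score against the null scores pooled across all gene sets.

```python
def mean(xs):
    return sum(xs) / len(xs)


def gsea_style_stats(obs_es, null_es):
    """obs_es: {set: ES}; null_es: {set: [ES per permutation]}.

    Simplified to positive enrichment scores only.
    """
    # Nominal p-value: compare each set with its own null distribution.
    nom_p = {
        s: sum(e >= obs_es[s] for e in null_es[s]) / len(null_es[s])
        for s in obs_es
    }
    # Normalize observed and null ES by the mean of that set's null ES.
    nes = {s: obs_es[s] / mean(null_es[s]) for s in obs_es}
    null_nes = [e / mean(null_es[s]) for s in obs_es for e in null_es[s]]
    # FDR q-value: pooled null tail fraction over observed tail fraction.
    fdr_q = {}
    for s in obs_es:
        null_frac = sum(v >= nes[s] for v in null_nes) / len(null_nes)
        obs_frac = sum(v >= nes[s] for v in nes.values()) / len(nes)
        fdr_q[s] = min(1.0, null_frac / obs_frac)
    return nom_p, nes, fdr_q


# Hypothetical toy data: set B has a tight null, set A a wide one.
obs_es = {"A": 0.60, "B": 0.50, "C": 0.31}
null_es = {
    "A": [0.05, 0.10, 0.15, 0.20, 0.70],
    "B": [0.38, 0.40, 0.42, 0.44, 0.46],
    "C": [0.10, 0.20, 0.30, 0.40, 0.50],
}
nom_p, nes, fdr_q = gsea_style_stats(obs_es, null_es)

for s in obs_es:
    print(f"{s}: NOM p={nom_p[s]:.2f}  NES={nes[s]:.3f}  FDR q={fdr_q[s]:.3f}")
```

In this toy data, set B has the smaller nominal p-value because none of its own null scores reach its observed score. It still ends up with a larger FDR q-value than set A, because its normalized score is lower relative to the pooled null.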

zqfang commented 1 year ago

Yes, the FDR q-values are not computed from the nominal p-values.