Why is it possible for a p=0 term to have q=1?

zqfang / GSEApy

Gene Set Enrichment Analysis in Python

http://gseapy.rtfd.io/

BSD 3-Clause "New" or "Revised" License

564 stars 117 forks source link

Why is it possible for a p=0 term to have q=1? #162

Closed searchlie closed 1 year ago

searchlie commented 2 years ago

Hi, zqfang. I recently learned of your GSEApy and immediately tried prerank with the data I had on hand and compared it to results of the GSEA provided by Broad institute. Basically, I was very pleased with the results, as NES, p-value, FWER, etc. appeared to be roughly similar for both. However, upon closer inspection of the results, I noticed that only FDR (q value ) behaved strangely:

A striking example is a term with p=0 that has q=1 (left). This term had p=0 and q=0 in Broad's GSEA (right):

To investigate the issue in depth, I plotted the p and q values and discovered that the relationship can be V-shaped in GSEApy (left), but is roughly monotonic in Broad's results (right).

Is this a bug? If it is the intended specification, could you please tell me what could cause such a difference?

Setup

GSEApy version: 0.11.0 Python version: 3.9.6 OS: Debian 11.4

Used rnk file

Used option:

pre_res = gp.prerank(rnk=rnk, gene_sets="c2.cp.kegg.v7.5.1.symbols.gmt", min_size=5)

zqfang commented 2 years ago

Thanks for the comparasion ! I think it's a bug in the recent updated rust code. I will get back to you soon

zqfang commented 2 years ago

Now, it's fixed. see a new release (v0.12.1) for this critial bug fixed

just use pip

pip install -U gseapy

searchlie commented 2 years ago

Thank you for your prompt response! I updated gseapy and now confirmed that the p=0 term has a properly low q value. For reference, I have included some new figures below.

When comparing to Broad's GSEA, it still does not ride on the diagonal completely, but it does ride on the lower q-values, so it looks good for practical use. Very much appreciated!!!

zqfang commented 2 years ago

Thank you very much for the feeback. it really helps. So others could have a reference here.

there are 3 reason have you don't align well for FDR q:

(1) I used an approximate method to speed up the calcualtion of FDR q. which is not exactly the same implement as GSEA_R here. the more permutation num, the closer to GSEA_R orignial implementation values.

(2) GSEApy did not use an extra adjust FDRq step to trim FDR q values. ( So I believe GSEApy will have larger FDR q values)

(3) the permutation procedure is hard to make the comparsion fail between GSEA-P and GSEApy.

zqfang commented 2 years ago

Hi @searchlie, unfortunately, I found another critial bug when calculating NES values in Rust code. It affects the FDR calculation and Now is fixed

This bug does not exist for numpy version of gseapy (<=v0.10.8). Please update to 0.13.0 as soon as possible.

pip install -U gseapy

If you don't mind, you could set me the GSEA-P results, I could make a comparsion for FDRs

searchlie commented 2 years ago

Hi @zqfang. Thanks for the update! I immediately used v0.13.0 and compared its q value with the q value output by the Broad instiitute software. It seems to correlate definitely better than v0.12.0! I appreciate your fantastic job!

zqfang commented 2 years ago

Thanks for the comparsion, too. Yeah. It's my bad. I should notice the weird thing more earlier. I thought the previously result was caused by the approximate method I've implememented. It turns out that It was a mistake I've made a when converting numpy to Rust code.

searchlie commented 2 years ago

@zqfang This is not a bug report, but a report about a hidden change I noticed recently. In previous versions, the results (res2d) were sorted by ES or NES value, but in the latest version, they seemed to be sorted by FWER p-val value. Is my understanding correct? (I would like to check the code directly, but I can't read the Rust code...sorry.)

zqfang commented 2 years ago

in the previous version, the final output results are sorted by FDR. Now are sorted by the absolute value of NES, which will make the sorting order more deterministic.

The rust code only used for computing.

The sorting only affects the row orders in the output table. You could sort the table whatever you like.

searchlie commented 2 years ago

Very well understood, thank you!