zqfang / GSEApy

Gene Set Enrichment Analysis in Python
http://gseapy.rtfd.io/
BSD 3-Clause "New" or "Revised" License

Genes in Lead_genes and heatmap are wrong when using gene_set permutation #190

Closed: kyungtaekLIM closed this issue 1 year ago.

kyungtaekLIM commented 1 year ago

The enriched terms seem fine. However, the gene list in the resulting CSV (Lead_genes) and the heatmap look wrong, but only when permutation_type = gene_set.

Name,Term,ES,NES,NOM p-val,FDR q-val,FWER p-val,Tag %,Gene %,Lead_genes
gsea,mitotic cell cycle (GO:0000278),-0.8167602734400491,-3.2893272557339803,0.0,0.0,0.0,253/396,8.69%,MT-ND4L;MT-ATP8;MT-RNR2;MT-RNR1;CICP1;AC012005.2;NEFLP1;KDM5D;TXLNGY;BCORP1;ELOCP14;HSFY2;AC140113.3;CDC27P2;RN7SKP282;LINC00278;WASH6P;WASIR1;TRPC6P;AC234781.5;AC234781.3;AC234781.2;FUNDC2;F8;MPP1;OR3B1P;ATF4P1;FAM223A;UBL4A;GDI1;ATP6AP1;RPL10;TKTL1;OPN1MW3;TMEM187;RENBP;AVPR2;ABCD1;SLC6A8;RPL18AP16;ZNF275;PNMA5;MAGEA2B;MAGEA3;MAGEA5;GABRE;HMGB3;MTMR1;CXorf40B;IDSP1;AC244197.3;SLITRK2;MTND2P39;LDOC1;RNU6-3P;ADGRG4;FHL1;CT45A5;CT45A3;INTS6L-AS1;ZNF449;AC234771.2;CT55;SMIM10L2B-AS1;MOSPD1;FAM122C;PHF6;FO393409.1;RPSAP63;MBNL3;RAP2C-AS1;FRMD7;STK26;PNKDP1;ENOX2;RBMX2;SASH3;XPNPEP2;APLN;TEX13C;AL121601.2;FERP1;HSPA8P1;PA2G4P1;RHOXF2;NKAPP1;UPF3B;UBE2A;SLC25A5-AS1;RNU1-67P;AC004000.1;PGRMC1;ZCCHC12;IL13RA1;RNU6-1323P;AC005000.1;LRCH2;QTRT1P1;TRPC5OS;LINC00890;AL512882.1;IRS4;COL4A5;COL4A6;AC234782.5;TCEAL3-AS1;NXF3;MTND6P32;MTCO3P19;AL035551.1;FOXN3P1;TMSB15A;ARMCX3;GLA;Z97985.1;XKRX;HNRNPA1P27;CSTF2;SYTL4;SRPX2;FAM133A;ZNF711;APOOL;CHMP1B2P;MAGT1;FGF16;TTC3P1;RLIM;Z83843.1;FTX;XIST;SEPHS1P4;MAP2K4P1;PABPC1L2A;DMRTC1B;PHKA1;NHSL2;INGX;NONO;SOCS5P4;DLG3;IGBP1-AS1;IGBP1-AS2;CNOT7P1;PJA1;HEPH;CCT4P2;MTMR8;BTF3P8;AL022157.1;AL139397.2;RRAGB;FGD1;SMC1A;IQSEC2;RNU6-421P;RN7SL262P;TIMM17B;WAS;MRPL32P1;TBC1D25;AL022578.1;UXT;LINC01545;RP2;SLC9A7;CHST7;GAPDHP65;AC234772.3;FUNDC1;GPR34;USP9X;TSPAN7;SYTL5;NR0B1;VENTXP1;CBX1P2;AC078993.1;CA5BP1;VEGFD;MOSPD2;GPM6B;OFD1;EGFL6;EIF5P1;AC120338.1;AC110995.1;PRKX-AS1;LINC00106;Z97192.2;GRAMD4;PRR5;KIAA1644;BIK;POLDIP3;SLC25A5P1;PHF5A;MKL1;RPS19BP1;MGAT3;AL020993.1;LGALS1;Z94160.2;CYTH4;AL022313.2;AL008635.1;RFPL3S;SLC5A1;RNF185;SEC14L6;ASCC2;AP000354.1;BCRP8;ASH2LP1;PRAMENP;AC007731.1;PRODH;E2F6P1;HDHD5-AS1;SUMO3;AP001065.3;AP001065.1;AP001057.1;H2BFS;DSCR9;TTC3;CBR3-AS1;MRPS6;LINC00649;IFNAR2;AP000281.2;OR7E23P;EVA1C;FDX1P2;MIR3648-1;CU634019.2;FP236240.4;CU639417.1;SAMD10;TNFRSF6B;HELZ2;AL160412.1;CASS4;FAM210B;ARPC3P1;SNX21;AL117382.2;CHD6;AL022394.1;LBP;AL121895.2;PROCR;EIF2S2
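For context, Lead_genes is meant to hold the leading-edge subset: the gene-set members that sit at or before the running-sum peak when ES is positive, or at or after it when ES is negative. Every entry should therefore be a member of the term. The sketch below illustrates that definition under the standard weighted running-sum formulation. It is not GSEApy's internal code, and the helper name `leading_edge` is hypothetical. The returned tag string is only meant to resemble the k/n format of the Tag % column.

```python
import numpy as np


def leading_edge(ranked_genes, scores, gene_set, weight=1.0):
    """Return (ES, leading-edge genes, "k/n" tag string) for one gene set.

    Hypothetical helper illustrating the standard GSEA definition; not GSEApy's code.
    ranked_genes and scores must be in the same (ranked) order.
    """
    scores = np.asarray(scores, dtype=float)
    hits = np.array([g in gene_set for g in ranked_genes], dtype=bool)
    n_hits = int(hits.sum())
    if n_hits == 0 or n_hits == len(ranked_genes):
        raise ValueError("gene set must overlap the ranked list without covering all of it")
    # Hits step the running sum up in proportion to |score|**weight;
    # misses step it down by a constant amount.
    hit_w = np.abs(scores) ** weight * hits
    running = np.cumsum(hit_w) / hit_w.sum() - np.cumsum(~hits) / (len(hits) - n_hits)
    # ES is the running-sum value with the largest absolute deviation from zero.
    peak = int(np.argmax(np.abs(running)))
    es = float(running[peak])
    # Positive ES: members at or before the peak; negative ES: members at or after it.
    positions = range(peak + 1) if es >= 0 else range(peak, len(ranked_genes))
    lead = [ranked_genes[i] for i in positions if hits[i]]
    return es, lead, f"{len(lead)}/{n_hits}"


ranked = ["A", "B", "C", "D", "E", "F"]
es, lead, tag = leading_edge(ranked, [3, 2, 1, -1, -2, -3], {"E", "F"})
print(es, lead, tag)  # -1.0 ['E', 'F'] 2/2
```

Because `lead` is filtered by `hits`, a genuine leading edge can never contain genes outside the term. That is why the gene_set-permutation output above is clearly inconsistent.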

Obviously, most of the leading-edge genes are unrelated to the mitotic cell cycle and do not exist in the GMT file.
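One quick way to confirm this symptom is to check each Lead_genes entry against the term's members in the GMT file. This is a minimal sketch that assumes the standard GMT layout (term name, description, then one gene per column, all tab-separated). Both helper names are hypothetical and not part of GSEApy.

```python
def parse_gmt_line(line):
    """Split one GMT record into (term name, set of member genes).

    GMT columns are tab-separated: term name, description, then one gene per column.
    """
    fields = line.rstrip("\n").split("\t")
    return fields[0], {g for g in fields[2:] if g}


def genes_missing_from_term(lead_genes, gmt_line):
    """Return Lead_genes entries (semicolon-separated) that are not members of the term."""
    _, members = parse_gmt_line(gmt_line)
    return [g for g in lead_genes.split(";") if g and g not in members]


gmt = "mitotic cell cycle (GO:0000278)\tdescription\tCDK1\tCCNB1\tPLK1\n"
print(genes_missing_from_term("CDK1;MT-ND4L;PLK1", gmt))  # ['MT-ND4L']
```

For a correct result, the returned list should be empty for every row of the output CSV.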

With permutation_type = phenotype, I get the correct genes:

Name,Term,ES,NES,NOM p-val,FDR q-val,FWER p-val,Tag %,Gene %,Lead_genes
gsea,mitotic cell cycle (GO:0000278),-0.8167602734400491,-1.7463210214207217,0.11022044088176353,0.0947555639823332,0.628,254/396,8.81%,CDC25A;PLK4;PTTG1;NCAPG2;CDC6;CENPI;CDK2;RRM2;ZWINT;LIN9;CENPO;CDC45;CEP76;KIF18B;DSN1;UBE2C;TYMS;SKA1;NCAPG;PRIM1;CENPL;ESCO2;NCAPH2;ODF2;CKS1B;NDC80;CENPN;RBL1;POLE2;SPDL1;CEP57;MCM10;HIST1H4A;CENPK;ORC6;POLD1;E2F1;RPA3;RFC4;CDK1;AURKB;SPC24;GMNN;CSNK1E;KIF23;FOXM1;RFC3;NUF2;GINS4;CDK5RAP2;PCNA;NUP54;TUBGCP3;XRCC2;GINS1;NUP205;CCNB1;CENPE;RPA2;CENPM;CCNE2;CLASP2;SPC25;KNTC1;MAD2L1;BUB1;FBXO5;ESPL1;ORC1;CENPU;E2F2;CDKN2C;BUB3;CEP152;SMC4;CDCA8;RFC5;POLE;PKMYT1;MCM8;NDE1;RAD21;NUP107;CENPW;RFC2;MYBL2;WEE1;NCAPH;KIF2C;CCNB2;STAG1;DCTN3;NUP153;LIG1;SMC3;POLA2;NUP85;CDC20;NUP37;BUB1B;LIN52;CENPH;PDS5B;MCM7;SKP2;TPR;VRK1;CDT1;FEN1;CEP72;NCAPD3;NEK2;CENPP;MCM2;DBF4;SMC2;NUP43;PLK1;CENPC;GORASP1;PRIM2;CDC25C;SKA2;ZWILCH;POLA1;ERCC6L;TFDP1;HAUS2;BIRC5;CENPQ;ANAPC10;NUMA1;NUP35;LIN54;INCENP;CDCA5;CCNA2;MCM6;CENPF;E2F8;MCM3;RRM1;SMC1A;MCM4;CDK4;KIF20A;XPO1;CENPA;ZW10;RAE1;KIF18A;CEP63;CSNK2B;TOP2A;NUP98;CCNE1;GINS2;PHF13;PSMB2;SEH1L;PSMD9;AURKA;CENPJ;ANAPC5;NUP50;CEP41;RBBP4;RCC2;MCM5;BORA;DHFR;NUP88;PSMD14;ORC3;CDC23;RANGAP1;ANAPC11;TUBG1;DNA2;MASTL;TUBGCP5;CDC7;RFC1;CDC25B;TUBB;E2F4;NUP160;PSMC3;NUP155;CEP78;NUP210;PSMB3;PDS5A;NUDC;RPA1;PCM1;CCNH;PPP2R2A;CUL1;CEP192;POLD3;MIS12;ITGB3BP;NUP62;NEDD1;PSMA4;NUP188;NSL1;ANAPC1;PSMA2;ANAPC7;TUBGCP4;FZR1;POM121;PCNT;MCPH1;SEC13;PHF8;ORC2;HSP90AA1;PSMD12;PSMD1;CEP135;NCAPD2;ANAPC4;CENPT;RB1;PSMA3;CKAP5;HDAC1;CNTRL;CCP110;PLK3;PSME4;RANBP2;PSMC5;PSMB1;SKP1;DYNLL1;SSNA1;MAU2;SET;CEP70;NUP133;POLD2;FGFR1OP;PSMC1;AAAS;PSMD11;YWHAE;PSMD4;PSMA7;PSMC2;ANKLE2
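Both runs report the same ES (-0.8168) but different NES and p-values. That difference is expected, because the two permutation types build different null distributions. Phenotype permutation shuffles sample labels and re-ranks the genes. Gene_set permutation keeps the ranking fixed and draws random gene sets of the same size. The sketch below illustrates only the gene_set variant, under the usual GSEA conventions: NES is ES divided by the mean magnitude of same-sign null ES, and the nominal p-value is the fraction of same-sign null ES at least as extreme as the observed ES. It is a hypothetical illustration rather than GSEApy's implementation, and it assumes the null contains at least one same-sign value.

```python
import numpy as np


def enrichment_score(scores, hits, weight=1.0):
    """Max-deviation running-sum ES for a boolean hit mask over a ranked list."""
    hit_w = np.abs(scores) ** weight * hits
    running = np.cumsum(hit_w) / hit_w.sum() - np.cumsum(~hits) / (len(hits) - hits.sum())
    return float(running[np.argmax(np.abs(running))])


def gene_set_permutation(scores, hits, n_perm=1000, weight=1.0, seed=0):
    """Return (ES, NES, nominal p) using random same-size gene sets as the null (sketch)."""
    rng = np.random.default_rng(seed)
    scores = np.asarray(scores, dtype=float)
    hits = np.asarray(hits, dtype=bool)
    es = enrichment_score(scores, hits, weight)
    null = np.empty(n_perm)
    for i in range(n_perm):
        # Shuffling the hit mask is equivalent to drawing a random gene set of the same size.
        null[i] = enrichment_score(scores, rng.permutation(hits), weight)
    # Compare only against null values with the same sign as the observed ES.
    same_sign = null[null < 0] if es < 0 else null[null >= 0]
    nes = es / np.abs(same_sign).mean()
    pval = float(np.mean(np.abs(same_sign) >= abs(es)))
    return es, float(nes), pval


scores = np.linspace(3, -3, 100)
hits = np.zeros(100, dtype=bool)
hits[-10:] = True  # a toy gene set concentrated at the bottom of the ranking
print(gene_set_permutation(scores, hits, n_perm=200, seed=1))
```

Only the null changes between permutation types, and the ES stays the same. That is consistent with the identical ES values in the two outputs above.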

I'm using GSEApy v1.0.3.

Thank you for your great contribution and support.

K Lim

kyungtaekLIM commented 1 year ago

This modification seems to work for the heatmap.

https://github.com/kyungtaekLIM/GSEApy/commit/3ea60eb0ddfa8f995ba9ca7f6f992b7ac5fc3631

zqfang commented 1 year ago

Thank you very much for reporting this bug. I've changed the code to make it more readable and maintainable for me.

I'll release a bug-fix version ASAP.

zqfang commented 1 year ago

v1.0.4 has been released to fix this issue.

kyungtaekLIM commented 1 year ago

Thanks for your prompt fix! I will try the new version.

zqfang commented 1 year ago

No problem at all!

kyungtaekLIM commented 1 year ago

Thanks! It works!