zqfang / GSEApy

Gene Set Enrichment Analysis in Python
http://gseapy.rtfd.io/
BSD 3-Clause "New" or "Revised" License

Several questions about using GSEApy #89

Closed Gin-Wang closed 1 year ago

Gin-Wang commented 4 years ago

Hi zqfang. Thanks for GSEApy, which is just what I was looking for. I have some questions from using it and hope you can help.

  1. I ran the test data (P53.txt) with c1.hallmark.gmt and signaltonoise in both GSEApy and GSEA desktop v4.0; the results are below. Why do the NES, p-value, and FDR values differ?

GSEApy:

| Term | es | nes | pval | fdr |
| --- | --- | --- | --- | --- |
| HALLMARK_E2F_TARGETS | 0.372784 | 10.6263 | 0 | 0 |
| HALLMARK_G2M_CHECKPOINT | 0.435117 | 13.49521 | 0 | 0 |
| HALLMARK_UV_RESPONSE_DN | 0.362055 | 9.182312 | 0 | 0.000495 |
| HALLMARK_GLYCOLYSIS | 0.293665 | 9.207761 | 0 | 0.000619 |
| HALLMARK_MITOTIC_SPINDLE | 0.374997 | 9.388243 | 0 | 0.000825 |
| HALLMARK_EPITHELIAL_MESENCHYMAL_TRANSITION | 0.271447 | 8.499713 | 0 | 0.009903 |

GSEA v4.0.1:

| NAME | SIZE | ES | NES | NOM p-val | FDR q-val | FWER p-val |
| --- | --- | --- | --- | --- | --- | --- |
| HALLMARK_MITOTIC_SPINDLE | 147 | 0.388054 | 1.574438 | 0.036822 | 0.811748 | 0.483 |
| HALLMARK_PROTEIN_SECRETION | 96 | 0.364037 | 1.353547 | 0.124031 | 1 | 0.897 |
| HALLMARK_UV_RESPONSE_DN | 141 | 0.356714 | 1.34159 | 0.130037 | 0.985496 | 0.908 |
| HALLMARK_G2M_CHECKPOINT | 172 | 0.428153 | 1.316254 | 0.223938 | 0.831212 | 0.933 |
| HALLMARK_HEME_METABOLISM | 164 | 0.294527 | 1.131991 | 0.221184 | 1 | 0.995 |
| HALLMARK_NOTCH_SIGNALING | 23 | 0.369969 | 1.130702 | 0.293594 | 1 | 0.995 |
| HALLMARK_GLYCOLYSIS | 161 | 0.267367 | 1.112832 | 0.3157 | 1 | 0.998 |

  2. I ran this from the command line on CentOS. Can I choose to write PNGs that show other statistics, such as the ES?

  3. GSEA desktop has a chip platform parameter that lets me select a .chip file, so I do not have to convert gene IDs to symbols beforehand. Does GSEApy accept only uppercase gene symbols?

  4. GSEA desktop uses half of my total memory, and some large .gmt files fail to run. Can GSEApy automatically use more than half of my total memory, or do I need to modify some files?

Best, Thank you.

zqfang commented 4 years ago

Hi @Gin-Wang ,

  1. Could you show me the parameters you used for GSEApy and GSEA v4.0? The difference should not be that large. Note also that GSEA v4.0 and GSEApy use different algorithms to rank and filter genes.
  2. Yes, you can write PNG files. Just use --format png.
  3. GSEApy uses the Enrichr database as its backend, so .chip files are not supported. However, a .gmt file annotated with gene IDs works fine: GSEApy supports custom .gmt files. Just make sure the .gmt file and your input gene IDs are of the same type.
  4. You don't need to modify any files. Just run it; there is no such limit. From a software engineering standpoint, though, that is not always a good thing.

Hope it helps
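
The advice to keep the .gmt file and the input gene IDs in the same type can be checked before running anything. The following is only a minimal sketch, not part of GSEApy: the helper names `read_gmt` and `id_overlap` and the toy data are hypothetical, and it assumes the usual tab-separated GMT layout (set name, description, then genes).

```python
import io


def read_gmt(handle):
    """Parse GMT text: each line is name<TAB>description<TAB>gene1<TAB>gene2..."""
    gene_sets = {}
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        name, _description, *genes = fields
        gene_sets[name] = {g for g in genes if g}
    return gene_sets


def id_overlap(expression_ids, gene_sets):
    """Fraction of each gene set's members found among the expression IDs."""
    ids = set(expression_ids)
    return {
        name: len(genes & ids) / len(genes)
        for name, genes in gene_sets.items()
        if genes
    }


# Hypothetical toy data: SET_A uses symbols, SET_B uses numeric IDs.
gmt_text = "SET_A\tna\tTP53\tMDM2\tCDKN1A\nSET_B\tna\t7157\t4193\n"
gene_sets = read_gmt(io.StringIO(gmt_text))
overlap = id_overlap(["TP53", "MDM2", "BAX"], gene_sets)

for name, fraction in overlap.items():
    print(f"{name}: {fraction:.2f} of genes matched")
```

An overlap near zero for every set usually means the two files use different identifier types (or different letter case), which is worth fixing before the enrichment run.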

Gin-Wang commented 4 years ago

Thanks for getting back to me!

I used to analyze my data with GSEA v4.0 using the default parameters. I don't know whether I did something wrong, because my FDR values were always high and the NES values were between -2 and 2. Here are the parameters. I think the GSEApy results may be the correct ones.

[two images: GSEA v4.0 parameter settings]

I am also still wondering: when I analyze my data with GSEApy, the PNG shows NES, pval, and FDR. How can I change that to ES, pval, and FDR? Also, if I want to focus on the genes in a specific pathway and their ranks, can GSEApy report that the way GSEA desktop does?

| GENE SYMBOL | GENE_TITLE | RANK IN GENE LIST | RANK METRIC SCORE | RUNNING ES | CORE ENRICHMENT |
| --- | --- | --- | --- | --- | --- |
| PEX14 | peroxisomal biogenesis factor 14 [Source:HGNC Symbol;Acc:HGNC:8856] | 10 | 0.456830412 | 0.021329671 | Yes |
| RNF11 | ring finger protein 11 [Source:HGNC Symbol;Acc:HGNC:10056] | 14 | 0.45257917 | 0.04328438 | Yes |
| ARL4A | ADP ribosylation factor like GTPase 4A [Source:HGNC Symbol;Acc:HGNC:695] | 28 | 0.414010525 | 0.06214531 | Yes |
| SCP2 | sterol carrier protein 2 [Source:HGNC Symbol;Acc:HGNC:10606] | 33 | 0.407803684 | 0.08177333 | Yes |
| RIOK3 | RIO kinase 3 [Source:HGNC Symbol;Acc:HGNC:11451] | 221 | 0.302392781 | 0.074385054 | Yes |
| DLAT | dihydrolipoamide S-acetyltransferase [Source:HGNC Symbol;Acc:HGNC:2896] | 357 | 0.267014503 | 0.07145268 | Yes |
| GRPEL1 | GrpE like 1, mitochondrial [Source:HGNC Symbol;Acc:HGNC:19696] | 360 | 0.266250014 | 0.08434048 | Yes |
| PPARG | peroxisome proliferator activated receptor gamma [Source:HGNC Symbol;Acc:HGNC:9236] | 369 | 0.264106959 | 0.09640725 | Yes |
| CAT | catalase [Source:HGNC Symbol;Acc:HGNC:1516] | 436 | 0.252355933 | 0.10097922 | Yes |

Best, Thank you

zqfang commented 4 years ago

What is your GSEApy command? By default, GSEApy uses permutation type gene_set and the log2_ratio_of_classes ranking metric.

  1. For the PNG question: save the figure in PDF format and then edit it. I think that is the fastest way.

  2. In the output file, the ledge_genes column is what you want.
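
Pulling the ledge_genes column for one pathway out of the report could look roughly like this. It is a sketch under assumptions: the report is taken to be a CSV with `Term` and `ledge_genes` columns, the genes are assumed to be joined with semicolons, and the helper `leading_edge_genes`, the term name, and the numbers in the toy report are made up for illustration.

```python
import csv
import io


def leading_edge_genes(report_handle, term):
    """Return the leading-edge genes listed for one term in a report CSV."""
    for row in csv.DictReader(report_handle):
        if row["Term"] == term:
            # Assumption: genes are stored as one semicolon-separated string.
            return [g for g in row["ledge_genes"].split(";") if g]
    raise KeyError(f"term not found in report: {term}")


# Hypothetical report content; a real run would open the CSV in the
# output directory instead of using an in-memory string.
report_text = (
    "Term,es,nes,pval,fdr,ledge_genes\n"
    "EXAMPLE_SET,0.10,1.20,0.20,0.50,PEX14;RNF11;ARL4A;SCP2\n"
)
genes = leading_edge_genes(io.StringIO(report_text), "EXAMPLE_SET")
print(genes)
```

With a real report, replace the `io.StringIO` object with `open(path, newline="")`, where `path` points at the report file written by the run.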

Gin-Wang commented 4 years ago

Thanks for replying! Here is my command:

gseapy gsea -p 8 -d P53.txt -c P53.cls -g h.all.v7.0.symbols.gmt -f 'png' -o ./GSEA_results/hallmark --graph 50 -m 'signal_to_noise'

I changed the "permutation type" parameter in GSEA v4.0 to gene_set. The results still differ from GSEApy, but they are closer than with phenotype. Thanks!
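
Since the command above ranks genes with signal_to_noise, it may help to see what that metric computes in its textbook form: the difference of the class means divided by the sum of the class standard deviations. The sketch below is not the code of either tool. The function name, the toy expression values, and the use of the sample standard deviation are assumptions, and real implementations may adjust the standard deviations (for example to avoid dividing by zero), which this does not attempt.

```python
from statistics import mean, stdev


def signal_to_noise(class_a, class_b):
    """Textbook signal-to-noise: (mean_a - mean_b) / (sd_a + sd_b)."""
    return (mean(class_a) - mean(class_b)) / (stdev(class_a) + stdev(class_b))


# Hypothetical expression values: gene -> (class A samples, class B samples).
expression = {
    "GENE1": ([5.0, 6.0, 7.0], [1.0, 2.0, 3.0]),
    "GENE2": ([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]),
}

# Score every gene, then rank from most up in class A to most up in class B.
scores = {gene: signal_to_noise(a, b) for gene, (a, b) in expression.items()}
ranking = sorted(scores.items(), key=lambda item: item[1], reverse=True)
print(ranking)
```

Comparing the ranked lists that the two tools produce for the same input is one way to see whether a difference arises in the ranking step or later, in the permutation step.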

Gin-Wang commented 4 years ago

Sorry for asking so many questions.

Here is another question about GSEApy: I have a matrix with 3 or more groups. Can I compare them pairwise (A vs B, A vs C, and B vs C) on the command line, or do I need to split it into 3 matrices and run gsea separately for each?

Best, Thank you!

zqfang commented 4 years ago

This is very easy to do if you know Python. On the command line, split the data into 3 matrices.
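
A rough sketch of that splitting step in plain Python follows. The helper `pairwise_splits` and the sample names are hypothetical. The .cls text follows the common three-line categorical layout (sample, class, and constant counts; class names; one label per sample), which should be checked against the format your files already use.

```python
from itertools import combinations


def pairwise_splits(sample_names, sample_classes):
    """For every pair of classes, return the kept columns and matching .cls text."""
    classes = list(dict.fromkeys(sample_classes))  # unique, in order of appearance
    splits = {}
    for a, b in combinations(classes, 2):
        keep = [i for i, c in enumerate(sample_classes) if c in (a, b)]
        labels = [sample_classes[i] for i in keep]
        # Three-line .cls: counts, class names, then one label per kept sample.
        cls_text = f"{len(keep)} 2 1\n# {a} {b}\n{' '.join(labels)}\n"
        splits[f"{a}_vs_{b}"] = ([sample_names[i] for i in keep], cls_text)
    return splits


# Hypothetical matrix layout: six samples in three groups.
samples = ["s1", "s2", "s3", "s4", "s5", "s6"]
groups = ["A", "A", "B", "B", "C", "C"]
splits = pairwise_splits(samples, groups)

for name, (columns, cls_text) in splits.items():
    print(name, columns)
    print(cls_text)
```

Each column list selects the sample columns to keep from the full matrix; writing that sub-matrix and its .cls text to files gives one input pair per comparison for a separate gseapy gsea run.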