zqfang / GSEApy

Gene Set Enrichment Analysis in Python
http://gseapy.rtfd.io/
BSD 3-Clause "New" or "Revised" License

Background gene list is not saved correctly in self._bg and does not seem to be used #121

Closed hsiaoyi0504 closed 3 years ago

hsiaoyi0504 commented 3 years ago

ccrcc_test.txt

Setup

I am reporting a problem with GSEApy version, Python version, and operating system as follows:

import sys; print(sys.version)
import platform; print(platform.python_implementation()); print(platform.platform())
import gseapy; print(gseapy.__version__)

Result:

3.9.4 | packaged by conda-forge | (default, May 10 2021, 22:10:52) 
[Clang 11.1.0 ]
CPython
macOS-11.2.3-arm64-arm-64bit
0.10.4

I am trying to use a background gene list, but it doesn't seem to work:

import gseapy as gp
from numpy import isscalar
enr_up = gp.enrichr(upregulated_genes.index.to_list(), gene_sets='KEGG_2019_Human',
                 background = './data/ccrcc_test.txt',
                 organism='Human', # don't forget to set organism to the one you desired! e.g. Yeast
                 description='test_name',
                 outdir='enrichr_kegg', cutoff=0.5)
print(upregulated_genes.index.to_list())
print(enr_up.background)
print(isscalar(enr_up.background))
print(isinstance(enr_up.background, int) or enr_up.background.isdigit())
print(isinstance(enr_up.background, str))
print(len(enr_up.get_background()))
print(enr_up._bg)

Expected behaviour

The last enr_up._bg should not be None

Actual behaviour

enr_up._bg is None

['GYPA', 'MXRA5', 'MCAM', 'CDH9', 'LIPA', 'PLOD1', 'GZMH', 'RNASET2', 'ADAM9', 'LOX', 'FAS', 'PSAP', 'MAP3K7', 'FSTL1', 'SMPDL3A', 'MMRN1', 'NAAA', 'SELP', 'COL1A2', 'LAMA4', 'SCARB2', 'DIP2B', 'SATB2', 'CEMIP2', 'CSPG4', 'MPO', 'NASP', 'MSANTD4', 'CD74', 'ANGPT2', 'CDH13', 'CDH6', 'ADGRG6', 'ERAP2', 'PLXNC1', 'CD276', 'TLR8', 'ICAM1', 'THBS1', 'COL4A2', 'CSNK2A2', 'CD53', 'SPARCL1', 'HMCN1', 'EEA1', 'IGF1R', 'LOXL2', 'HP', 'HSPG2', 'FLT1', 'IL4I1', 'MSR1', 'ERO1A', 'HLCS', 'ST8SIA4', 'PLOD2', 'POGLUT3', 'FKBP10', 'ORM2', 'INSR', 'ITGB2', 'IKBIP', 'TMEM87B', 'CDH5', 'EGFR', 'CD180', 'FCGBP', 'SIRPA', 'ENPP2', 'ITGAL', 'APC', 'LTBP2', 'GPNMB', 'BOD1L1', 'LY75', 'HLA-DPB1', 'TGFB1', 'MERTK', 'SCN3A', 'GAA', 'SERTAD2', 'ATRNL1', 'CDCP1', 'MYO1F', 'ENTPD1', 'SYNE1', 'ENPP3', 'F8', 'POSTN', 'ADA2', 'SIGLEC1', 'PECAM1', 'KTN1', 'CRYBG3', 'FCGR1A', 'LEMD3', 'ITGA4', 'P2RX4', 'MAN2B1', 'LAMC1', 'CRTC3', 'ICAM3', 'ANO6', 'VWF', 'IGF2R', 'CD36', 'MPZL2', 'CSF1R', 'PLCB4', 'HERC1', 'MMRN2', 'ERCC5', 'VCAM1', 'PTPRC', 'ERAP1', 'DSCAM', 'COL1A1', 'PON2', 'NRP1', 'FGL2', 'PCDH18', 'HAPLN1', 'STAT5A', 'CP', 'CD163', 'ABCB1', 'THBS2', 'PRUNE2', 'ITGAM', 'TLR3', 'USP9X', 'CD40', 'TRAK1', 'LAMB1', 'SUPT16H', 'SERPINH1', 'TMEM87A', 'EMILIN2', 'ITGA5', 'COLGALT1', 'FBXL16', 'ITGAX', 'COL3A1', 'ESAM', 'TNFAIP6', 'SLC4A1', 'BCHE', 'VCAN', 'PLTP', 'CD68', 'ECPAS', 'ANGPTL2', 'PLOD3', 'A2M']
./data/ccrcc_test.txt
True
False
True
9323
None

Steps to reproduce

Download the attached ccrcc_test.txt and run the following lines:

import gseapy as gp
from numpy import isscalar
upregulated_genes = ['GYPA', 'MXRA5', 'MCAM', 'CDH9', 'LIPA', 'PLOD1', 'GZMH', 'RNASET2', 'ADAM9', 'LOX', 'FAS', 'PSAP', 'MAP3K7', 'FSTL1', 'SMPDL3A', 'MMRN1', 'NAAA', 'SELP', 'COL1A2', 'LAMA4', 'SCARB2', 'DIP2B', 'SATB2', 'CEMIP2', 'CSPG4', 'MPO', 'NASP', 'MSANTD4', 'CD74', 'ANGPT2', 'CDH13', 'CDH6', 'ADGRG6', 'ERAP2', 'PLXNC1', 'CD276', 'TLR8', 'ICAM1', 'THBS1', 'COL4A2', 'CSNK2A2', 'CD53', 'SPARCL1', 'HMCN1', 'EEA1', 'IGF1R', 'LOXL2', 'HP', 'HSPG2', 'FLT1', 'IL4I1', 'MSR1', 'ERO1A', 'HLCS', 'ST8SIA4', 'PLOD2', 'POGLUT3', 'FKBP10', 'ORM2', 'INSR', 'ITGB2', 'IKBIP', 'TMEM87B', 'CDH5', 'EGFR', 'CD180', 'FCGBP', 'SIRPA', 'ENPP2', 'ITGAL', 'APC', 'LTBP2', 'GPNMB', 'BOD1L1', 'LY75', 'HLA-DPB1', 'TGFB1', 'MERTK', 'SCN3A', 'GAA', 'SERTAD2', 'ATRNL1', 'CDCP1', 'MYO1F', 'ENTPD1', 'SYNE1', 'ENPP3', 'F8', 'POSTN', 'ADA2', 'SIGLEC1', 'PECAM1', 'KTN1', 'CRYBG3', 'FCGR1A', 'LEMD3', 'ITGA4', 'P2RX4', 'MAN2B1', 'LAMC1', 'CRTC3', 'ICAM3', 'ANO6', 'VWF', 'IGF2R', 'CD36', 'MPZL2', 'CSF1R', 'PLCB4', 'HERC1', 'MMRN2', 'ERCC5', 'VCAM1', 'PTPRC', 'ERAP1', 'DSCAM', 'COL1A1', 'PON2', 'NRP1', 'FGL2', 'PCDH18', 'HAPLN1', 'STAT5A', 'CP', 'CD163', 'ABCB1', 'THBS2', 'PRUNE2', 'ITGAM', 'TLR3', 'USP9X', 'CD40', 'TRAK1', 'LAMB1', 'SUPT16H', 'SERPINH1', 'TMEM87A', 'EMILIN2', 'ITGA5', 'COLGALT1', 'FBXL16', 'ITGAX', 'COL3A1', 'ESAM', 'TNFAIP6', 'SLC4A1', 'BCHE', 'VCAN', 'PLTP', 'CD68', 'ECPAS', 'ANGPTL2', 'PLOD3', 'A2M']
enr_up = gp.enrichr(upregulated_genes, gene_sets='KEGG_2019_Human',
                 background = './ccrcc_test.txt',
                 organism='Human',
                 description='test_name',
                 outdir='enrichr_kegg', cutoff=0.5)
print(upregulated_genes)  # already a plain list here, so no .index.to_list()
print(enr_up.background)
print(isscalar(enr_up.background))
print(isinstance(enr_up.background, int) or enr_up.background.isdigit())
print(isinstance(enr_up.background, str))
print(len(enr_up.get_background()))
print(enr_up._bg)

hsiaoyi0504 commented 3 years ago

Or is a background gene list only supported in local mode?

zqfang commented 3 years ago

Yes, a background gene list is only supported in local mode. The Enrichr server uses its own background, and you cannot control it.
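
To illustrate why a custom background only matters when the test is computed locally, here is a minimal sketch of an over-representation test against a user-supplied background. It is not GSEApy's implementation: the function names `hypergeom_sf` and `enrich_with_background` and the toy gene names are hypothetical, and only the Python standard library is used.

```python
from math import comb


def hypergeom_sf(k, M, n, N):
    """P(X >= k) when drawing N genes from M, of which n belong to the gene set."""
    upper = min(n, N)
    total = comb(M, N)
    return sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1)) / total


def enrich_with_background(genes, gene_set, background):
    """Over-representation p-value with everything restricted to the background."""
    bg = set(background)
    query = set(genes) & bg      # query genes outside the background are dropped
    term = set(gene_set) & bg    # the gene set is restricted the same way
    overlap = query & term
    pval = hypergeom_sf(len(overlap), len(bg), len(term), len(query))
    return sorted(overlap), pval


# Toy data (hypothetical): 10 background genes, a 4-gene set, a 3-gene query.
background = [f"G{i}" for i in range(10)]
gene_set = ["G0", "G1", "G2", "G3"]
genes = ["G0", "G1", "G9"]
overlap, pval = enrich_with_background(genes, gene_set, background)
print(overlap, round(pval, 4))
```

The p-value depends directly on the size and content of `background`, which is why a remote service that fixes its own background cannot honor one passed by the caller.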

hsiaoyi0504 commented 3 years ago

Thanks for clarifying it!

hsiaoyi0504 commented 3 years ago

Just FYI, I recently found that PDL1 and CD274 are sometimes used interchangeably in the gene sets. That might be another cause of this.
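
One way to guard against such alias mismatches is to map symbols to a single preferred name before intersecting the lists. The sketch below assumes a small hand-written alias table (`ALIASES`) and a hypothetical helper `normalize`; a real table would have to come from a gene-nomenclature resource.

```python
# Hypothetical alias table, for illustration only.
ALIASES = {"PDL1": "CD274", "PD-L1": "CD274"}


def normalize(symbols, aliases=ALIASES):
    """Map each symbol to its preferred name after stripping whitespace."""
    out = set()
    for s in symbols:
        key = s.strip()
        out.add(aliases.get(key, key))
    return out


query = normalize(["PDL1", "EGFR"])
gene_set = normalize(["CD274", "EGFR", "VWF"])
print(sorted(query & gene_set))
```

Without the normalization step, the query symbol PDL1 would not match CD274 in the gene set and the overlap would be undercounted.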