tanghaibao / goatools

Python library to handle Gene Ontology (GO) terms
BSD 2-Clause "Simplified" License

Error: Only few genes/proteins in the study are found in the background population. #130

Open arumds opened 5 years ago

arumds commented 5 years ago

I have been using the latest zipped version of GOATOOLS and am getting the error below, which says that only a few of the input genes are in the background population. Is there a way to get past this error, or does the tool not work with this input gene count?

$ /goatools-master/scripts/find_enrichment.py --obo ../go-basic.obo --pval=0.05 --indent --method fdr_bh,bonferroni --outfile GO_Genes.tsv,GO_Genes.xlsx RDA.Genes ../EnsemblKnownProteinCodingbackgroundID.txt ../SlimEnsemblGenes_GOAssocationID.txt

../go-basic.obo: fmt(1.2) rel(2019-05-09) 47,407 GO Terms
ARGS GoeaCliFnc Namespace(alpha=0.05, annofmt=None, compare=False, ev_exc=None, ev_help=True, ev_help_short=True, ev_inc=None, filenames=['RDA.Genes', '../EnsemblKnownProteinCodingbackgroundID.txt', '../SlimEnsemblGenes_GOAssocationID.txt'], goslim='goslim_generic.obo', id2sym=None, indent=True, method='fdr_bh,bonferroni', min_overlap=0.7, no_propagate_counts=False, ns='BP,MF,CC', obo='../go-basic.obo', outfile='GO_Genes.tsv,GO_Genes.xlsx', outfile_detail=None, pval=0.05, pval_field=None, pvalcalc='fisher', ratio=None, sections=None, taxid=9606)
HMS:0:00:00.408382 93,186 annotations READ: ../SlimEnsemblGenes_GOAssocationID.txt
Study: 295 vs. Population 20197

WARNING: only 0.586440677966 fraction of genes/proteins in study are found in the population  background.

ERROR: only 0.586440677966 of genes/proteins in the study are found in the background population. Please check.

mayupsc commented 5 years ago

I have encountered the same error. I suppose the genes that are unique to the study file cause it, but I don't know how to make it work.
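
One way to test that guess outside of GOATOOLS is to compare the two ID lists directly. The sketch below is not GOATOOLS code: `read_ids` and `overlap_report` are hypothetical helper names, the files are assumed to hold one identifier per line, and the default of 0.7 is taken only from the `min_overlap=0.7` entry in the Namespace line of the log above, on the assumption that this is the threshold behind the error.

```python
def read_ids(path):
    """Read one identifier per line, ignoring blank lines (assumed file layout)."""
    with open(path) as handle:
        return {line.strip() for line in handle if line.strip()}


def overlap_report(study_ids, population_ids, min_overlap=0.7):
    """Return (fraction of study IDs found in population, missing IDs, passes threshold).

    min_overlap=0.7 mirrors the value printed in the log; treating it as the
    cutoff for the error is an assumption.
    """
    found = study_ids & population_ids
    missing = sorted(study_ids - population_ids)
    fraction = len(found) / len(study_ids) if study_ids else 0.0
    return fraction, missing, fraction >= min_overlap


# Example call with the file names from the first report (paths are the poster's):
# fraction, missing, ok = overlap_report(
#     read_ids("RDA.Genes"),
#     read_ids("../EnsemblKnownProteinCodingbackgroundID.txt"),
# )
```

The `missing` list shows which study IDs have no match in the population file. For the first report, a fraction of 0.586440677966 with 295 study genes is consistent with 173 of them being found.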

dvklopfenstein commented 5 years ago

@mehar-GIT and @mayupsc,

To investigate what you are seeing, can you provide these items?

  1. A log of all messages to the screen during your run
  2. A copy of the top 20 lines of the population file
  3. A copy of the top 20 lines of the study file
  4. A copy of the top 20 lines of the annotation file

With this information, we can proceed further...

Thank you for taking the time to contact us and for your interest in GOATOOLS.
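
For anyone gathering items 2 to 4, a short script can collect the first 20 lines of each file into one block of text to paste into a reply. This is only a sketch: `head` and `collect_report` are hypothetical helper names, not part of GOATOOLS, and the file names in the usage comment are the ones from the first report.

```python
from itertools import islice


def head(path, n=20):
    """Return the first n lines of a text file, without trailing newlines."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return [line.rstrip("\n") for line in islice(handle, n)]


def collect_report(paths, n=20):
    """Join the first n lines of each labelled file into one text report."""
    sections = []
    for label, path in paths.items():
        sections.append(f"== {label}: {path} ==")
        sections.extend(head(path, n))
    return "\n".join(sections)


# Example call (file names taken from the first report in this thread):
# print(collect_report({
#     "population": "../EnsemblKnownProteinCodingbackgroundID.txt",
#     "study": "RDA.Genes",
#     "annotation": "../SlimEnsemblGenes_GOAssocationID.txt",
# }))
```

Seeing the three excerpts side by side makes it easier to spot whether the files use the same kind of identifier.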

dvklopfenstein commented 5 years ago

Hopefully, you are up and running and so did not need to provide the additional information necessary to help solve your issue. In the hope that this is the case, I am closing this issue now.

Please open a new issue if you need us to take a look. Thank you for taking the time to write to us and for your interest in GOATOOLS.

barrantesisrael commented 3 years ago

I'm seeing the same error with the sample data (downloaded from https://github.com/tanghaibao/goatools/tree/main/tests/data):

$ find_enrichment.py small_study small_population small_association --outfile=1.xlsx --pval=0.05 --method=fdr_bh --pval_field=fdr_bh
go-basic.obo: fmt(1.2) rel(2021-05-01) 47,284 GO Terms
HMS:0:00:00.069865   6,309 annotations READ: small_association 
Study: 38 vs. Population 2000

WARNING: only 0.39473684210526316 fraction of genes/proteins in study are found in the population background.

ERROR: only 0.39473684210526316 of genes/proteins in the study are found in the background population. Please check.

akaur1988 commented 2 years ago

I am seeing the same error.

find_enrichment.py --alpha=$alpha --pval=$p_val --indent --obo $obo $study $population $association > $output

WARNING: only 0.018268156424581006 fraction of genes/proteins in study are found in the population background.

ERROR: only 0.018268156424581006 of genes/proteins in the study are found in the background population. Please check.

HelloWorldLTY commented 2 years ago

Hi, did anyone solve this problem? Thanks a lot.