tanghaibao / goatools

Python library to handle Gene Ontology (GO) terms
BSD 2-Clause "Simplified" License
761 stars 211 forks source link

Goatools not accepting Human Phenotype Ontology (HPO) obo file and annotation. #202

Closed ecgenomics closed 2 years ago

ecgenomics commented 3 years ago

Good morning. I am trying to use goatools to perform a gene list enrichment based on HPO ontology (). I am providing an OBO file, with the same format as the one from Gene Ontology, and an annotation that consists of a tab file with the gene symbol in the first column and a list of semi-colon separated HPO terms in the second column.

For simplicity, I share with you guys a folder with the command-line (command.sh) and the files I am working with: https://drive.google.com/drive/folders/1PzSvW7Gu7ScYbx24h9WEAwuxFgYShgFz?usp=sharing

The error that I get is a standard python KeyError:

Traceback (most recent call last): File "/Users/fabio/.pyenv/versions/3.7.3/bin/find_enrichment.py", line 44, in <module> main() File "/Users/fabio/.pyenv/versions/3.7.3/bin/find_enrichment.py", line 31, in main obj = GoeaCliFnc(GoeaCliArgs().args) File "/Users/fabio/.pyenv/versions/3.7.3/lib/python3.7/site-packages/goatools/cli/find_enrichment.py", line 231, in __init__ self.objanno = self._get_objanno(self.args.filenames[2]) File "/Users/fabio/.pyenv/versions/3.7.3/lib/python3.7/site-packages/goatools/cli/find_enrichment.py", line 254, in _get_objanno return get_objanno(assoc_fn, anno_type, **kws) File "/Users/fabio/.pyenv/versions/3.7.3/lib/python3.7/site-packages/goatools/anno/factory.py", line 29, in get_objanno return IdToGosReader(fin_anno, **kws_id2go) File "/Users/fabio/.pyenv/versions/3.7.3/lib/python3.7/site-packages/goatools/anno/idtogos_reader.py", line 20, in __init__ namespaces=kws.get('namespaces')) File "/Users/fabio/.pyenv/versions/3.7.3/lib/python3.7/site-packages/goatools/anno/annoreader_base.py", line 47, in __init__ self.associations = self._init_associations(filename, **kws) File "/Users/fabio/.pyenv/versions/3.7.3/lib/python3.7/site-packages/goatools/anno/idtogos_reader.py", line 49, in _init_associations ini = InitAssc(fin_anno, kws['godag'], kws['namespaces']) File "/Users/fabio/.pyenv/versions/3.7.3/lib/python3.7/site-packages/goatools/anno/init/reader_idtogos.py", line 22, in __init__ self.nts = self.init_associations(namespaces) File "/Users/fabio/.pyenv/versions/3.7.3/lib/python3.7/site-packages/goatools/anno/init/reader_idtogos.py", line 35, in init_associations nts = self._init_w_godag() if self.godag else self._init_dflt() File "/Users/fabio/.pyenv/versions/3.7.3/lib/python3.7/site-packages/goatools/anno/init/reader_idtogos.py", line 54, in _init_w_godag nspc = NAMESPACE2NS[goobj.namespace] if goobj else '' KeyError: ''

Any idea on how could I get goatools to work with this kind of ontology?

Thank you in advance, Fabio.

dvklopfenstein commented 3 years ago

Thank you for your interest in GOA TOOLs and for taking the time to write us with this very interesting request.

I have downloaded the files in your example and have made minor modifications to GOA TOOLS which enable it to read the Human Phenotype Ontology file, but am unable to proceed to run the full example, because the background list of genes, gobackground.list, is missing in the file list.

Can you upload the file, gobackground.list?

ecgenomics commented 3 years ago

Whoops! Sorry, my bad. I have just uploaded the background.

Thanks for your answer.

dvklopfenstein commented 3 years ago

Your example is working in my local environment.

The entire GOA TOOLs test suite is running using the changed files. Once this test suite passes, I can commit and push the changes to the main branch and Haiboa Tang can create a new release.

These are the results for your example. Please check to see that this looks like what you expect:

$ command.sh
hp.obo: fmt(1.2) rel(hp/2021-02-28) 19,498 Terms
HMS:0:00:00.795489 187,934 annotations READ: hpo.annotation.tab
Study: 2252 vs. Population 14446

Load human_phenotype Gene Ontology Analysis ...
Propagating term counts up: is_a
 26%  3,823 of 14,446 population items found in association

Run human_phenotype Gene Ontology Analysis: current study set of 2252 IDs ... 29%    650 of  2,252 study items found in association
100%  2,252 of  2,252 study items found in population(14446)
Calculating 9,331 uncorrected p-values using fisher
   9,331 GO terms are associated with  3,823 of 14,446 population items
   5,802 GO terms are associated with    650 of  2,252 study items
  METHOD bonferroni:
       2 GO terms found significant (< 0.05=alpha) (  2 enriched +   0 purified): local bonferroni
     489 study items associated with significant GO IDs (enriched)
       0 study items associated with significant GO IDs (purified)
  METHOD sidak:
       2 GO terms found significant (< 0.05=alpha) (  2 enriched +   0 purified): local sidak
     489 study items associated with significant GO IDs (enriched)
       0 study items associated with significant GO IDs (purified)
  METHOD holm:
       2 GO terms found significant (< 0.05=alpha) (  2 enriched +   0 purified): local holm
     489 study items associated with significant GO IDs (enriched)
       0 study items associated with significant GO IDs (purified)
  METHOD fdr_bh:
      12 GO terms found significant (< 0.05=alpha) ( 12 enriched +   0 purified): statsmodels fdr_bh
     498 study items associated with significant GO IDs (enriched)
       0 study items associated with significant GO IDs (purified)
    162 of 9,331 results have uncorrected P-values <= 0.01=pval

    162 items WROTE: enrichment.hpo

And here is the output file:

enrichment.hpo.txt

ecgenomics commented 3 years ago

Thank you so much, the output is just as I expected. Looking forward to the new release.

dvklopfenstein commented 3 years ago

Terrific. That is good news. I am still working on the test suite, plus creating a new test for testing enrichment using Human Phenotype Ontologies and a new Jupyter notebook, and I want to ensure plotting works with these ontologies.

I have a deadline of next Friday for a different project, so the GOA TOOLs work will need to be finished after next Friday.

Thanks for the terrific question and the full working example. This will surely be popular functionality and is well worth the time to implement.

dvklopfenstein commented 3 years ago

My other project will take another week, so I will back on this (and other open GOA TOOLs issues) hopefully within the week.

As one of the followers of https://github.com/obophenotype/human-phenotype-ontology, I am extremely excited about your issue. The functionality that you have requested could be extremely interesting in my own and others research.

Thank you so much for this great request.

ecgenomics commented 3 years ago

Thank you for your enthusiastic help and sorry for the very late response. In my rather limited experience, most studies rely on enrichment analysis to "make sense" of the gene lists they return. Sometimes, if we base this analysis on GO we might obtain descriptors that are less meaningful than the ones that we would obtain from more specific databases, such as HPO. So, support for these minor databases is always very welcome.

dvklopfenstein commented 2 years ago

Hello @ecgenomics,

Thank you so much for the fantastic request to support enrichment analyses for ontologies besides gene ontology.

I have added the functionality which supports enrichment analyses for annotations of gene-to-human_phenotype_ontologies.

See this Jupyter notebook for an example usage: https://github.com/tanghaibao/goatools/blob/main/notebooks/Enrichment_analyses_human_phenotype_ontology.ipynb

Please feel free to express your thoughts after trying the new functionality.

It took a while to commit because I was in the middle of my PhD defense and the test regressions revealed errors which needed to be examined. Many were due to errors in annotation files which are often caught by the GOA TOOLs test suite, but then need to be reported back to gene ontology. I wanted to be sure all test results were carefully examined.

Please open a new issue if anything strikes you.