obophenotype / ncbitaxon

Build for NCBITaxon
BSD 3-Clause "New" or "Revised" License

Build a subset that excludes all taxa below species rank #51

Closed jamesaoverton closed 2 years ago

jamesaoverton commented 3 years ago

Over the years I've written (and rewritten) code for LJI that starts with the NCBI Taxonomy and builds a customized organism tree to drive a hierarchical search interface on http://iedb.org. We're starting to refactor that code again, and considering pushing some of that functionality into this repository.

One thing that might be generally useful is a subset of the NCBI Taxonomy without any of the subspecies, strains, and other taxa below the rank of species. Since they make up the bulk of the taxa, the subset would be much smaller and easier to work with than the full taxonomy. (This isn't quite as straightforward as it sounds, since many taxa are not assigned a rank, and we've found cases where there are species that are children of other species.)
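As a rough illustration of the pruning rule described above (not the LJI/IEDB code), here is a minimal Python sketch. It assumes the taxonomy has already been loaded into two plain dictionaries, `parents` and `ranks`; those names, the toy taxon IDs, and the function `species_subset` are all hypothetical. Rather than filtering on a taxon's own rank, it drops any taxon that has an ancestor ranked `species`, so unranked taxa below a species are removed, unranked taxa above species are kept, and a species nested under another species is dropped along with the other descendants. Keeping nested species instead would be an equally defensible choice.

```python
from typing import Dict, Set

# Toy data (hypothetical IDs): child -> parent, and taxon -> rank.
# The root is its own parent here, so the walk below guards against cycles.
parents: Dict[str, str] = {
    "root": "root",
    "clade_x": "root",
    "genus_a": "clade_x",
    "species_a": "genus_a",
    "subspecies_a1": "species_a",
    "strain_a1": "subspecies_a1",
    "species_b": "species_a",  # a species that is a child of another species
}
ranks: Dict[str, str] = {
    "root": "no rank",
    "clade_x": "no rank",      # unranked, above species: kept
    "genus_a": "genus",
    "species_a": "species",
    "subspecies_a1": "subspecies",
    "strain_a1": "no rank",    # unranked, below species: dropped
    "species_b": "species",
}


def species_subset(parents: Dict[str, str], ranks: Dict[str, str]) -> Set[str]:
    """Return every taxon that has no ancestor with rank 'species'."""
    keep: Set[str] = set()
    for taxon in parents:
        seen = {taxon}
        node = parents.get(taxon)
        below_species = False
        # Walk up the tree until we run out of parents or revisit a node.
        while node is not None and node not in seen:
            if ranks.get(node) == "species":
                below_species = True
                break
            seen.add(node)
            node = parents.get(node)
        if not below_species:
            keep.add(taxon)
    return keep


print(sorted(species_subset(parents, ranks)))
# ['clade_x', 'genus_a', 'root', 'species_a']
```

Walking the ancestors for every taxon is quadratic in the worst case; for the full taxonomy one would probably memoize the result per node, but that is left out to keep the sketch short.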

Am I right that a species subset would be generally useful to the community?

nleguillarme commented 3 years ago

Well, I work with the NCBI Taxonomy (mainly because it is the only taxonomy available as an ontology), and for my use case (inferring trophic groups from trophic interactions), I guess everything below the rank of species is not that useful.

Anyway, I agree with you that it would be easier to work with subsets of the NCBI Taxonomy, maybe one subset per superkingdom...
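To make the per-superkingdom idea concrete, here is a small Python sketch, again over hypothetical toy data. It assumes a child-to-parent dictionary and a rank dictionary (`toy_parents` and `toy_ranks`, names invented for this example) and groups each taxon under its nearest ancestor with the requested rank. The rank label is a parameter because the exact string depends on the taxonomy release being used.

```python
from collections import defaultdict
from typing import Dict, Set

# Toy data (hypothetical IDs); the root is its own parent.
toy_parents: Dict[str, str] = {
    "root": "root",
    "bacteria": "root",
    "eukaryota": "root",
    "genus_b": "bacteria",
    "species_c": "genus_b",
    "genus_e": "eukaryota",
}
toy_ranks: Dict[str, str] = {
    "root": "no rank",
    "bacteria": "superkingdom",
    "eukaryota": "superkingdom",
    "genus_b": "genus",
    "species_c": "species",
    "genus_e": "genus",
}


def partition_by_rank(
    parents: Dict[str, str],
    ranks: Dict[str, str],
    rank: str = "superkingdom",
) -> Dict[str, Set[str]]:
    """Group taxa under their nearest ancestor (or self) with the given rank.

    Taxa with no such ancestor, such as the root, are left out.
    """
    groups: Dict[str, Set[str]] = defaultdict(set)
    for taxon in parents:
        seen: Set[str] = set()
        node = taxon
        # Walk upward, starting from the taxon itself, guarding against cycles.
        while node is not None and node not in seen:
            if ranks.get(node) == rank:
                groups[node].add(taxon)
                break
            seen.add(node)
            node = parents.get(node)
    return dict(groups)


subsets = partition_by_rank(toy_parents, toy_ranks)
for top, members in sorted(subsets.items()):
    print(top, sorted(members))
```

Each resulting group could then be written out as its own subset file; how that is serialized is outside the scope of this sketch.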

jamesaoverton commented 3 years ago

It turns out that I was wrong, and "lower ranks" make up a small portion of the taxonomy:

https://www.ncbi.nlm.nih.gov/Taxonomy/taxonomyhome.html/index.cgi?chapter=statistics&uncultured=hide&unspecified=hide

So there wouldn't be as much of an advantage to a species subset as I thought.

cmungall commented 2 years ago

closing, but feel free to reopen if you want to revisit