jamesaoverton closed this 2 years ago
Well, I work with the NCBI Taxonomy (mainly because it is the only taxonomy available as an ontology), and for my use case (inferring trophic groups from trophic interactions), I guess everything below the rank of species is not that useful.
Anyway, I agree with you that it would be easier to work with subsets of the NCBI Taxonomy, maybe one subset per superkingdom...
It turns out that I was wrong, and "lower ranks" make up a small portion of the taxonomy:
So there wouldn't be as much of an advantage to a species subset as I thought.
Closing, but feel free to reopen if you want to revisit.
Over the years I've written (and rewritten) code for LJI that starts with the NCBI Taxonomy and builds a customized organism tree to drive a hierarchical search interface on http://iedb.org. We're starting to refactor that code again, and considering pushing some of that functionality into this repository.
One thing that might be generally useful is a subset of the NCBI Taxonomy without any of the subspecies, strains, and other taxa below the rank of species. Since they make up the bulk of the taxa, the subset would be much smaller and easier to work with than the full taxonomy. (This isn't quite as straightforward as it sounds, since many taxa are not assigned a rank, and we've found cases where there are species that are children of other species.)
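To make the two complications concrete, here is a minimal sketch of one possible pruning rule: keep a taxon only if none of its proper ancestors has the rank `species`. This is an assumption about what the subset should contain, not the code used for the IEDB tree. Under this rule an unranked taxon above the species level is kept, an unranked taxon below a species is dropped, and a species nested under another species is dropped as well. The input is assumed to follow the `nodes.dmp` layout from the NCBI taxdump (pipe-delimited, with tax ID, parent tax ID, and rank as the first three fields); the sample rows, their IDs, and the function names are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple

Node = Tuple[int, str]  # (parent tax_id, rank)

# Hypothetical sample rows in nodes.dmp style; real files have more columns.
SAMPLE = "\n".join([
    "1\t|\t1\t|\tno rank\t|",     # root (its own parent, as in NCBI)
    "2\t|\t1\t|\tgenus\t|",
    "3\t|\t2\t|\tspecies\t|",
    "4\t|\t3\t|\tsubspecies\t|",  # below species: drop
    "5\t|\t3\t|\tno rank\t|",     # unranked strain below species: drop
    "6\t|\t3\t|\tspecies\t|",     # species that is a child of a species: drop
    "7\t|\t2\t|\tno rank\t|",     # unranked group above species: keep
    "8\t|\t7\t|\tspecies\t|",
])


def parse_nodes(lines: Iterable[str]) -> Dict[int, Node]:
    """Read tax_id, parent tax_id, and rank from pipe-delimited rows."""
    nodes: Dict[int, Node] = {}
    for line in lines:
        if not line.strip():
            continue
        fields = [field.strip() for field in line.split("|")]
        nodes[int(fields[0])] = (int(fields[1]), fields[2])
    return nodes


def has_species_ancestor(tax_id: int, nodes: Dict[int, Node]) -> bool:
    """Walk up the tree; report whether any proper ancestor is a species."""
    seen = {tax_id}  # guards against the self-parented root and any cycles
    parent = nodes[tax_id][0]
    while parent in nodes and parent not in seen:
        if nodes[parent][1] == "species":
            return True
        seen.add(parent)
        parent = nodes[parent][0]
    return False


def species_subset(nodes: Dict[int, Node]) -> Set[int]:
    """Keep every taxon that does not sit below a species-rank taxon."""
    return {t for t in nodes if not has_species_ancestor(t, nodes)}


kept = species_subset(parse_nodes(SAMPLE.splitlines()))
print(sorted(kept))  # [1, 2, 3, 7, 8]
```

Checking ancestors rather than the taxon's own rank is what lets the rule handle unranked nodes: their position relative to a species decides whether they stay. A different policy, such as keeping nested species, would need a different test.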
Am I right that a species subset would be generally useful to the community?