muellan / metacache

Memory-efficient, fast & precise taxonomic classification system for metagenomic read mapping
GNU General Public License v3.0

Form and variety #11

Closed donovan-h-parks closed 4 years ago

donovan-h-parks commented 4 years ago

Hi. Quick question. What do the form and variety attributes for -abundance-per mean? I've always taken variety to be a synonym of subspecies and am not clear what form means in terms of taxonomy. Thanks.

muellan commented 4 years ago

Hi. Yeah, as always with this bio stuff, it's kind of a mess. The taxonomic ranks "form" and "varietas"/"variety" are both defined in the NCBI taxonomy, yet they are not always applied consistently to sequences. From lowest to highest, the relationship is form -> variety -> subspecies -> species -> subgenus -> genus -> ... . In our experiments with bacterial samples and eukaryotic genomes in foodstuff we didn't look at levels below species, and in my experience "subspecies" is the lowest level that makes any sense right now.

Below is a list of all taxonomic rank names that are recognized by MetaCache (lowest rank first) and to what rank they are mapped internally:

| NCBI name(s)     | MetaCache rank |
|------------------|----------------|
| sequence         | sequence       |
| genome           | sequence       |
| form             | form           |
| forma            | form           |
| variety          | variety        |
| varietas         | variety        |
| subspecies       | subspecies     |
| species          | species        |
| species group    | subgenus       |
| species subgroup | subgenus       |
| subgenus         | subgenus       |
| genus            | genus          |
| subtribe         | subtribe       |
| tribe            | tribe          |
| subfamily        | subfamily      |
| family           | family         |
| superfamily      | suborder       |
| parvorder        | suborder       |
| infraorder       | suborder       |
| suborder         | suborder       |
| order            | order          |
| superorder       | subclass       |
| infraclass       | subclass       |
| subclass         | subclass       |
| class            | class          |
| superclass       | subphylum      |
| subphylum        | subphylum      |
| phylum           | phylum         |
| division         | phylum         |
| superphylum      | subkingdom     |
| subkingdom       | subkingdom     |
| kingdom          | kingdom        |
| subdomain        | kingdom        |
| superkingdom     | domain         |
| domain           | domain         |
| root             | root           |

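
For anyone who wants to apply the same mapping in their own post-processing scripts, here is a minimal Python sketch of the list above. It is only a transcription of that list, not MetaCache's actual code; the names `NCBI_TO_METACACHE_RANK`, `RANK_ORDER`, `metacache_rank` and `is_below` are made up for illustration, and lowercasing/trimming the input is an assumption, not something MetaCache is known to do.

```python
# Hypothetical lookup table transcribed from the list above (lowest rank first).
# This is NOT taken from the MetaCache source code.
NCBI_TO_METACACHE_RANK = {
    "sequence": "sequence",
    "genome": "sequence",
    "form": "form",
    "forma": "form",
    "variety": "variety",
    "varietas": "variety",
    "subspecies": "subspecies",
    "species": "species",
    "species group": "subgenus",
    "species subgroup": "subgenus",
    "subgenus": "subgenus",
    "genus": "genus",
    "subtribe": "subtribe",
    "tribe": "tribe",
    "subfamily": "subfamily",
    "family": "family",
    "superfamily": "suborder",
    "parvorder": "suborder",
    "infraorder": "suborder",
    "suborder": "suborder",
    "order": "order",
    "superorder": "subclass",
    "infraclass": "subclass",
    "subclass": "subclass",
    "class": "class",
    "superclass": "subphylum",
    "subphylum": "subphylum",
    "phylum": "phylum",
    "division": "phylum",
    "superphylum": "subkingdom",
    "subkingdom": "subkingdom",
    "kingdom": "kingdom",
    "subdomain": "kingdom",
    "superkingdom": "domain",
    "domain": "domain",
    "root": "root",
}

# Distinct internal ranks, lowest first. dict.fromkeys drops duplicates
# while keeping the insertion order of the table.
RANK_ORDER = list(dict.fromkeys(NCBI_TO_METACACHE_RANK.values()))


def metacache_rank(ncbi_name):
    """Return the mapped rank for an NCBI rank name, or None if it is not listed."""
    # Assumption: input is normalized by trimming whitespace and lowercasing.
    return NCBI_TO_METACACHE_RANK.get(ncbi_name.strip().lower())


def is_below(rank_a, rank_b):
    """True if rank_a is a strictly lower (more specific) rank than rank_b."""
    return RANK_ORDER.index(rank_a) < RANK_ORDER.index(rank_b)
```

With this sketch, `metacache_rank("varietas")` gives `"variety"`, and `is_below("form", "subspecies")` is true, which matches the form -> variety -> subspecies ordering described above.
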
donovan-h-parks commented 4 years ago

Great - thanks!