sourmash-bio / sourmash

Quickly search, compare, and analyze genomic and metagenomic data sets.
http://sourmash.readthedocs.io/en/latest/

rankinfo for protein databases #1026

Open ctb opened 4 years ago

ctb commented 4 years ago

Done with sourmash lca rankinfo:

DNA database at k=31, scaled=10,000:

superkingdom: 9422 (0.1%)
phylum: 3423 (0.0%)
class: 15675 (0.2%)
order: 9235 (0.1%)
family: 43975 (0.6%)
genus: 372073 (4.7%)
species: 7424343 (94.2%)
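The percentages in parentheses appear to be each rank's count divided by the total across all ranks. Recomputing them from the DNA counts above reproduces the reported values. A minimal Python sketch; the function name `rank_percentages` is a hypothetical illustration, not part of sourmash:

```python
# Rank order as printed by rankinfo in this thread.
RANKS = ["superkingdom", "phylum", "class", "order", "family", "genus", "species"]


def rank_percentages(counts: dict[str, int]) -> dict[str, str]:
    """Express each rank's count as a percentage of the total, to one decimal."""
    total = sum(counts.values())
    return {rank: f"{100 * counts[rank] / total:.1f}%" for rank in RANKS if rank in counts}


# Counts copied from the DNA k=31, scaled=10,000 output above.
dna_k31 = {
    "superkingdom": 9422, "phylum": 3423, "class": 15675, "order": 9235,
    "family": 43975, "genus": 372073, "species": 7424343,
}

pct = rank_percentages(dna_k31)
for rank in RANKS:
    print(f"{rank}: {dna_k31[rank]} ({pct[rank]})")
```

The same function applies unchanged to the dayhoff and protein counts.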

dayhoff database at k=57 (true k=19), scaled=1,000:

superkingdom: 362465 (2.9%)
phylum: 54233 (0.4%)
class: 180252 (1.4%)
order: 105923 (0.8%)
family: 363480 (2.9%)
genus: 1440426 (11.5%)
species: 10051513 (80.0%)
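Two details of the protein-side numbers are worth unpacking. The "k=57 (true k=19)" notation is consistent with k being given in nucleotide units and divided by 3 to get the amino-acid k-mer length (57 / 3 = 19). "dayhoff" refers to the Dayhoff reduced amino-acid alphabet, which collapses the 20 residues into six groups, so distinct peptides can share a k-mer. A minimal sketch, assuming the standard six-group partition with a-f letter labels; `dayhoff_encode` and `protein_kmers` are hypothetical helpers, not sourmash's API:

```python
# Assumed standard Dayhoff six-group partition of the 20 amino acids.
DAYHOFF_GROUPS = {
    "a": "C",
    "b": "AGPST",
    "c": "DENQ",
    "d": "HKR",
    "e": "ILMV",
    "f": "FWY",
}
# Invert to a residue -> group-letter lookup table.
DAYHOFF = {aa: letter for letter, residues in DAYHOFF_GROUPS.items() for aa in residues}


def dayhoff_encode(protein: str) -> str:
    """Map each amino acid to its Dayhoff group letter (raises KeyError on X, *, etc.)."""
    return "".join(DAYHOFF[aa] for aa in protein.upper())


def protein_kmers(seq: str, ksize_nt: int) -> list[str]:
    """Split seq into k-mers, converting a nucleotide-unit ksize to amino acids."""
    k = ksize_nt // 3  # e.g. 57 // 3 == 19
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


# Two different peptides collapse to the same Dayhoff string.
print(dayhoff_encode("MKVLAAGIV"), dayhoff_encode("IRLVSTPLM"))
```

Both peptides encode to the same six-letter string, so a Dayhoff k-mer can match sequences whose protein k-mers differ. That is one plausible reason more Dayhoff hashes land at high ranks such as superkingdom.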

ref #637

ctb commented 4 years ago

protein database at k=57 (true k=19), scaled=1,000:

superkingdom: 47248 (0.3%)
phylum: 13340 (0.1%)
class: 67809 (0.4%)
order: 53529 (0.3%)
family: 257499 (1.4%)
genus: 1681728 (9.4%)
species: 15721861 (88.1%)
strain: 0 (0.0%)

ctb commented 4 years ago

Fascinating difference at the superkingdom level! It suggests that contamination is readily detectable with dayhoff.
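To put a number on the superkingdom difference, the sketch below recomputes the fraction of hashes assigned at superkingdom for each of the three databases, using only the counts posted in this thread. It quantifies the gap; whether the gap reflects contamination remains the hypothesis stated above. `fraction_at` is a hypothetical helper.

```python
# Per-rank counts copied from the rankinfo outputs above.
COUNTS = {
    "DNA k=31": {
        "superkingdom": 9422, "phylum": 3423, "class": 15675, "order": 9235,
        "family": 43975, "genus": 372073, "species": 7424343,
    },
    "dayhoff k=19": {
        "superkingdom": 362465, "phylum": 54233, "class": 180252, "order": 105923,
        "family": 363480, "genus": 1440426, "species": 10051513,
    },
    "protein k=19": {
        "superkingdom": 47248, "phylum": 13340, "class": 67809, "order": 53529,
        "family": 257499, "genus": 1681728, "species": 15721861, "strain": 0,
    },
}


def fraction_at(db: str, rank: str) -> float:
    """Fraction of a database's hashes whose LCA is at `rank`."""
    counts = COUNTS[db]
    return counts[rank] / sum(counts.values())


for db in COUNTS:
    print(f"{db}: superkingdom fraction = {fraction_at(db, 'superkingdom'):.2%}")

sk = "superkingdom"
ratio_vs_protein = fraction_at("dayhoff k=19", sk) / fraction_at("protein k=19", sk)
ratio_vs_dna = fraction_at("dayhoff k=19", sk) / fraction_at("DNA k=31", sk)
print(f"dayhoff / protein: {ratio_vs_protein:.1f}x")
print(f"dayhoff / DNA: {ratio_vs_dna:.1f}x")
```

On these counts, the dayhoff superkingdom fraction comes out roughly 11x the protein fraction and over 20x the DNA fraction. Note that the DNA database uses a different scaled value (10,000 vs 1,000), so the DNA comparison is looser than the protein one.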