sourmash-bio / sourmash

Quickly search, compare, and analyze genomic and metagenomic data sets.
http://sourmash.readthedocs.io/en/latest/

using taxonkit to generate sourmash lineages #1851

Open ctb opened 2 years ago

ctb commented 2 years ago

per @mr-eyes -

Yesterday I was trying to generate lineage information in the same format as all_genbank_lineages.20200727.csv. I started using taxonkit reformat for that purpose, but it is still a work in progress. It seems like a nice tool for standardizing the expected taxonomic levels. https://bioinf.shenwei.me/taxonkit/usage/#reformat

This command gives the exact format of the sourmash lineages file:

echo $tax_id | taxonkit lineage | awk '$2!=""' | taxonkit reformat --format "{k},{p},{c},{o},{f},{g},{s},{t}" | cut -f 3

shenwei356 commented 2 years ago

For versions > 0.8.0, reformat accepts TaxIds as input via the flag -I/--taxid-field:

echo $tax_id | taxonkit reformat --taxid-field 1 --format "{k},{p},{c},{o},{f},{g},{s},{t}" | cut -f 2

mr-eyes commented 2 years ago

@shenwei356 Thanks!
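
The commands in this thread print only the comma-separated ranks for a single TaxId. A lineages file also needs an identifier column and a header row, so here is a minimal sketch of one way to assemble them. It assumes the input is tab-separated rows of identifier, TaxId, and the lineage column appended by taxonkit reformat. The function name to_lineage_csv and the header names are assumptions, not taken from sourmash or taxonkit; check the header against the lineage file you want to match.

```shell
#!/bin/sh
# Hypothetical helper (not part of sourmash or taxonkit).
# Reads "ident<TAB>taxid<TAB>lineage" rows on stdin, where the third
# column is assumed to come from:
#   taxonkit reformat --format "{k},{p},{c},{o},{f},{g},{s},{t}"
# and therefore already holds eight comma-separated ranks.
to_lineage_csv() {
    # Header names are an assumption; adjust them to the target file.
    printf '%s\n' 'ident,taxid,superkingdom,phylum,class,order,family,genus,species,strain'
    awk -F '\t' '
        # Drop rows with an empty lineage column (e.g. unknown TaxIds).
        # Note: taxon names that contain commas are not quoted here.
        $3 != "" { print $1 "," $2 "," $3 }
    '
}
```

With a two-column file of identifiers and TaxIds (called acc2taxid.tsv here purely for illustration), the flag mentioned above could be pointed at the second column: taxonkit reformat --taxid-field 2 --format "{k},{p},{c},{o},{f},{g},{s},{t}" acc2taxid.tsv | to_lineage_csv > lineages.csv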