shandley / hecatomb

hecatomb is a virome analysis pipeline for analysis of Illumina sequence data
MIT License

taxonomy improvement #67

Open mihinduk opened 2 years ago

mihinduk commented 2 years ago

Hi Mike,

While working with the SIV data, I realized that there are spaces in the taxonomic fields. For example, two of the family names are "Verrucomicrobia subdivision 3" and "Verrucomicrobia subdivision 6".

This will make pulling reads by family more difficult. Could all spaces in taxonomy fields be replaced with underscores in the next update?

Thank you, Kathie

beardymcjohnface commented 2 years ago

It's a tab-separated file so spaces shouldn't be a problem. How were you planning on parsing the files?

mihinduk commented 2 years ago

I was trying to write a helper script to pull reads from a family of interest. I had a shell script in mind, but I could do it differently.

beardymcjohnface commented 2 years ago

If you're using awk, you'll just need to pass -F '\t' to change the field separator to tabs instead of whitespace.
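
As a rough illustration of that suggestion, here is a minimal sketch of a helper that pulls read IDs for one family. The column layout (read ID in column 1, family in column 3), the function name `pull_family_ids`, and the demo table are all assumptions made for this example; they are not taken from the actual hecatomb output, so the column numbers would need adjusting to the real file.

```shell
# Hypothetical helper: print the read ID (assumed to be column 1) of every
# row whose family column (assumed to be column 3) exactly matches the name
# given as the first argument. The second argument is the table path.
pull_family_ids() {
    # -F '\t' splits fields on tabs only, so spaces inside a field are kept.
    # -v passes the family name into awk as the variable "fam".
    # NR > 1 skips the header row.
    awk -F '\t' -v fam="$1" 'NR > 1 && $3 == fam { print $1 }' "$2"
}

# Made-up demo table: a header row plus three tab-separated records.
demo=$(mktemp)
trap 'rm -f "$demo"' EXIT
printf 'seqID\tkingdom\tfamily\n' > "$demo"
printf 'read1\tBacteria\tVerrucomicrobia subdivision 3\n' >> "$demo"
printf 'read2\tBacteria\tVerrucomicrobia subdivision 6\n' >> "$demo"
printf 'read3\tViruses\tRetroviridae\n' >> "$demo"

# Quote the family name so the shell passes it as a single argument.
pull_family_ids 'Verrucomicrobia subdivision 3' "$demo"
```

With the demo table above, only the first record matches. Because the comparison is an exact string match on a tab-delimited field, "Verrucomicrobia subdivision 3" and "Verrucomicrobia subdivision 6" stay distinct without replacing spaces with underscores.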

mihinduk commented 2 years ago

Hi Mike, [Uploading 2021_05_18_Viral_Baltimore_full_classification_table_ICTV2020.txt…]()

I just created an updated taxonomy database with Baltimore classification, which I am having trouble uploading here. This is the latest ICTV release. I will email it to you and Rob.