ravel-lab / VALENCIA

VAginaL community state typE Nearest CentroId clAssifier
MIT License

BVAB1 species level assignment #5

Open kviljoen opened 2 years ago

kviljoen commented 2 years ago

Hi there,

I'm wondering about the assignment of BVAB(1/2/3) as you say that VALENCIA only considers species level annotation for Lactobacillus, Gardnerella, Prevotella, Atopobium and Sneathia, but you also say that "CST IV-A contains a high to moderate relative abundance of BVAB1 and G. vaginalis" and "CST IV-C contains low relative abundances of G. vaginalis, BVAB1, and Lactobacillus spp. and includes". Please can you clarify?

Regards, Katie.

michaelfrance commented 2 years ago

Hi Katie,

Apologies, BVAB1 is a species-level assignment; it just didn't occur to me because its name hasn't been finalized. If you look at the column headings of the reference centroids file, you can see exactly which taxa are expected.
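One way to act on the suggestion above is to compare the column headings of the centroids file with those of your own table. The following is only a minimal sketch: the two CSV snippets are toy stand-ins rather than the real files, the non-taxon column names (`sub_CST`, `sampleID`, `read_count`) and the taxon names other than BVAB1 are assumptions, and the helper functions are hypothetical, not part of VALENCIA.

```python
import csv
import io


def header_taxa(csv_text, skip=()):
    """Return the column headings of a CSV table, minus non-taxon columns."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    return [name for name in header if name not in skip]


def compare_headers(expected, observed):
    """Split the expected taxa into those present in / absent from a table."""
    observed_set = set(observed)
    present = [t for t in expected if t in observed_set]
    missing = [t for t in expected if t not in observed_set]
    return present, missing


# Toy stand-ins for the two files (assumed layout, made-up values).
centroid_csv = (
    "sub_CST,Lactobacillus_crispatus,Gardnerella_vaginalis,BVAB1\n"
    "toy_centroid,0.90,0.05,0.05\n"
)
sample_csv = (
    "sampleID,read_count,Lactobacillus_crispatus,Gardnerella_vaginalis\n"
    "s1,1000,900,100\n"
)

expected = header_taxa(centroid_csv, skip=("sub_CST",))
observed = header_taxa(sample_csv, skip=("sampleID", "read_count"))
present, missing = compare_headers(expected, observed)

print("taxa found in the sample table:", present)
print("taxa missing from the sample table:", missing)
```

With real files, the two strings would be replaced by the text of the centroids file and of your own table. Any taxon reported as missing is either truly absent from the data or labelled differently.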

Hope this helps,

Michael

kviljoen commented 2 years ago

Thanks for the prompt response Michael.

I see you have BVAB1, BVAB2, BVAB3, BVAB_TM7 and Shuttleworthia_satelles, which would require manual curation in external datasets such as ours. Do you have the reference sequences or accession numbers that you used for these listed somewhere?

Thanks again, Katie.

kviljoen commented 2 years ago

Hi Michael,

We do have BVAB1 annotated in our dataset under the Species column, but the convert_qiime.py script doesn't appear to be set up to pick up any BVABs. The resulting merge file does not contain BVAB. Does this affect the CST assignments made by Valencia.py?
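To illustrate why a dropped BVAB1 column could matter for a nearest-centroid classifier, here is a minimal sketch. It assumes a Yue-Clayton-style similarity and does not reproduce VALENCIA's own code. The two centroids, their names, and all abundances are made-up toy values, not the real reference centroids.

```python
def yue_clayton(a, b):
    """Yue-Clayton theta between two relative-abundance vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    denom = sum(x * x for x in a) + sum(y * y for y in b) - dot
    return dot / denom if denom else 0.0


def nearest_centroid(sample, centroids, taxa):
    """Return the best-matching centroid name and all similarity scores."""
    vec = [sample.get(t, 0.0) for t in taxa]
    scores = {
        name: yue_clayton(vec, [c.get(t, 0.0) for t in taxa])
        for name, c in centroids.items()
    }
    return max(scores, key=scores.get), scores


taxa = ["Lactobacillus_iners", "Gardnerella_vaginalis", "BVAB1"]

# Hypothetical toy centroids (NOT the VALENCIA reference values).
centroids = {
    "toy_BVAB1_rich": {
        "Lactobacillus_iners": 0.05,
        "Gardnerella_vaginalis": 0.25,
        "BVAB1": 0.70,
    },
    "toy_Gardnerella_rich": {
        "Lactobacillus_iners": 0.10,
        "Gardnerella_vaginalis": 0.85,
        "BVAB1": 0.05,
    },
}

# The same toy sample, with and without its BVAB1 column.
sample_with = {
    "Lactobacillus_iners": 0.05,
    "Gardnerella_vaginalis": 0.30,
    "BVAB1": 0.65,
}
sample_without = {k: v for k, v in sample_with.items() if k != "BVAB1"}

best_with, scores_with = nearest_centroid(sample_with, centroids, taxa)
best_without, scores_without = nearest_centroid(sample_without, centroids, taxa)

print("with BVAB1:   ", best_with)
print("without BVAB1:", best_without)
```

In this toy case the sample matches the BVAB1-rich centroid when the column is present and the Gardnerella-rich centroid when it is dropped. A missing BVAB column in the merge file could therefore plausibly shift assignments for BVAB1-dominated samples.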

Regards, Katie.

Gscorreia89 commented 1 year ago

Hi,

I have a somewhat similar situation: I am trying to run VALENCIA with taxonomic assignments generated with SILVA. I am also keen to find out which sequences were used to assign the BVABs and Shuttleworthia. Are these sequences from the STIRRUPS DB?

Best Wishes, Goncalo

luhugerth commented 1 year ago

In the same vein - it's impossible from the data here to know which species belong to e.g. "Ruminoccocus_2" vs "Ruminoccocus_3". Just letting us know the exact procedure used for annotating the samples used for the centroids would help with a lot of troubleshooting...