snayfach / UHGV

Unified Human Gut Virome Catalog
https://portal.nersc.gov/UHGV

How many sequences are in the UHGV database? #66

Closed ChaoXianSen closed 9 months ago

ChaoXianSen commented 9 months ago

Hi Stephen, thanks for your pipeline for classifying human gut viruses. I was trying to use uhgv-tools to classify viral contigs, but I have some questions about the UHGV database:

[image]

[image]

The number of sequences in 'genomes.fna' is the same as the number of species-level sequences. Where does the number '208,643' come from?

I look forward to your reply!

snayfach commented 9 months ago

Thanks for your question. 57514 is the number of high-quality, species-level representatives, as indicated in the figure. Feel free to reopen if you have further questions.
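To check a count like this locally, the number of records in a FASTA file can be tallied by counting header lines. The following is a minimal sketch, not part of uhgv-tools; the function name `count_fasta_records` is hypothetical, and it assumes a standard FASTA file (optionally gzip-compressed) in which every record begins with a `>` line.

```python
import gzip
from pathlib import Path


def count_fasta_records(path):
    """Count FASTA records by counting lines that start with '>'.

    Hypothetical helper for illustration; assumes one '>' header per record.
    """
    path = Path(path)
    # Use gzip for files ending in .gz, plain open otherwise.
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt") as handle:
        return sum(1 for line in handle if line.startswith(">"))
```

For example, `count_fasta_records("genomes.fna")` returns the number of sequences in that file, which can then be compared with the figures reported for the catalog.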

ChaoXianSen commented 7 months ago

Hi, I used the CheckV script aniclust.py to get species-level vOTUs: image

Then I used /MGV/aai_cluster/amino_acid_identity.py and filter_aai.py to get genus- and family-level vOTUs:

genus_level cluster: image

family_level cluster: image

Q: Why have the genus- and family-level vOTU names changed? What is the relationship between the species-level vOTU names and the genus- (family-) level vOTU names?
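One way to relate the three levels is to join the cluster files on the species-level representative ID. The sketch below only illustrates that idea and rests on assumptions that are not confirmed in this thread: the species file has two tab-separated columns (representative, then comma-separated members), each line of the genus and family files lists the tab-separated species representatives of one cluster, and the AAI clustering was run on the species representatives. All function names and the `genus_1` / `family_1` style labels are hypothetical.

```python
def read_species_clusters(lines):
    """Map each genome ID to its species-level representative.

    Assumed line format: representative<TAB>member1,member2,...
    """
    member_to_rep = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        rep, members = line.split("\t")
        for member in members.split(","):
            member_to_rep[member] = rep
    return member_to_rep


def read_higher_clusters(lines, prefix):
    """Map each species representative to a numbered cluster label.

    Assumed line format: one cluster per line, members separated by tabs.
    Labels such as 'genus_1' are made up here for illustration.
    """
    rep_to_cluster = {}
    clusters = [l.rstrip("\n").split("\t") for l in lines if l.strip()]
    for index, members in enumerate(clusters, start=1):
        for rep in members:
            rep_to_cluster[rep] = f"{prefix}_{index}"
    return rep_to_cluster


def build_lineage(member_to_rep, rep_to_genus, rep_to_family):
    """Return genome -> (species representative, genus label, family label)."""
    return {
        member: (rep, rep_to_genus.get(rep), rep_to_family.get(rep))
        for member, rep in member_to_rep.items()
    }


# Toy data: five genomes, three species, two genera, one family.
species = read_species_clusters(["c1\tc1,c2\n", "c3\tc3\n", "c4\tc4,c5\n"])
genus = read_higher_clusters(["c1\tc3\n", "c4\n"], "genus")
family = read_higher_clusters(["c1\tc3\tc4\n"], "family")
lineage = build_lineage(species, genus, family)
print(lineage["c2"])  # ('c1', 'genus_1', 'family_1')
```

Under these assumptions, the species-level name is the ID of the representative genome, while genus- and family-level labels are assigned per cluster, so the two kinds of names need not match; the shared key is the representative ID.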