snayfach / UHGV

Unified Human Gut Virome Catalog
https://portal.nersc.gov/UHGV

Inconsistent host prediction within viral clusters #56

Closed snayfach closed 1 year ago

snayfach commented 1 year ago

vSUBFAM-00042

Two problems here: 1) the consensus host is wrong (poor logic in sub-selecting genomes?); 2) individual host predictions are wrong.

Example: UHGV-1388670 and UHGV-0355051

UHGV-1388670

UHGV-0355051

SUPER CONSISTENT PATTERN:

Also, very low-quality fragments are more likely to be associated with Firmicutes (likely because a shorter contig is more likely to share 25% of the region by chance).

What are the solutions? 1) Be smarter about choosing CRISPR or k-mer: if you're confident the virus is virulent, DON'T USE K-MERS. 2) Be smarter about computing the lineage consensus: use only MQ+ or HQ+ genomes.
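
A rough sketch of how those two ideas might fit together, not the actual UHGV pipeline code. Everything here is hypothetical: the `Genome` fields, the quality labels, the `pick_host` and `consensus_host` names, the choice to always prefer a CRISPR hit when one exists, and the majority threshold are all assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional

# Hypothetical quality tiers, ordered from worst to best.
QUALITY_RANK = {"low": 0, "medium": 1, "high": 2, "complete": 3}


@dataclass
class Genome:
    genome_id: str
    quality: str                 # one of the QUALITY_RANK keys (assumed labels)
    lifestyle: str               # "virulent" or "temperate" (assumed labels)
    crispr_host: Optional[str]   # host from CRISPR spacer matches, if any
    kmer_host: Optional[str]     # host from k-mer matches, if any


def pick_host(g: Genome) -> Optional[str]:
    """Choose one host prediction per genome.

    Assumption: a CRISPR-based call is used whenever it exists, and
    k-mer calls are ignored for genomes believed to be virulent.
    """
    if g.crispr_host is not None:
        return g.crispr_host
    if g.lifestyle == "virulent":
        return None  # idea 1: no k-mer fallback for virulent phages
    return g.kmer_host


def consensus_host(genomes, min_quality="medium", min_fraction=0.5):
    """Majority-vote host for a cluster, using only genomes at or above
    min_quality (idea 2). Returns None when no host exceeds min_fraction."""
    min_rank = QUALITY_RANK[min_quality]
    hosts = [
        pick_host(g) for g in genomes if QUALITY_RANK[g.quality] >= min_rank
    ]
    hosts = [h for h in hosts if h is not None]
    if not hosts:
        return None
    host, count = Counter(hosts).most_common(1)[0]
    # Strict inequality so an even split yields no consensus.
    return host if count / len(hosts) > min_fraction else None
```

With something like this, a cluster dominated by short low-quality fragments carrying k-mer calls would no longer outvote a handful of MQ+ genomes, and calling `consensus_host(genomes, min_quality="high")` would restrict the vote to HQ+ genomes only.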

Do you see similar issues for temperate phages?