Two problems here:
1) the consensus host is wrong (poor logic in sub-selecting genomes?)
2) individual host predictions are wrong
Example: UHGV-1388670 and UHGV-0355051
UHGV-1388670
vOTU-028230
vGENUS-00140
50/52 connections to B. uniformis
method = CRISPR
UHGV-0355051
vOTU-103341
vGENUS-00140
20/20 connections to Firmicutes
method = PHIST
SUPER CONSISTENT PATTERN:
CRISPR = B. uniformis
PHIST = Firmicutes
Also, very low-quality fragments are more likely to be associated with Firmicutes (likely because a shorter contig is more likely to share 25% of its region by chance)
What are the solutions???
1) be smarter about choosing between CRISPR and kmer... if you're confident the virus is virulent, DON'T USE KMERS
2) be smarter about computing the lineage consensus... use only mq+ or hq+ genomes
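A rough sketch of how the two fixes could fit together, not the actual pipeline. Everything here is hypothetical: the Genome fields, the "lq"/"mq"/"hq" quality labels, the lifestyle labels, the rule that a CRISPR hit takes priority when both methods give a host, and the 50% majority threshold. The genome IDs and values in the toy example are made up.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional

# Assumed quality tiers, ordered from worst to best.
QUALITY_RANK = {"lq": 0, "mq": 1, "hq": 2}


@dataclass
class Genome:
    genome_id: str
    quality: str                 # "lq", "mq", or "hq" (assumed labels)
    lifestyle: str               # "virulent", "temperate", or "unknown"
    crispr_host: Optional[str]   # host from CRISPR spacer matches, if any
    kmer_host: Optional[str]     # host from a kmer-based tool, if any


def pick_host(g: Genome) -> Optional[str]:
    """Fix 1: choose a per-genome host, ignoring kmer hits for virulent phages."""
    if g.crispr_host is not None:
        return g.crispr_host     # assumption: CRISPR wins when available
    if g.lifestyle == "virulent":
        return None              # no CRISPR hit and virulent: do not trust kmers
    return g.kmer_host


def consensus_host(genomes: List[Genome],
                   min_quality: str = "mq",
                   min_fraction: float = 0.5) -> Optional[str]:
    """Fix 2: lineage consensus computed from mq+ (or hq+) genomes only."""
    floor = QUALITY_RANK[min_quality]
    hosts = [pick_host(g) for g in genomes if QUALITY_RANK[g.quality] >= floor]
    hosts = [h for h in hosts if h is not None]
    if not hosts:
        return None
    host, count = Counter(hosts).most_common(1)[0]
    # Only report a consensus if the top host holds a strict-enough share.
    return host if count / len(hosts) >= min_fraction else None


# Toy data (made up) mimicking the pattern in these notes.
toy = [
    Genome("g1", "hq", "virulent", "B. uniformis", None),
    Genome("g2", "mq", "virulent", None, "Firmicutes"),
    Genome("g3", "lq", "unknown", None, "Firmicutes"),
    Genome("g4", "lq", "unknown", None, "Firmicutes"),
]
result = consensus_host(toy)
```

With this toy input, g3 and g4 are dropped by the quality filter and g2 contributes no host because it is virulent with only a kmer hit, so the consensus comes from g1 alone. Passing min_quality="hq" would apply the stricter hq+ cutoff.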
vSUBFAM-00042
Do you see similar issues for temperate phages?