Closed: drob2727 closed this issue 2 years ago
Hi @drob2727,
I don't have a clear answer for that. Several molecular mechanisms can leave identical sequences in the genomes of a phage and its host. Such sequences can correspond to prophages (several thousand common k-mers), horizontally transferred genes (hundreds of common k-mers), or short sequences such as CRISPR spacers and integration sites (1-10 common k-mers). If you care about high recall of host predictions, a minimum of one common k-mer will probably be the best choice, since any higher cutoff would start to discard matches that rest only on those short sequences.
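
To make the threshold idea concrete, here is a minimal Python sketch of counting common k-mers and applying a minimum cutoff. It is only an illustration, not the tool's actual implementation: the function names, the `min_common` parameter, and the default `k=25` are all assumptions made for this example, and it compares forward strands only (no reverse complements).

```python
def kmers(seq, k):
    """Return the set of distinct k-mers in seq (forward strand only)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def common_kmer_count(phage, host, k):
    """Count the distinct k-mers present in both sequences."""
    return len(kmers(phage, k) & kmers(host, k))


def predict_hosts(phage, hosts, k=25, min_common=1):
    """Keep candidate hosts sharing at least min_common k-mers with the phage.

    hosts maps a (hypothetical) host name to its genome sequence.
    k=25 and min_common=1 are illustrative defaults, not tool settings.
    """
    hits = {}
    for name, seq in hosts.items():
        n = common_kmer_count(phage, seq, k)
        if n >= min_common:
            hits[name] = n
    return hits
```

With `min_common=1`, a host supported only by a single spacer-sized match is still reported. Raising `min_common` keeps only hosts with longer shared regions, such as horizontally transferred genes or prophages, at the cost of recall.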
Is there a minimum number of common k-mers to have a high quality host prediction or is 1 accurate?