refresh-bio / PHIST

Phage-Host Interaction Search Tool
GNU General Public License v3.0

Common K-Mers #6

Closed (drob2727 closed 2 years ago)

drob2727 commented 2 years ago

Is there a minimum number of common k-mers needed for a high-quality host prediction, or is 1 enough?

aziele commented 2 years ago

Hi @drob2727,

I don't have a clear answer for that. Several molecular mechanisms can leave identical sequences in the genomes of a phage and its host. Such sequences can correspond to prophages (several thousand common k-mers), horizontally transferred genes (hundreds of common k-mers), and short sequences such as CRISPR spacers and integration sites (1-10 common k-mers). If you care about high recall of host predictions, a threshold of one common k-mer will probably be the best choice.
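
To make the trade-off concrete, here is a minimal Python sketch of applying a minimum common k-mer threshold to a list of phage-host hits. Everything in it is an assumption for illustration: the `Hit` record, the `filter_hits` and `likely_origin` helpers, and the example counts are hypothetical and are not part of PHIST or its output format. The cutoffs of 100 and 1000 are only rough stand-ins for "hundreds" and "several thousand" above, not validated values.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    """Hypothetical record for one phage-host pair (not PHIST's own format)."""
    phage: str
    host: str
    common_kmers: int


def filter_hits(hits: List[Hit], min_common_kmers: int = 1) -> List[Hit]:
    """Keep only pairs sharing at least `min_common_kmers` k-mers."""
    return [h for h in hits if h.common_kmers >= min_common_kmers]


def likely_origin(common_kmers: int) -> str:
    """Rough label based on the orders of magnitude mentioned in the thread.

    The numeric cutoffs are illustrative assumptions, not validated values.
    """
    if common_kmers >= 1000:
        return "prophage-like"
    if common_kmers >= 100:
        return "horizontally transferred gene-like"
    if 1 <= common_kmers <= 10:
        return "short match (e.g. CRISPR spacer or integration site)"
    if common_kmers > 10:
        return "intermediate"
    return "no common k-mers"


# Made-up example data, for illustration only.
hits = [
    Hit("phage_A", "host_1", 3500),
    Hit("phage_A", "host_2", 4),
    Hit("phage_B", "host_3", 250),
    Hit("phage_B", "host_4", 0),
]

# A threshold of 1 keeps every pair with any shared k-mer (favours recall).
for hit in filter_hits(hits, min_common_kmers=1):
    print(hit.phage, hit.host, hit.common_kmers, likely_origin(hit.common_kmers))
```

Raising `min_common_kmers` keeps only pairs with more shared sequence, at the cost of dropping hosts supported only by short matches such as CRISPR spacers.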