Closed jodyphelan closed 1 year ago
Yes, it does support searching reads to detect the most similar strain among many reference genomes. Please follow the recently added tutorial: https://bioinf.shenwei.me/kmcp/tutorial/detecting-pathogens/
You may need to remove duplicated sequences first and choose a limited number of representative genomes for each strain, because with thousands of reference genomes for small genomes like HBV and dengue viruses, KMCP might fail to detect the target species.
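As a rough illustration of the "remove duplicated sequences first" step, here is a minimal Python sketch that drops exact duplicate sequences from a FASTA input and keeps the first record for each distinct sequence. It is not part of KMCP; the function names `read_fasta` and `dedupe` and the example records are hypothetical. It only handles exact (case-insensitive) duplicates and does not choose representative genomes among near-identical strains, which would need a clustering step.

```python
import hashlib


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def dedupe(records):
    """Keep the first record for each distinct sequence (case-insensitive)."""
    seen = set()
    kept = []
    for header, seq in records:
        # Hash the upper-cased sequence so memory stays small for many genomes.
        digest = hashlib.sha1(seq.upper().encode()).hexdigest()
        if digest in seen:
            continue
        seen.add(digest)
        kept.append((header, seq))
    return kept


# Hypothetical toy input: ref2 is the same sequence as ref1 in lower case.
example = """>ref1
ACGTACGT
>ref2
acgtacgt
>ref3
ACGTTTTT
""".splitlines()

unique = dedupe(read_fasta(example))
for header, seq in unique:
    print(f">{header}\n{seq}")
```

For a real reference set, the same functions could be fed an open file handle instead of the toy list, and the kept records written to a new FASTA file before building the database.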
Some tips: try -n 1 if no targets are detected.
Anyway, just have a try, and let me know if you have any issues.
Thanks, I'll give it a try. Great tool btw!
Amazing that works like a charm!
Glad it helps!
I just remembered I forgot to paste the latest unreleased version, which fixes a bug in chunk computation when splitting circular genomes (--circular).
https://github.com/shenwei356/kmcp/files/11702458/kmcp_linux_amd64.tar.gz
I've got some dengue NGS data and I'm wondering if I can use kmcp to find the best matching reference. I've built a database from ~4800 reference sequences using the following:
Doing an assembly and then using
kmcp search contig.fa -d kmcpdb/
works quite nicely and identifies the same reference sequence I get when I use BLAST. However, it is sometimes difficult to do a genome assembly, and in that case I would like to search directly using the reads. Is this possible? I've tried with
kmcp search -d kmcpdb reads.fq.gz --query-whole-file -o result.tsv.gz
but this does not seem to return any hits. Any guidance would be appreciated.