Closed shenwei356 closed 12 months ago
Sample data:
Creating a KMCP database:
    # split reference genomes into 10 chunks with 150-bp overlaps
    kmcp compute -k 21 -n 10 -l 150 -I refs/ -O refs-n10-l150

    # index with a small FPR for small genomes
    kmcp index -f 0.001 -I refs-n10-l150/ -O refs.kmcp
Searching reads against the KMCP database:
    kmcp search -d refs.kmcp/ testdata.fq.gz -o testdata.fq.gz.kmcp.tsv.gz

    23:19:42.530 [INFO] processed queries: 676694, speed: 32.606 million queries per minute
    23:19:42.530 [INFO] 8.0837% (54702/676694) queries matched
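As a quick sanity check, the matched percentage in the log can be recomputed from the two counts it reports. This is only a sketch of the arithmetic, not part of KMCP; the variable names `matched`, `total`, and `pct` are made up for illustration.

```shell
# Counts copied from the search log above.
matched=54702
total=676694

# awk does the floating-point division; %.4f matches the precision used in the log.
# LC_ALL=C keeps the decimal separator a dot regardless of the user's locale.
pct=$(LC_ALL=C awk -v m="$matched" -v t="$total" 'BEGIN { printf "%.4f", 100 * m / t }')

echo "${pct}% (${matched}/${total}) queries matched"
```

Only about 8% of the reads match the reference, and the profile that follows is built from those matched reads.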
Profiling:
    # --level strain is used when no taxonomy is given.
    # some preset profiling modes are available.
    kmcp profile --level strain testdata.fq.gz.kmcp.tsv.gz \
        | tee profile.tsv

    csvtk cut -t -f ref,percentage,coverage,score,chunksFrac,reads profile.tsv \
        | csvtk pretty -t

    ref           percentage   coverage     score    chunksFrac   reads
    -----------   ----------   ----------   ------   ----------   -----
    NC_045512.2   100.000000   275.461793   100.00   1.00         54702
coverage is the vertical coverage or depth, score is a similarity score, and chunksFrac is the horizontal coverage of the genome.
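To act on these columns in a script, one option is to keep only references whose horizontal and vertical coverage both pass a cutoff. The sketch below is an assumption-laden illustration, not a KMCP feature: the cutoffs `min_chunks_frac` and `min_coverage` are hypothetical, and `profile_demo.tsv` is a stand-in file holding only the six columns shown above.

```shell
# Tiny stand-in for profile.tsv: tab-separated, only the six columns shown above.
printf 'ref\tpercentage\tcoverage\tscore\tchunksFrac\treads\n' > profile_demo.tsv
printf 'NC_045512.2\t100.000000\t275.461793\t100.00\t1.00\t54702\n' >> profile_demo.tsv

# Hypothetical cutoffs, chosen for illustration only.
min_chunks_frac=0.8
min_coverage=1

# Look columns up by header name so the filter does not depend on column order.
hits=$(LC_ALL=C awk -F '\t' -v f="$min_chunks_frac" -v c="$min_coverage" '
    NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }
    $(col["chunksFrac"]) + 0 >= f + 0 && $(col["coverage"]) + 0 >= c + 0 {
        print $(col["ref"])
    }
' profile_demo.tsv)

echo "$hits"
```

With the sample row, the reference passes both cutoffs, so its accession is printed. Suitable thresholds depend on the sample and sequencing depth.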
Added: https://bioinf.shenwei.me/kmcp/tutorial/detecting-pathogens/
KMCP v0.9.3 or later is required; that release fixed a bug in chunk computation when splitting circular genomes.