shenwei356 / kmcp

Accurate metagenomic profiling && Fast large-scale sequence/genome searching
https://bioinf.shenwei.me/kmcp
MIT License

reporting proportion of unmatched reads #20

Closed peterbjarke closed 2 years ago

peterbjarke commented 2 years ago

Many thanks for this open source tool! Very well documented.

When running the profiling, for example:

kmcp profile search.kmcp@gtdb.kmcp.tsv.gz --taxid-map taxid.map --taxdump taxdump/ --out-prefix search.tsv.gz.k.profile --metaphlan-report search.tsv.gz.m.profile --cami-report search.tsv.gz.c.profile --binning-result search.tsv.gz.binning.gz

Is it possible to report the proportion of unmatched reads, similar to what kraken2 does?

Best regards,

Peter

shenwei356 commented 2 years ago

Hi Peter, thanks for using KMCP.

kmcp search reports the proportion of matched reads.

10:01:29.112 [INFO] processed queries: 595188, speed: 2.135 million queries per minute
10:01:29.112 [INFO] 100.0000% (595188/595188) queries matched
10:01:29.112 [INFO] done searching
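
As a minimal sketch (not part of KMCP), the unmatched proportion could be derived from the summary line above by parsing the saved stderr log. The regular expression and the function name `unmatched_proportion` are assumptions for illustration, based only on the log format shown here; the format may differ between versions.

```python
import re

# Matches the "(matched/total) queries matched" part of the summary line, e.g.
# "10:01:29.112 [INFO] 100.0000% (595188/595188) queries matched"
# (pattern is an assumption derived from the log excerpt above)
MATCHED_RE = re.compile(r"\((\d+)/(\d+)\) queries matched")


def unmatched_proportion(log_text):
    """Return (unmatched_count, unmatched_fraction), or None if no summary line is found."""
    m = MATCHED_RE.search(log_text)
    if m is None:
        return None
    matched, total = int(m.group(1)), int(m.group(2))
    if total == 0:
        # No queries processed: avoid division by zero.
        return 0, 0.0
    unmatched = total - matched
    return unmatched, unmatched / total


example_log = """\
10:01:29.112 [INFO] processed queries: 595188, speed: 2.135 million queries per minute
10:01:29.112 [INFO] 100.0000% (595188/595188) queries matched
10:01:29.112 [INFO] done searching
"""

print(unmatched_proportion(example_log))  # (0, 0.0)
```

To use it on a real run, redirect stderr to a file when searching and pass the file contents to the function.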

kmcp profile also shows the number and proportion of reads belonging to the targets in the profile.

10:04:30.308 [INFO] #input matched reads: 595189, #reads belonging to references in profile: 595189, proportion: 100.000000%

Maybe kmcp search could write the basic summary to the search result file as comment lines, and kmcp profile could then read and report them again.

peterbjarke commented 2 years ago

Many thanks for the quick answer! Yes, I can see the reporting in the stderr output.

Best regards,

Peter

shenwei356 commented 2 years ago

Maybe kmcp search could write the basic summary to the search result file as comment lines, and kmcp profile could then read and report them again.

I may not do this, because it's difficult for kmcp profile to read the comment lines.

shenwei356 commented 1 year ago

I'll implement this. #33