shenwei356 / kmcp

Accurate metagenomic profiling && Fast large-scale sequence/genome searching
https://bioinf.shenwei.me/kmcp
MIT License

kmcp search is stuck #43

Closed aababc1 closed 4 months ago

aababc1 commented 4 months ago

Hi. Thank you for your nice tool.

I am currently searching a metagenome against the viral genome database that you provide at this link: https://bioinf.shenwei.me/kmcp/database/

I installed the tool using conda, but I don't know why the search step takes so long. The host OS is CentOS 7.10 (3.10.0-1160.el7.x86_64).

image

Thank you in advance.

shenwei356 commented 4 months ago

Try pressing "Enter" and see if another log line shows up. If it does, it's running. If your input files are huge, it will take a long time.

aababc1 commented 4 months ago

Thank you for your quick reply. I'm sorry, I expressed myself poorly. It has not stopped, but I think the speed is much slower than the benchmark results in your paper or what other users report.

The input data is about 5 Gbp of Illumina short paired-end reads; the server has 2 TB of memory and 2 × Xeon 8000-series processors (40 CPUs).

I tried both the compiled version and the conda version, with and without the -w option, to run KMCP.

The database used is genbank-viral, one of the prebuilt KMCP databases.

07:05:50.199 [INFO] processed queries: 30374219, speed: 0.159 million queries per minute
07:05:50.199 [INFO] 1.6593% (504003/30374219) queries matched
07:05:50.199 [INFO] done searching
07:05:50.199 [INFO] search results saved to: test2
07:05:50.199 [INFO]
07:05:50.199 [INFO] elapsed time: 3h11m18.24440856s
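As a sanity check on the log above, the reported speed can be reproduced from the query count and the elapsed time. This is only a minimal sketch: the variable names are illustrative, and the fractional seconds of the elapsed time are dropped.

```shell
#!/bin/sh
# Figures copied from the log above.
queries=30374219
elapsed_seconds=$((3 * 3600 + 11 * 60 + 18))   # 3h11m18s, fractional part dropped

# Million queries per minute = queries / minutes / 1e6.
# LC_ALL=C keeps "." as the decimal separator regardless of locale.
speed=$(LC_ALL=C awk -v q="$queries" -v s="$elapsed_seconds" \
    'BEGIN { printf "%.3f", q / (s / 60) / 1e6 }')

echo "speed: $speed million queries per minute"
```

The result matches the 0.159 million queries per minute in the log, which is consistent with that figure being an average over the whole run.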

I expected KMCP to be faster than Kraken2, but for some reason it is slow.

I would like help with how to increase the speed.

Downloaded version: KMCP v0.9.4

Command I ran:

../kmcp/kmcp/kmcp search -d genbank-viral.kmcp -1 ERR1018185_fp_hg38_1.fastq.gz -2 ERR1018185_fp_hg38_2.fastq.gz -o test2 -j 20 -w
03:54:31.955 [INFO] kmcp v0.9.4
03:54:31.955 [INFO] https://github.com/shenwei356/kmcp

shenwei356 commented 4 months ago

The best way is to increase the value of -j.

KMCP is far slower than Kraken, as shown in the paper. :(
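The suggestion to raise -j could be tried along these lines. This is a hedged sketch, not a tuned recommendation: it assumes that `nproc` (GNU coreutils) is available and that using every CPU it reports is acceptable on the machine. The inputs and other flags are copied from the command in this thread. Whether throughput scales with more threads is not something this thread establishes.

```shell
#!/bin/sh
# Assumption: use every CPU the OS reports; nproc is part of GNU coreutils.
threads=$(nproc)

# Same inputs and flags as the command in this thread, with only -j changed.
cmd="kmcp search -d genbank-viral.kmcp -1 ERR1018185_fp_hg38_1.fastq.gz -2 ERR1018185_fp_hg38_2.fastq.gz -o test2 -j $threads -w"

# Print the command for review before running it.
echo "$cmd"
```

Printing the command first makes it easy to check the thread count before starting a search that may run for hours.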

aababc1 commented 4 months ago

Ah, sorry for the misunderstanding. KMCP's strong point is accuracy. I got confused while reading other issues about program operation and speed. Thank you so much.