shenwei356 / kmcp

Accurate metagenomic profiling && Fast large-scale sequence/genome searching
https://bioinf.shenwei.me/kmcp
MIT License

kmcp search crashed #10

Closed zhanxw closed 1 year ago

zhanxw commented 2 years ago

I used the latest kmcp (downloaded from GitHub) and the GTDB database (downloaded from WeTransfer) to align FASTQ files. The command line was:

kmcp search --load-whole-db --threads 32 --db-dir /home2/xzhan9/data/reference/kmcp/gtdb.kmcp -1 data/SRR12397805_1.fastq.gz -2 data/SRR12397805_2.fastq.gz --out-file kmcp/SRR12397805.out --log kmcp/SRR12397805.log

kmcp crashed on two machines (both have >128G memory).

The input files, SRR12397805_1.fastq.gz and SRR12397805_2.fastq.gz, were downloaded from NCBI SRA.

The error messages were: panic: runtime error: invalid memory address or nil pointer dereference [signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x7b1256].

The crash is intermittent: sometimes kmcp search runs without any problem.

More relevant outputs:

$ seqkit stats data/SRR12397805_1.fastq.gz data/SRR12397805_2.fastq.gz
file                         format  type   num_seqs      sum_len  min_len  avg_len  max_len
data/SRR12397805_1.fastq.gz  FASTQ   DNA   1,676,891  247,187,869       15    147.4      151
data/SRR12397805_2.fastq.gz  FASTQ   DNA   1,676,891  247,022,404       15    147.3      151
$ kmcp version
kmcp v0.8.2
$ kmcp search --load-whole-db --threads 32 --db-dir /home2/xzhan9/data/reference/kmcp/gtdb.kmcp -1 data/SRR12397805_1.fastq.gz -2 data/SRR12397805_2.fastq.gz --out-file kmcp/SRR12397805.out --log kmcp/SRR12397805.log
metaphlan-report kmcp/SRR12397805.out --cami-report kmcp/SRR12397805.cami.out --sample-id SRR12397805 --binning-result kmcp/SRR12397805.bin
22:50:44.811 [INFO] kmcp v0.8.2
22:50:44.813 [INFO]   https://github.com/shenwei356/kmcp
22:50:44.813 [INFO]
22:50:44.813 [INFO] checking input files ...
22:50:44.865 [INFO] loading database into main memory ...

22:50:46.105 [INFO]   loaded index file: /home2/xzhan9/data/reference/kmcp/gtdb.kmcp/R001/_block001.uniki
22:50:54.871 [INFO]   loaded index file: /home2/xzhan9/data/reference/kmcp/gtdb.kmcp/R001/_block002.uniki
22:50:56.256 [INFO]   loaded index file: /home2/xzhan9/data/reference/kmcp/gtdb.kmcp/R001/_block004.uniki
22:50:56.544 [INFO]   loaded index file: /home2/xzhan9/data/reference/kmcp/gtdb.kmcp/R001/_block003.uniki
22:50:59.652 [INFO]   loaded index file: /home2/xzhan9/data/reference/kmcp/gtdb.kmcp/R001/_block005.uniki
22:51:00.957 [INFO]   loaded index file: /home2/xzhan9/data/reference/kmcp/gtdb.kmcp/R001/_block027.uniki
22:51:01.091 [INFO]   loaded index file: /home2/xzhan9/data/reference/kmcp/gtdb.kmcp/R001/_block007.uniki
22:51:01.169 [INFO]   loaded index file: /home2/xzhan9/data/reference/kmcp/gtdb.kmcp/R001/_block006.uniki
22:51:01.292 [INFO]   loaded index file: /home2/xzhan9/data/reference/kmcp/gtdb.kmcp/R001/_block008.uniki
22:51:03.181 [INFO]   loaded index file: /home2/xzhan9/data/reference/kmcp/gtdb.kmcp/R001/_block009.uniki
22:51:03.652 [INFO]   loaded index file: /home2/xzhan9/data/reference/kmcp/gtdb.kmcp/R001/_block011.uniki
22:51:04.320 [INFO]   loaded index file: /home2/xzhan9/data/reference/kmcp/gtdb.kmcp/R001/_block010.uniki
22:51:04.769 [INFO]   loaded index file: /home2/xzhan9/data/reference/kmcp/gtdb.kmcp/R001/_block013.uniki
22:51:05.238 [INFO]   loaded index file: /home2/xzhan9/data/reference/kmcp/gtdb.kmcp/R001/_block012.uniki
22:51:06.211 [INFO]   loaded index file: /home2/xzhan9/data/reference/kmcp/gtdb.kmcp/R001/_block014.uniki
22:51:06.398 [INFO]   loaded index file: /home2/xzhan9/data/reference/kmcp/gtdb.kmcp/R001/_block015.uniki
22:51:07.185 [INFO]   loaded index file: /home2/xzhan9/data/reference/kmcp/gtdb.kmcp/R001/_block016.uniki
22:51:07.262 [INFO]   loaded index file: /home2/xzhan9/data/reference/kmcp/gtdb.kmcp/R001/_block018.uniki
22:51:07.546 [INFO]   loaded index file: /home2/xzhan9/data/reference/kmcp/gtdb.kmcp/R001/_block019.uniki
22:51:07.582 [INFO]   loaded index file: /home2/xzhan9/data/reference/kmcp/gtdb.kmcp/R001/_block017.uniki
22:51:08.221 [INFO]   loaded index file: /home2/xzhan9/data/reference/kmcp/gtdb.kmcp/R001/_block020.uniki
22:51:08.590 [INFO]   loaded index file: /home2/xzhan9/data/reference/kmcp/gtdb.kmcp/R001/_block026.uniki
22:51:08.699 [INFO]   loaded index file: /home2/xzhan9/data/reference/kmcp/gtdb.kmcp/R001/_block021.uniki
22:51:09.063 [INFO]   loaded index file: /home2/xzhan9/data/reference/kmcp/gtdb.kmcp/R001/_block030.uniki
22:51:09.755 [INFO]   loaded index file: /home2/xzhan9/data/reference/kmcp/gtdb.kmcp/R001/_block031.uniki
22:51:10.131 [INFO]   loaded index file: /home2/xzhan9/data/reference/kmcp/gtdb.kmcp/R001/_block022.uniki
22:51:12.791 [INFO]   loaded index file: /home2/xzhan9/data/reference/kmcp/gtdb.kmcp/R001/_block024.uniki
22:51:13.106 [INFO]   loaded index file: /home2/xzhan9/data/reference/kmcp/gtdb.kmcp/R001/_block025.uniki
22:51:14.194 [INFO]   loaded index file: /home2/xzhan9/data/reference/kmcp/gtdb.kmcp/R001/_block023.uniki
22:51:15.207 [INFO]   loaded index file: /home2/xzhan9/data/reference/kmcp/gtdb.kmcp/R001/_block028.uniki
22:51:15.384 [INFO]   loaded index file: /home2/xzhan9/data/reference/kmcp/gtdb.kmcp/R001/_block032.uniki
22:51:16.456 [INFO]   loaded index file: /home2/xzhan9/data/reference/kmcp/gtdb.kmcp/R001/_block029.uniki
22:51:16.457 [INFO] database loaded: /home2/xzhan9/data/reference/kmcp/gtdb.kmcp
22:51:16.476 [INFO]
22:51:16.476 [INFO] -------------------- [main parameters] --------------------
22:51:16.476 [INFO]   minimum    query length: 30
22:51:16.476 [INFO]   minimum  matched k-mers: 10
22:51:16.476 [INFO]   minimum  query coverage: 0.550000
22:51:16.476 [INFO]   minimum target coverage: 0.000000
22:51:16.497 [INFO]   minimum target coverage: 0.000000
22:51:16.497 [INFO] -------------------- [main parameters] --------------------
22:51:16.497 [INFO]
22:51:16.497 [INFO] searching ...
22:51:16.513 [INFO] reading from paired-end files: data/SRR12397805_1.fastq.gz, data/SRR12397805_2.fastq.gz
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x7b1256]

goroutine 112426 [running]:
github.com/shenwei356/kmcp/kmcp/cmd.NewUnikIndexDB.func3.1(0xce8bd57400)
    /home/shenwei/shenwei/scripts/go/src/github.com/shenwei356/kmcp/kmcp/cmd/util-db-search.go:783 +0x276
created by github.com/shenwei356/kmcp/kmcp/cmd.NewUnikIndexDB.func3
    /home/shenwei/shenwei/scripts/go/src/github.com/shenwei356/kmcp/kmcp/cmd/util-db-search.go:1020 +0x2d7

shenwei356 commented 2 years ago

First, please check the integrity of the downloaded database file with md5sum. A corrupted download is the most likely cause.

md5sum -c gtdb.kmcp.tar.gz.md5.txt

Then, regarding the search command you pasted:

kmcp search --load-whole-db --threads 32 \
    --db-dir /home2/xzhan9/data/reference/kmcp/gtdb.kmcp \
    -1 data/SRR12397805_1.fastq.gz \
    -2 data/SRR12397805_2.fastq.gz \
    --out-file kmcp/SRR12397805.out \
    --log kmcp/SRR12397805.log

  1. Single-end mode is recommended for paired-end reads, as it gives higher sensitivity.
  2. I'd recommend adding a .gz extension to the output file name, which saves a lot of space:

    --out-file kmcp/SRR12397805.out.gz

kmcp search -h:

-o, --out-file string            ► Out file, supports and recommends a ".gz" suffix ("-" for
                                 stdout). (default "-")

shenwei356 commented 2 years ago

I see, it is indeed a bug that occurs when searching using paired-end reads with one read shorter than the value of -m/--min-query-len (30 by default).
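
To illustrate this class of bug, here is a minimal Go sketch. It is not kmcp's actual code: the `Read` type and the functions `filterByLen` and `pairedLen` are hypothetical names invented for this example. It assumes that each mate is length-filtered independently, so one mate of a pair can end up as nil (the input here has a minimum read length of 15, below the default of 30), and it shows the nil check that keeps the later access from panicking.

```go
package main

import (
	"fmt"
	"strings"
)

// Read is a hypothetical, simplified sequencing read (not kmcp's type).
type Read struct {
	ID  string
	Seq string
}

// filterByLen returns nil when the read is missing or shorter than minLen,
// mimicking a length filter applied to each mate independently.
func filterByLen(r *Read, minLen int) *Read {
	if r == nil || len(r.Seq) < minLen {
		return nil
	}
	return r
}

// pairedLen returns the combined length of the mates that pass the filter.
// The nil check is the point of the sketch: reading r.Seq on a mate that
// was filtered out would panic with a nil pointer dereference.
func pairedLen(r1, r2 *Read, minLen int) int {
	total := 0
	for _, r := range []*Read{filterByLen(r1, minLen), filterByLen(r2, minLen)} {
		if r != nil {
			total += len(r.Seq)
		}
	}
	return total
}

func main() {
	long := &Read{ID: "pair/1", Seq: strings.Repeat("ACGT", 9)} // 36 bases
	short := &Read{ID: "pair/2", Seq: strings.Repeat("ACG", 5)} // 15 bases
	// Only the 36-base mate passes a minimum length of 30.
	fmt.Println(pairedLen(long, short, 30)) // prints 36
}
```

Whether a pair with one short mate should be searched with the remaining mate or skipped entirely is a design decision; the sketch keeps the surviving mate only to make the nil handling visible.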

zhanxw commented 2 years ago

Thank you. I will test and report back here.

zhanxw commented 2 years ago

kmcp search now works perfectly. I also took your advice on reducing the output file size. Thank you for developing kmcp.

shenwei356 commented 2 years ago

Let's keep this issue open; I will close it after the next stable version is released.