hasindu2008 / f5c

Ultra-fast methylation calling and event alignment tool for nanopore sequencing data (supports CUDA acceleration)
https://hasindu2008.github.io/f5c/docs/overview
MIT License
144 stars, 26 forks

Memory problem of f5c v1.4 #163

Closed. JeremyQuo closed this issue 5 months ago

JeremyQuo commented 5 months ago

When I run the f5c command below,

f5c eventalign -r wt2.fastq -g GCF_000005845.2_ASM584v2_genomic.fna --slow5 wt1.blow5 --pore rna004 -b wt2.bam -c --min-mapq 0 -t 64 --rna -o wt2.paf

my server returns the following error:

*** Error in `f5c': free(): invalid size: 0x00007f0960110ee0 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x7340f)[0x7f0a0470340f]
/lib64/libc.so.6(+0x78c7e)[0x7f0a04708c7e]
/lib64/libc.so.6(+0x79957)[0x7f0a04709957]
f5c(_Z5alignP11AlignedPairPci11event_tableP7model_tj10scalings_tf+0xb5e)[0x56048b9db17e]
f5c(_Z12align_singleP6core_tP4db_ti+0x14f)[0x56048b9c370f]
f5c(_Z14process_singleP6core_tP4db_ti+0x22)[0x56048b9c5d12]
f5c(_Z14pthread_singlePv+0x29)[0x56048b9c3539]
/lib64/libpthread.so.0(+0x81a3)[0x7f0a04f291a3]
/lib64/libc.so.6(clone+0x6d)[0x7f0a04776fad]
======= Memory map: ========
56048b9a9000-56048b9bf000 r--p 00000000 fd:02 75237256711                /home/lilab/Apps/anaconda3/envs/nanoCEM_env/bin/f5c
56048b9bf000-56048ba63000 r-xp 00016000 fd:02 75237256711                /home/lilab/Apps/anaconda3/envs/nanoCEM_env/bin/f5c
56048ba63000-56048ba8a000 r--p 000ba000 fd:02 75237256711                /home/lilab/Apps/anaconda3/envs/nanoCEM_env/bin/f5c
56048ba8a000-56048ba8c000 r--p 000e0000 fd:02 75237256711                /home/lilab/Apps/anaconda3/envs/nanoCEM_env/bin/f5c
56048ba8c000-56048cd9e000 rw-p 000e2000 fd:02 75237256711                /home/lilab/Apps/anaconda3/envs/nanoCEM_env/bin/f5c
7f09ffc54000-7f0a00c55000 rw-p 00000000 00:00 0 
7f0a00c55000-7f0a01c56000 rw-p 00000000 00:00 0 
7f0a03059000-7f0a0345a000 rw-p 00000000 00:00 0 
7f0a0351d000-7f0a03845000 rw-p 00000000 00:00 0 
7f0a03845000-7f0a03859000 r-xp 00000000 fd:00 42175693                   /usr/lib64/libresolv-2.18.so
7f0a03859000-7f0a03a58000 ---p 00014000 fd:00 42175693                   /usr/lib64/libresolv-2.18.so
7f0a03a58000-7f0a03a59000 r--p 00013000 fd:00 42175693                   /usr/lib64/libresolv-2.18.so
7f0a03a59000-7f0a03a5a000 rw-p 00014000 fd:00 42175693                   /usr/lib64/libresolv-2.18.so
7f0a03a5a000-7f0a03a5c000 rw-p 00000000 00:00 0 
7f0a03a5c000-7f0a03a60000 r--p 00000000 fd:02 47396382235                /home/lilab/Apps/anaconda3/envs/nanoCEM_env/lib/libkrb5support.so.0.1
7f0a03a60000-7f0a03a66000 r-xp 00004000 fd:02 47396382235                /home/lilab/Apps/anaconda3/envs/nanoCEM_env/lib/libkrb5support.so.0.1
7f0a03a66000-7f0a03a68000 r--p 0000a000 fd:02 47396382235                /home/lilab/Apps/anaconda3/envs/nanoCEM_env/lib/libkrb5support.so.0.1
7f0a03a68000-7f0a03a69000 r--p 0000c000 fd:02 47396382235                /home/lilab/Apps/anaconda3/envs/nanoCEM_env/lib/libkrb5support.so.0.1
7f0a03a69000-7f0a03a6a000 rw-p 0000d000 fd:02 47396382235                /home/lilab/Apps/anaconda3/envs/nanoCEM_env/lib/libkrb5support.so.0.1
7f0a03a6a000-7f0a03a8e000 r--p 00000000 fd:02 64640556556                /home/lilab/Apps/anaconda3/envs/nanoCEM_env/lib/libkrb5.so.3.3
7f0a03a8e000-7f0a03ae8000 r-xp 00024000 fd:02 64640556556                /home/lilab/Apps/anaconda3/envs/nanoCEM_env/lib/libkrb5.so.3.3
7f0a03ae8000-7f0a03b30000 r--p 0007e000 fd:02 64640556556                /home/lilab/Apps/anaconda3/envs/nanoCEM_env/lib/libkrb5.so.3.3
7f0a03b30000-7f0a03b3e000 r--p 000c5000 fd:02 64640556556                /home/lilab/Apps/anaconda3/envs/nanoCEM_env/lib/libkrb5.so.3.3
7f0a03b3e000-7f0a03b40000 rw-p 000d3000 fd:02 64640556556                /home/lilab/Apps/anaconda3/envs/nanoCEM_env/lib/libkrb5.so.3.3
7f0a03b40000-7f0a03b4e000 r--p 00000000 fd:02 71293338118                /home/lilab/Apps/anaconda3/envs/nanoCEM_env/lib/libzstd.so.1.5.5
7f0a03b4e000-7f0a03c3d000 r-xp 0000e000 fd:02 71293338118                /home/lilab/Apps/anaconda3/envs/nanoCEM_env/lib/libzstd.so.1.5.5
7f0a03c3d000-7f0a03c52000 r--p 000fd000 fd:02 71293338118                /home/lilab/Apps/anaconda3/envs/nanoCEM_env/lib/libzstd.so.1.5.5
7f0a03c52000-7f0a03c53000 r--p 00112000 fd:02 71293338118                /home/lilab/Apps/anaconda3/envs/nanoCEM_env/lib/libzstd.so.1.5.5
7f0a03c53000-7f0a03c54000 rw-p 00113000 fd:02 71293338118                /home/lilab/Apps/anaconda3/envs/nanoCEM_env/lib/libzstd.so.1.5.5
7f0a03c54000-7f0a03c79000 r--p 00000000 fd:02 2215837222                 /home/lilab/Apps/anaconda3/envs/nanoCEM_env/lib/libssl.so.3
7f0a03c79000-7f0a03d05000 r-xp 00025000 fd:02 2215837222                 /home/lilab/Apps/anaconda3/envs/nanoCEM_env/lib/libssl.so.3
7f0a03d05000-7f0a03d34000 r--p 000b1000 fd:02 2215837222                 /home/lilab/Apps/anaconda3/envs/nanoCEM_env/lib/libssl.so.3
7f0a03d34000-7f0a03d3f000 r--p 000e0000 fd:02 2215837222                 /home/lilab/Apps/anaconda3/envs/nanoCEM_env/lib/libssl.so.3
7f0a03d3f000-7f0a03d43000 rw-p 000eb000 fd:02 2215837222                 /home/lilab/Apps/anaconda3/envs/nanoCEM_env/lib/libssl.so.3
7f0a03d43000-7f0a03d45000 r-xp 00000000 fd:00 47570281                   /usr/lib64/libdl-2.18.so
7f0a03d45000-7f0a03f45000 ---p 00002000 fd:00 47570281                   /usr/lib64/libdl-2.18.so

And here is the log from f5c:

[meth_main::623.763*32.53] 512 Entries (0.2M bases) loaded
[sig_handler::ERROR] I regret to inform that a segmentation fault occurred. But at least it is better than a wrong answer.

It seems to be a memory problem. I reinstalled f5c and switched to another server, but the problem persists. Is there a solution?

hasindu2008 commented 5 months ago

Hello

This seems to be a bug. Are you seeing this on multiple input files, or only on this particular set of files? Would you be able to share the input files (are they very large?) so that I can try to reproduce the bug on my side?

Thanks.

JeremyQuo commented 5 months ago

I am using a single input blow5 file. It is more than 120 GB, which is hard to upload, but I am trying to find the read that triggers the bug.

JeremyQuo commented 5 months ago

But what's interesting is that when I separately extract the read that I think has a problem (after the last line of PAF) and add it as a new FASTQ file and BAM file, it can be successfully processed by eventalign. I am still checking it
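One way to narrow down candidate reads, as a minimal sketch: compare the read IDs in the BAM with the read IDs that made it into the PAF before the crash. The helper name reads_missing_from_paf is hypothetical, and the sketch assumes the read ID is the first tab-separated column of the PAF. Reads can also be missing for legitimate reasons (for example, filtered or unalignable reads), so the result is only a list of suspects.

```python
from typing import Iterable, List


def reads_missing_from_paf(bam_read_ids: Iterable[str],
                           paf_lines: Iterable[str]) -> List[str]:
    """Return BAM read IDs (in BAM order, de-duplicated) absent from the PAF.

    Assumption: column 1 of each PAF line holds the read ID.
    """
    # Collect every read ID that was written to the PAF before the crash.
    done = set()
    for line in paf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        done.add(line.split("\t", 1)[0])

    # Keep BAM order so the first suspects are the ones nearest the crash.
    seen = set()
    missing = []
    for read_id in bam_read_ids:
        if read_id not in done and read_id not in seen:
            seen.add(read_id)
            missing.append(read_id)
    return missing
```

The BAM read IDs could come from the first column of samtools view output, and the PAF lines from opening wt2.paf as a text file.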

hasindu2008 commented 5 months ago

You can use syntax like samtools view file.bam chr:1-1000 to extract the region of the genome where the problem may have occurred. Then you can run f5c on that small BAM file and see whether the error can be reproduced. Once you locate the region in the BAM file that causes the error, you can use samtools faidx and slow5tools get to extract the necessary reads for this region.
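The extraction steps described above could be assembled roughly as follows. This is a sketch under assumptions, not a tested pipeline: the helper subset_commands and the subset.* output names are hypothetical, the region string is a placeholder, and samtools fqidx is used as the FASTQ counterpart of samtools faidx. The function only builds the command strings so they can be reviewed before anything is run.

```python
import shlex
from typing import List


def subset_commands(bam: str, region: str, fastq: str, blow5: str,
                    prefix: str = "subset") -> List[str]:
    """Build shell commands that extract one genomic region's reads.

    Nothing is executed here; the caller decides whether to run them.
    """
    q = shlex.quote  # protect paths that contain spaces or shell characters
    sub_bam = q(f"{prefix}.bam")
    ids = q(f"{prefix}.ids.txt")
    return [
        # Region queries need an indexed, coordinate-sorted BAM.
        f"samtools index {q(bam)}",
        # Keep only alignments overlapping the suspect region.
        f"samtools view -b -o {sub_bam} {q(bam)} {q(region)}",
        f"samtools index {sub_bam}",
        # Column 1 of the SAM records is the read ID.
        f"samtools view {sub_bam} | cut -f1 | sort -u > {ids}",
        # Pull the matching basecalled reads out of the FASTQ.
        f"samtools fqidx -r {ids} -o {q(prefix + '.fastq')} {q(fastq)}",
        # Pull the matching raw signal records out of the BLOW5.
        f"slow5tools get --list {ids} -o {q(prefix + '.blow5')} {q(blow5)}",
    ]
```

For example, subset_commands("wt2.bam", "NC_000913.3:1-1000", "wt2.fastq", "wt1.blow5") returns six commands that can be printed and run one by one; the contig name here is an assumption about the reference and should be checked against the BAM header.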

JeremyQuo commented 5 months ago

Thanks for your suggestion. I have now subsampled the problem reads (including the blow5, fastq, and BAM files), and the run fails when I use this command:

f5c eventalign -r subset.fastq -g GCF_000005845.2_ASM584v2_genomic.fna --slow5 subset.blow5 --pore rna004 -b subset.bam -c --min-mapq 0 -t 64 --rna -o subset.paf

But the subset (120 MB) is larger than GitHub's file size limit (25 MB).

Here is the dropbox link: https://www.dropbox.com/scl/fi/lmm52pgyjurma2h9115vv/subset.tar.gz?rlkey=lafiyapczhm4u30sunms4b20v&st=uuf0s6kx&dl=0

It should be viewable and downloadable.

hasindu2008 commented 5 months ago

Thank you very much. I could reproduce the issue. The culprit seems to be the following read, which looks very weird. Do you think it is a real read? 69107f00-0265-4b7c-9f57-35e8112eb17d.fastq.txt

Nevertheless, I have fixed the potential bug in the latest dev branch. Could you check whether it works for you now?

JeremyQuo commented 5 months ago

So, is it an issue with ultra-long reads? And will f5c work if I remove these reads?

hasindu2008 commented 5 months ago

No, it is not an issue with ultra-long reads. It is a very weird repetitive read that broke my assumption about the maximum memory needed for the banded alignment. This is the first such case I have seen out of billions of reads processed over the years. I have now updated the code to handle these cases (hopefully).

JeremyQuo commented 5 months ago

OK. I will try it, many thanks.

JeremyQuo commented 5 months ago

Thanks for your response. Now the dev branch works!

hasindu2008 commented 5 months ago

Thanks for finding this rare bug 👍