refresh-bio / Whisper

GNU General Public License v3.0

Segmentation fault when aligning 25X E. coli single-end reads #16

Open i-xiaohu opened 4 years ago

i-xiaohu commented 4 years ago

Hi, Whisper developers. I ran the command whisper ref/ref data1.fastq, and Whisper (release 2.0.1) produced the following output:

***** Preprocessing of reads *****
100.0%
Completing the preprocessing (could take a minute or so)
Preprocessing time: 2.39412s
** Loading reference and index **
***** Reads mapping *****
** End of mapping **
Main processing time: 43.4175s
***** Postprocessing *****
** Loading reference **
Segmentation fault (core dumped)

The reference is a standard E. coli reference sequence, and data1.fastq is 593M in size. The first read is shown below.

@SRR1562082.1 HWI-ST1336:80:C3CJUACXX:1:1101:2018:2193/1
ATCGCATCCGGGCAGTAGTATTTTGCTTTTTTCAGAAAATAATCAAAAAAAGTTAGCGTGGTGAATCGATACTTTACCGGTTGAATTTGCATCAATTTCAT
+
@B@FFFFDFHHHHJJGFHHFHGGJHIJIJJJJIJJJJJGIIIJJJJJJJJJFEEHHFFFDDAB@CC@BBBABCDECDCBBBBBDCADDDDEEDDDDECCEE
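The read above is a standard four-line FASTQ record: a header, the sequence, a '+' separator and a quality string. When a mapper crashes on a particular input, a quick sanity check of the file costs little. Below is a minimal sketch that assumes plain (uncompressed) four-line records and Phred+33 qualities. The quality string shown uses characters up to 'J' (Q41), which appears consistent with that encoding. FastqRecord, read_fastq and parse_fastq_text are hypothetical helpers and are not part of Whisper.

```python
import io
from dataclasses import dataclass
from typing import Iterator, List, TextIO


@dataclass
class FastqRecord:
    name: str
    seq: str
    qual: str

    def phred_scores(self, offset: int = 33) -> List[int]:
        """Decode the quality string, assuming Phred+33 encoding."""
        return [ord(c) - offset for c in self.qual]


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield four-line FASTQ records; raise ValueError on malformed input."""
    while True:
        header = handle.readline()
        if not header:
            return  # clean end of file
        seq = handle.readline().rstrip("\r\n")
        plus = handle.readline()
        qual = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record near {header.strip()!r}")
        if len(seq) != len(qual):
            raise ValueError(f"{header.strip()}: sequence/quality length mismatch")
        yield FastqRecord(header[1:].rstrip("\r\n"), seq, qual)


def parse_fastq_text(text: str) -> List[FastqRecord]:
    """Convenience wrapper for parsing FASTQ held in a string."""
    return list(read_fastq(io.StringIO(text)))
```

For a real file, iterate over read_fastq(open(path)). For a .fastq.gz input, gzip.open(path, "rt") can be passed instead. Multi-line FASTQ variants are deliberately rejected, not handled.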

In the end, Whisper produces an empty SAM file.
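An "empty" SAM file can mean a file with zero bytes or a file that has header lines but no alignment lines. In SAM, header lines start with '@', and each alignment line has at least 11 tab-separated fields. Bit 0x4 of the FLAG (second) field marks an unmapped read. The sketch below is one minimal way to tell these cases apart. summarize_sam is a hypothetical helper, and it assumes uncompressed SAM text.

```python
from typing import Dict, Iterable


def summarize_sam(lines: Iterable[str]) -> Dict[str, int]:
    """Count header lines, alignment records and unmapped records (FLAG bit 0x4)."""
    counts = {"header": 0, "records": 0, "unmapped": 0}
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        if line.startswith("@"):
            counts["header"] += 1
            continue
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 11:
            raise ValueError(f"alignment line has {len(fields)} fields, expected >= 11")
        counts["records"] += 1
        if int(fields[1]) & 0x4:  # FLAG: segment unmapped
            counts["unmapped"] += 1
    return counts
```

Usage: with open(path) as f: print(summarize_sam(f)), where path is whatever SAM file Whisper wrote. A result with "records": 0 would match the symptom reported here, whatever the header count.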

Thanks! i-xiaohu

agudys commented 4 years ago

Hello,

I'll take a look at that ASAP.

Regards, Adam

quito418 commented 3 years ago

Hello,

I am also experiencing the same issue.

I followed the Quick Start guide and hit a segmentation fault.

My commands:

src/whisper-index human ~/human_ref/human_g1k_v37.fasta ./index ./temp/
src/whisper -r -out mappings ./index/human ~/ERR3239276.fq

Error log:

***** Preprocessing of reads *****
100.0%
Completing the preprocessing (could take a minute or so)
Preprocessing time: 2.44478s
** Loading reference and index **
***** Reads mapping *****
** End of mapping **
Main processing time: 201.566s
***** Postprocessing *****
** Loading reference **
Segmentation fault (core dumped)

When I ran it under GDB, I got the following backtrace:

(gdb) bt
#0  0x0000000000486f85 in CSamGenerator::store_mapped_read(unsigned char*, unsigned char*, unsigned char*, unsigned char*, unsigned int, unsigned int, unsigned int, unsigned int, unsigned char*&) ()
#1  0x0000000000489190 in CSamGenerator::process_group_se() ()
#2  0x000000000048f828 in CSamGenerator::operator()() ()
#3  0x000000000054da14 in execute_native_thread_routine ()
#4  0x000000000041fb19 in start_thread (arg=<optimized out>) at pthread_create.c:477
#5  0x0000000000615ab3 in clone ()
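A backtrace like the one above can also be captured non-interactively. One way is to run the program under gdb in batch mode and print the stacks of all threads once it stops. Below is a minimal sketch that assumes gdb is installed and on PATH; the helper names are hypothetical. The frames above show only symbol names, with no source file or line number, which suggests the binary was built without debug info. If the build setup allows it, rebuilding with -g added to the compiler flags should make the frames more informative.

```python
import shlex
import subprocess
from typing import List, Sequence


def gdb_backtrace_cmd(program_argv: Sequence[str]) -> List[str]:
    """Build a gdb invocation that runs the program, then prints all thread stacks."""
    return ["gdb", "-batch", "-ex", "run", "-ex", "thread apply all bt", "--args", *program_argv]


def capture_backtrace(program_argv: Sequence[str]) -> str:
    """Run the program under gdb and return gdb's combined stdout and stderr."""
    result = subprocess.run(gdb_backtrace_cmd(program_argv), capture_output=True, text=True)
    return result.stdout + result.stderr


if __name__ == "__main__":
    # Hypothetical paths; substitute the real index and reads file.
    argv = ["src/whisper", "-out", "mappings", "./index/human", "reads.fq"]
    print(shlex.join(gdb_backtrace_cmd(argv)))
```

If the program exits normally instead of crashing, gdb simply reports that there is no stack to print.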

Thank you!

agudys commented 2 years ago

Hello,

Sorry it took me so long. I was able to reproduce the error. I'll let you know once it's fixed (this time, I promise to do this sooner ;)).

Adam

agudys commented 2 years ago

@quito418 @i-xiaohu I have just committed a fix for the bug you reported. Please let me know whether single-end mode now works properly.

Btw, you don't need to specify the -r option at all for single-end mapping.

quito418 commented 2 years ago

Thank you for your time.

I will let you know if I have a problem.

Best Regards,

quito418 commented 2 years ago

@agudys Hi,

Thank you. I checked, and after the fix it runs without a segfault.

I just want to make sure everything is working fine.

In particular, I am currently running Whisper with 48 threads on the human genome, using 800M short reads of 101 bp.

./src/whisper -rs -out mappings -t 48 -temp ./temp/ ./index/human /ssd/ERR194147_1.fastq.gz

The postprocessing stage takes a very long time (it has been running for about 2 hours so far) compared to the two preceding steps (preprocessing: 735 s, read mapping: 844 s).

So I wonder whether this is expected, or whether there is a recommended number of threads.
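A little arithmetic puts the reported stage timings into proportion. The sketch below assumes that "800M" means 800 million reads. It treats 2 hours as a lower bound for postprocessing, because that stage was still running when the timings were posted.

```python
from typing import Dict

# Stage timings from the thread; postprocessing is a lower bound (still running at ~2 h).
TIMINGS_S: Dict[str, float] = {
    "preprocessing": 735.0,
    "mapping": 844.0,
    "postprocessing": 2 * 3600.0,
}
N_READS = 800e6  # assumption: "800M" means 800 million reads


def stage_shares(timings_s: Dict[str, float]) -> Dict[str, float]:
    """Return each stage's fraction of the total elapsed time."""
    total = sum(timings_s.values())
    return {name: t / total for name, t in timings_s.items()}


def throughput(n_reads: float, seconds: float) -> float:
    """Reads processed per second."""
    return n_reads / seconds


if __name__ == "__main__":
    for name, share in stage_shares(TIMINGS_S).items():
        print(f"{name:>15}: {TIMINGS_S[name]:7.0f} s ({share:.0%} of total so far)")
    print(f"mapping throughput: {throughput(N_READS, TIMINGS_S['mapping']):,.0f} reads/s")
```

With these numbers, postprocessing already takes at least about 4.6 times as long as preprocessing and mapping combined. That is roughly 82% of the elapsed time, while mapping itself runs at close to a million reads per second.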

I would appreciate any advice.

Best Regards,

agudys commented 2 years ago

@quito418 I must admit that the postprocessing time looks strange. In our experiments on 32 cores, approximately 3 hours were needed to perform full paired-end mapping of ~100 GB of gzipped human reads. Maybe there is still something wrong with the single-end mode... Is 48 the number of physical or logical cores on your machine? In the latter case, you could try reducing the number of threads to 24.
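Whether 48 threads oversubscribes the machine depends on the physical core count, not the logical one. On Linux, os.cpu_count() reports logical CPUs. One way to count physical cores with only the standard library is to count the distinct (physical id, core id) pairs in /proc/cpuinfo. Below is a minimal sketch built on that assumption; the helper names are hypothetical. Alternatives are the third-party psutil.cpu_count(logical=False) or the "Core(s) per socket" and "Socket(s)" lines printed by lscpu.

```python
import os
from typing import Optional, Set, Tuple


def physical_cores_from_cpuinfo(text: str) -> Optional[int]:
    """Count distinct (physical id, core id) pairs in Linux /proc/cpuinfo text.

    Returns None when those fields are absent (e.g. on some VMs or non-x86 kernels).
    """
    cores: Set[Tuple[str, str]] = set()
    phys: Optional[str] = None
    core: Optional[str] = None
    for line in text.splitlines() + [""]:  # trailing "" flushes the last processor block
        if not line.strip():
            if phys is not None and core is not None:
                cores.add((phys, core))
            phys = core = None
            continue
        key, _, value = line.partition(":")
        key = key.strip()
        if key == "physical id":
            phys = value.strip()
        elif key == "core id":
            core = value.strip()
    return len(cores) or None


def suggested_threads() -> int:
    """Prefer the physical core count (Linux only); fall back to os.cpu_count()."""
    try:
        with open("/proc/cpuinfo") as f:
            n = physical_cores_from_cpuinfo(f.read())
    except OSError:
        n = None
    return n or os.cpu_count() or 1
```

On a machine with 24 physical and 48 logical cores, this would suggest 24 threads (e.g. -t 24), which matches the suggestion above. Whether that actually shortens postprocessing is still to be tested.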

Adam

quito418 commented 2 years ago

@agudys Thanks for the information.

The machine I used for the experiment has 24 physical cores and 48 logical cores.

I will reduce the number of threads for my experiment and update the result here!

Best Regards,