wtsi-hpag / Scaff10X

Pipeline for scaffolding and breaking a genome assembly using 10x genomics linked-reads
MIT License

segmentation fault scaff_bwa.c #25

Open GTG1988A opened 2 years ago


Dear developers,

We are running Scaff10X V.5 with the following command:

scaff10x -nodes 30 -longread 1 -gap 100 -matrix 2000 -reads 10 -link 8 -score 20 -edge 50000  -block 50000 -data input.dat contigs-break.fasta output_scaffolds.fasta

My input.dat is:

q1=/mypath/to/fastq/NA24143_barcoded.part_001.fastq.gz
q2=/mypath/to/fastq/NA24143_barcoded.part_002.fastq.gz

and we obtain the following error message:

[M::mem_pestat] mean and std.dev: (272.64, 66.53)
[M::mem_pestat] low and high boundaries for proper pairs: (7, 539)
[M::mem_pestat] skip orientation RF as there are not enough pairs
[M::mem_pestat] skip orientation RR as there are not enough pairs
[M::mem_process_seqs] Processed 28 reads in 0.045 CPU sec, 0.018 real sec
[main] Version: 0.7.17-r1198-dirty
[main] CMD: /mypath/Scaff10X/src/scaff-bin/bwa mem -p -t 30 tarseq.fastq -
[main] Real time: 2.431 sec; CPU: 2.265 sec
sh : line 1 : 11840 segmentation fault /mypath/Scaff10X/src/scaff-bin/scaff_bwa -edge 50000 tarseq.tag align.dat align2.dat > try.out
Error running command: /mypath/scaff_bwa -edge 50000 tarseq.tag align.dat align2.dat > try.out
Input target assembly file2: /mypath/contigs-break.fasta
www: /mypath/input.dat input.dat
Input data file: /mypath/input.dat
/mypath/Scaff10X/src/scaff-bin/scaff_FilePreProcess -t 2 -n 1 /mypath/input.dat - |/mypath/Scaff10X/src/scaff-bin/bwa mem -p -t 30 tarseq.fastq -  | egrep tarseq_ | awk '($2<100)&&($5>=0){print $1,$2,$3,$4,$5}' | egrep -v '^@' > align.dat
/mypath/scaff10x/Scaff10X/src/scaff-bin/scaff_bwa -edge 50000 tarseq.tag align.dat align2.dat > try.out

I launched it several times, and we never got an out-of-memory error. I also tried giving the FASTQ files directly instead of input.dat. The run shown here uses the output of break10x, but I tried another assembly as well.

To locate the error, we modified src/makefile and added the following flags to CFLAGS:

-ggdb -fsanitize=address  -fno-omit-frame-pointer -static-libstdc++ -static-libgcc -static-libasan -lrt

like this:

# Makefile for scaff10x
CC= gcc
CFLAGS= -O2 -std=c11 -march=x86-64 -mtune=generic -ggdb -fsanitize=address -fno-omit-frame-pointer -static-libstdc++ -static-libgcc -static-libasan -lrt
LFLAGS= -lm -pthread -lz

We then ran the command that causes the problem in the tmp_rununik directory:

Scaff10X/src/scaff-bin/scaff_bwa -edge 50000 tarseq.tag align.dat align2.dat

and we obtain:

AddressSanitizer:DEADLYSIGNAL
==41390==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000001 (pc 0x7f8c98704721 bp 0x7ffd69f98740 sp 0x7ffd69f97ed8 T0)
==41390==The signal is caused by a READ memory access.
==41390==Hint: address points to the zero page.
#0 0x7f8c98704721 in __strlen_sse2_pminub (/lib64/libc.so.6+0x16f721)
#1 0x433f33 in __interceptor_strcpy ../../.././libsanitizer/asan/asan_interceptors.cpp:437
#2 0x406e52 in main /mypath/src/scaff_bwa.c:220
#3 0x7f8c985b7554 in __libc_start_main (/lib64/libc.so.6+0x22554)
#4 0x407006  (/mypath/Scaff10X/src/scaff-bin/scaff_bwa+0x407006)
AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV (/lib64/libc.so.6+0x16f721) in __strlen_sse2_pminub

Do you have any idea how to resolve this problem? It looks as if your code (scaff_bwa.c, line 220) expects an underscore (_) in a name read from align.dat, but finds none.

Thank you!