Open mjoppich opened 4 years ago
I'm looking into this now. Will let you know when I find the error.
@mjoppich I have tried running the same command you did: .graphmap2/bin/Linux-x64/graphmap2 align --rebuild-index -x rnaseq --threads 8 -r Saccharomyces_cerevisiae.R64-1-1.dna_sm.toplevel.gm2.fa -d SRR5989373_1.fastq -o SRR5989373_1.unmod.sam
and I didn't get a segmentation fault.
This is the .sam file it generated: https://www.dropbox.com/s/q6vsdbukzfanrue/SRR5989373_1.unmod.sam.gz You should be able to download it.
It could be something nondeterministic; I will try to see what could be wrong. If you have anything more that might be useful for me, let me know.
Thanks for your help!
I built graphmap2 with debug flags (make -j4 debug) and ran the whole thing in gdb:
/mnt/d/dev/git/graphmap2/bin/graphmap-debug align --rebuild-index -x rnaseq --threads 8 -r /home/mjoppich/dev/data/genomes/Saccharomyces_cerevisiae.R64-1-1.dna_sm.toplevel.gm2.fa --gtf /home/mjoppich/dev/data/genomes/Saccharomyces_cerevisiae.R64-1-1.94.gtf -d fastq/SRR5989373_1.fastq -o graphmap2/SRR5989373_1.unmod.sam
[07:58:20 ProcessReads] [CPU time: 15.72 sec, RSS: 968 MB] Read: 202/241446 (0.08%) [m: 100, u: 95], length = 1990, qname: SRR5989373.203 88f29c93-2f23-49b6-8ae...
Thread 3 "graphmap-debug" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffffcad0700 (LWP 8620)]
0x000000000809b520 in std::operator<< <std::char_traits<char> > (__c=<optimized out>, __out=...) at /usr/include/c++/7/ostream:509
509 { return __ostream_insert(__out, &__c, 1); }
(gdb) where
#0 0x000000000809b520 in std::operator<< <std::char_traits<char> > (__c=<optimized out>, __out=...) at /usr/include/c++/7/ostream:509
#1 std::operator<< <std::char_traits<char> > (__c=<optimized out>, __out=...) at /usr/include/c++/7/ostream:515
#2 AlignmentToMD[abi:cxx11](std::vector<unsigned char, std::allocator<unsigned char> >&, signed char const*, long) (alignment=...,
ref_data=0x7ffffd2f0010 "ATGCTACGTATATACCACTCTCAACTTACCCTACTCTCACATTCCACTCCATGGCCCAGTCTCACTAAATCAGTACGATGCACTCACATCATTATTCACGGCACTTGCCTCAGCGGTTTATACCCTGTGCAATTTACCCATAAAACCCACGATTATCCACATTTTAATATCTATATCTCATTCAGCGGCTCCAAATATTG"...,
alignment_position_start=-119) at src/alignment/cigargen.cc:674
#3 0x00000000081125a7 in HackIntermediateMapping (mapping_data=mapping_data@entry=0x7fffe40b4d20, index=..., read=read@entry=0x9025bb0, parameters=parameters@entry=0x7ffffffed5e0, abs_ref_id=abs_ref_id@entry=1529, aln_result=...,
score=0x7ffffcacf1c8) at src/graphmap/process_read.cc:802
#4 0x0000000008119293 in GraphMap::RNAGenerateAlignments_ (this=this@entry=0x7ffffffed430, order_number=order_number@entry=187, mapping_data=mapping_data@entry=0x7fffe40b4d20, index=..., transcriptome=..., read=read@entry=0x9025bb0,
parameters=0x7ffffffed5e0, evalue_params=0x8a95450, realignment_structures=0x7ffffffecec0) at src/graphmap/process_read.cc:1047
#5 0x000000000811c185 in GraphMap::ProcessRead (this=this@entry=0x7ffffffed430, order_number=order_number@entry=187, mapping_data=mapping_data@entry=0x7fffe40b4d20, read=0x9025bb0, parameters=parameters@entry=0x7ffffffed5e0,
evalue_params=evalue_params@entry=0x8a95450, realignment_structures=<optimized out>) at src/graphmap/process_read.cc:252
#6 0x0000000008136f07 in GraphMap::ProcessSequenceFileInParallel (this=<optimized out>, parameters=<optimized out>, reads=<optimized out>, last_time=<optimized out>, fp_out=<optimized out>, ret_num_mapped=<optimized out>,
ret_num_unmapped=0x0) at src/graphmap/graphmap.cc:1333
#7 0x00007fffff1e695e in ?? () from /usr/lib/x86_64-linux-gnu/libgomp.so.1
#8 0x00007ffffe3e76db in start_thread (arg=0x7ffffcad0700) at pthread_create.c:463
#9 0x00007ffffe93188f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
(gdb) frame 2
#2 AlignmentToMD[abi:cxx11](std::vector<unsigned char, std::allocator<unsigned char> >&, signed char const*, long) (alignment=...,
ref_data=0x7ffffd2f0010 "ATGCTACGTATATACCACTCTCAACTTACCCTACTCTCACATTCCACTCCATGGCCCAGTCTCACTAAATCAGTACGATGCACTCACATCATTATTCACGGCACTTGCCTCAGCGGTTTATACCCTGTGCAATTTACCCATAAAACCCACGATTATCCACATTTTAATATCTATATCTCATTCAGCGGCTCCAAATATTG"...,
alignment_position_start=-119) at src/alignment/cigargen.cc:674
674 md << ref_data[ref_position + j];
(gdb) list
669 }
670 }
671 } else if (cigar_array[i].op == 'D' || cigar_array[i].op == 'N') {
672 md << '^';
673 for (int32_t j=0; j<cigar_array[i].count; j++) {
674 md << ref_data[ref_position + j];
675 }
676 }
677
678 if ((i + 1) < cigar_array.size() && cigar_array[i].op != '=' && cigar_array[i+1].op != '=') {
gcc --version
gcc (Ubuntu 7.4.0-1ubuntu1~18.04) 7.4.0
Copyright (C) 2017 Free Software Foundation, Inc.
Does that help you?
Other question: which branch are you on? I am using the master branch (or whatever else is the default branch).
I am using the master branch, the default one. Thank you for the help. I can see that, when generating the MD string during alignment generation, the program tries to access a negative index in the array (alignment_position_start is negative in both of your traces).
Someone already reported a similar issue, but in a different place, and it happened because their GTF file was not correct.
I also ran the program in gdb and I don't get a segmentation fault, but some undefined behaviour is obviously happening.
I will look into this MD string generation and make sure that it doesn't break. But can you show me the stack trace you get in gdb when running without the GTF file? Also, are you sure that the GTF file is correct? I tried to evaluate the produced .sam file with https://github.com/lbcb-sci/RNAseqEval using this command: "python ../../RNAseqEval-newesrt0/RNAseqEval/RNAseqEval.py eval-mapping ../../yeastBug/Saccharomyces_cerevisiae.R64-1-1.dna_sm.toplevel.fa ../../yeastBug/SRR5989373_1.unmod.sam -a ../../yeastBug/Saccharomyces_cerevisiae.R64-1-1.94.gtf --no_check_strand"
and I get the error: ERROR: Duplicate chromosome name: chrchromosome
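To illustrate the out-of-bounds read discussed above (a negative alignment_position_start reaching the loop at cigargen.cc:674), here is a minimal sketch of a bounds-checked version of that deletion loop. It is not graphmap2's code and not the eventual fix: CigarOp and AppendDeletedBases are hypothetical names, the reference is a std::string rather than a raw pointer, and only the D/N branch from the gdb listing is mirrored.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for one CIGAR operation (not graphmap2's own type).
struct CigarOp {
  char op;        // '=', 'X', 'M', 'I', 'S', 'D', 'N', ...
  int32_t count;  // run length of the operation
};

// Collects the "^BASES" pieces for D/N operations, as in the listing above,
// but refuses to read outside the reference. Returns false (leaving *out
// possibly partially filled) if the start position or any run is out of range.
bool AppendDeletedBases(const std::vector<CigarOp>& cigar,
                        const std::string& ref, int64_t start,
                        std::string* out) {
  assert(out != nullptr);
  out->clear();
  const int64_t ref_len = static_cast<int64_t>(ref.size());
  // A negative start (e.g. -119 or -21396 in the traces) is rejected here.
  if (start < 0 || start > ref_len) return false;

  int64_t ref_pos = start;
  for (const CigarOp& c : cigar) {
    if (c.count < 0) return false;
    const bool consumes_ref = c.op == '=' || c.op == 'X' || c.op == 'M' ||
                              c.op == 'D' || c.op == 'N';
    if (!consumes_ref) continue;  // insertions and clips do not advance ref_pos
    // The whole run must fit into what is left of the reference.
    if (c.count > ref_len - ref_pos) return false;
    if (c.op == 'D' || c.op == 'N') {
      out->push_back('^');
      out->append(ref, static_cast<std::string::size_type>(ref_pos),
                  static_cast<std::string::size_type>(c.count));
    }
    ref_pos += c.count;
  }
  return true;
}
```

With a check like this, a read whose computed start position is negative would be reported as a failed alignment instead of crashing a worker thread; why the start position becomes negative in the first place would still need to be tracked down separately.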
The GTF file is downloaded from ensembl. I would guess they know how to make a GTF file ...
The stacktrace for running without the gtf file:
[17:55:30 ProcessReads] [CPU time: 2478.91 sec, RSS: 5167 MB] Read: 76839/241446 (31.82%) [m: 75709, u: 1123], length = 943, qname: SRR5989373.76840 36d1f278-b0...
Thread 5 "graphmap-debug" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffffa9d0700 (LWP 11235)]
0x000000000809b520 in std::operator<< <std::char_traits<char> > (__c=<optimized out>, __out=...) at /usr/include/c++/7/ostream:509
509 { return __ostream_insert(__out, &__c, 1); }
(gdb) where
#0 0x000000000809b520 in std::operator<< <std::char_traits<char> > (__c=<optimized out>, __out=...) at /usr/include/c++/7/ostream:509
#1 std::operator<< <std::char_traits<char> > (__c=<optimized out>, __out=...) at /usr/include/c++/7/ostream:515
#2 AlignmentToMD[abi:cxx11](std::vector<unsigned char, std::allocator<unsigned char> >&, signed char const*, long) (alignment=...,
ref_data=0x7ffffc210010 "CACCACACCCACACACCACACCCACACACACACCACACCCACACACCACACCCACACACCACACCCACTACTCTAACCCTATTCTAATCCAACCCTGATCAACCTGTCTCCAAACCTACCCTCACATTACCCTACCTCTCCACTCGTTACCCTGCCCCACTCAACCATACCACTCCCACCCACCATCCATCTCTCTACTG"...,
alignment_position_start=-21396) at src/alignment/cigargen.cc:674
#3 0x00000000081125a7 in HackIntermediateMapping (mapping_data=mapping_data@entry=0x7ffe3112b9a0, index=..., read=read@entry=0x13326be0, parameters=parameters@entry=0x7ffffffed640, abs_ref_id=abs_ref_id@entry=31, aln_result=...,
score=0x7ffffa9cf1c8) at src/graphmap/process_read.cc:802
#4 0x0000000008119293 in GraphMap::RNAGenerateAlignments_ (this=this@entry=0x7ffffffed490, order_number=order_number@entry=76802, mapping_data=mapping_data@entry=0x7ffe3112b9a0, index=..., transcriptome=...,
read=read@entry=0x13326be0, parameters=0x7ffffffed640, evalue_params=0x86a1600, realignment_structures=0x7ffffffecf20) at src/graphmap/process_read.cc:1047
#5 0x000000000811c185 in GraphMap::ProcessRead (this=this@entry=0x7ffffffed490, order_number=order_number@entry=76802, mapping_data=mapping_data@entry=0x7ffe3112b9a0, read=0x13326be0, parameters=parameters@entry=0x7ffffffed640,
evalue_params=evalue_params@entry=0x86a1600, realignment_structures=<optimized out>) at src/graphmap/process_read.cc:252
#6 0x0000000008136f07 in GraphMap::ProcessSequenceFileInParallel (this=<optimized out>, parameters=<optimized out>, reads=<optimized out>, last_time=<optimized out>, fp_out=<optimized out>, ret_num_mapped=<optimized out>,
ret_num_unmapped=0x0) at src/graphmap/graphmap.cc:1333
#7 0x00007fffff1e695e in ?? () from /usr/lib/x86_64-linux-gnu/libgomp.so.1
#8 0x00007ffffe3e76db in start_thread (arg=0x7ffffa9d0700) at pthread_create.c:463
#9 0x00007ffffe93188f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
Not having a GTF file and still getting the error in this very same location sounds like a more general problem instead of a GTF problem ;)
The outfile remains empty btw ;)
The generation of the transcriptome from the GTF file is not working. I have to fix that. But if you run the program without --gtf and with --rebuild-index (to overwrite the index generated with the GTF file) it should work; it does for me.
Hi,
as mentioned in my last post:
The stacktrace for running without the gtf file:
And indeed, I always run with --rebuild-index.
The following call results in the above segfault (the one from 17:55:30):
/mnt/d/dev/git/graphmap2/bin/graphmap-debug align --rebuild-index -x rnaseq --threads 8 -r /home/mjoppich/dev/data/genomes/Saccharomyces_cerevisiae.R64-1-1.dna_sm.toplevel.gm2.fa -d fastq/SRR5989373_1.fastq -o graphmap2/SRR5989373_1.unmod.sam
Actually, my feeling is that the problem lies in the -x rnaseq setting - do you also run with this setting?
Hi,
yes I do, I run with -x rnaseq.
I understand everything you are saying, but I don't know why you are getting a segmentation fault when running without the GTF file and with --rebuild-index. As I said, I don't get the segmentation fault in that case, in either debug or release builds.
I used the reference and the dataset you provided in your first post, and the program finishes with the results I posted above.
PS: It is true that you don't have to (or even shouldn't) run the program with -x rnaseq if you use a GTF file, because the generated index will not contain introns and then there is no need to use -x rnaseq.
Hi, I am running graphmap2 on the following data:
Reads: ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR598/003/SRR5989373/SRR5989373_1.fastq.gz
Reference GTF: ftp://ftp.ensembl.org/pub/release-94/gtf/saccharomyces_cerevisiae/Saccharomyces_cerevisiae.R64-1-1.94.gtf.gz
Reference: ftp://ftp.ensembl.org/pub/release-94/fasta/saccharomyces_cerevisiae/dna/Saccharomyces_cerevisiae.R64-1-1.dna_sm.toplevel.fa.gz
Using v0.6.3 (git pull and built today) I still get Segmentation Faults:
I can also call graphmap2 in a different fashion:
/mnt/d/dev/git/graphmap2/bin/Linux-x64/graphmap2 align --rebuild-index -x rnaseq --gtf /home/mjoppich/dev/data/genomes/Saccharomyces_cerevisiae.R64-1-1.94.gtf --threads 8 -r /home/mjoppich/dev/data/genomes/Saccharomyces_cerevisiae.R64-1-1.dna_sm.toplevel.gm2.fa -d fastq/SRR5989373_1.fastq -o graphmap2/SRR5989373_1.unmod.sam
Then the segmentation fault occurs earlier:
I'd be more than happy if you could look into this issue. Given the few introns, the yeast genome might be useful for debugging :)