vpc-ccg / circminer

A sensitive and fast tool for circular RNA detection from RNA-Seq data
GNU General Public License v3.0

Seg Fault #4

Closed yhoogstrate closed 4 years ago

yhoogstrate commented 4 years ago

Dear authors of CircMiner,

I would like to thank you for writing this circRNA software package and for making it available to the scientific community under a FOSS license, and congratulations on the publication describing the package.

I have tried to run CircMiner but it crashes with a segmentation fault. I am running a small short-read example (100 reads) against GRCh38.p12.genome.fa and gencode.v29.annotation.gff3, both from GENCODE if I'm not mistaken.

I tried the same with a fresh download of 'GRCh38.p13.genome.fa' and 'gencode.v34.annotation.gtf' and got similar segfaults.

I got the following stack trace:

(gdb) run  -r ~/bio/hg38/fasta/GRCh38.p12.genome.fa -g ~/bio/hg38/gxf/gencode.v29.annotation.gff3 -1 r1.fq -2 r2.fq -o test.out
Starting program: /home/youri/projects/hmf-circrna/circminer/circminer -r ~/bio/hg38/fasta/GRCh38.p12.genome.fa -g ~/bio/hg38/gxf/gencode.v29.annotation.gff3 -1 r1.fq -2 r2.fq -o test.out
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Fri Jun 19 14:30:09 2020 [INFO] Number of threads: 1
Fri Jun 19 14:30:09 2020 [INFO] Input file type: Paired-end
Fri Jun 19 14:30:09 2020 [INFO] Kmer size obtained from index: 20
Fri Jun 19 14:30:09 2020 [INFO] Loading GTF file...
Fri Jun 19 14:30:23 2020 [INFO] Genome index type: Full
Fri Jun 19 14:30:23 2020 [INFO] Starting read extraction
Fri Jun 19 14:30:23 2020 [INFO] + Loading genome index...
Fri Jun 19 14:30:25 2020 [INFO] + Completed! (CPU time: 2.17s; Real time: 2.17s)
Fri Jun 19 14:30:25 2020 [INFO] + Loading genome sequence...
Fri Jun 19 14:30:26 2020 [INFO] + Completed! (CPU time: 0.92s; Real time: 0.92s)
Fri Jun 19 14:30:26 2020 [INFO] + Starting pseudo-alignment (Round 1)
[New Thread 0x7ffd32b38700 (LWP 10639)]
Fri Jun 19 14:30:26 2020 [INFO] + Completed round 1! (CPU time: 0.01s; Real time: 0.01s)
Fri Jun 19 14:30:26 2020 [INFO] + Loading genome index...
[Thread 0x7ffd32b38700 (LWP 10639) exited]
Fri Jun 19 14:30:28 2020 [INFO] + Completed! (CPU time: 1.87s; Real time: 1.87s)
Fri Jun 19 14:30:28 2020 [INFO] + Loading genome sequence...
Fri Jun 19 14:30:28 2020 [INFO] + Completed! (CPU time: 0.71s; Real time: 0.71s)
Fri Jun 19 14:30:28 2020 [INFO] + Starting pseudo-alignment (Round 2)
[New Thread 0x7ffd32b38700 (LWP 10640)]
Fri Jun 19 14:30:28 2020 [INFO] + Completed round 2! (CPU time: 0.01s; Real time: 0.01s)
Fri Jun 19 14:30:28 2020 [INFO] + Loading genome index...
[Thread 0x7ffd32b38700 (LWP 10640) exited]
Fri Jun 19 14:30:30 2020 [INFO] + Completed! (CPU time: 1.86s; Real time: 1.86s)
Fri Jun 19 14:30:30 2020 [INFO] + Loading genome sequence...
Fri Jun 19 14:30:31 2020 [INFO] + Completed! (CPU time: 0.78s; Real time: 0.78s)
Fri Jun 19 14:30:31 2020 [INFO] + Starting pseudo-alignment (Round 3)
[New Thread 0x7ffd32b38700 (LWP 10641)]
Fri Jun 19 14:30:31 2020 [INFO] + Completed round 3! (CPU time: 0.01s; Real time: 0.01s)
Fri Jun 19 14:30:31 2020 [INFO] + Loading genome index...
[Thread 0x7ffd32b38700 (LWP 10641) exited]
Fri Jun 19 14:30:31 2020 [INFO] + Completed! (CPU time: 0.33s; Real time: 0.33s)
Fri Jun 19 14:30:31 2020 [INFO] + Loading genome sequence...
Fri Jun 19 14:30:31 2020 [INFO] + Completed! (CPU time: 0.05s; Real time: 0.05s)
Fri Jun 19 14:30:31 2020 [INFO] + Starting pseudo-alignment (Round 4)
[New Thread 0x7ffd32b38700 (LWP 10642)]

Thread 5 "circminer" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffd32b38700 (LWP 10642)]
0x000055555556b1f6 in GTFParser::get_location_overlap(unsigned int, bool) ()
(gdb) backtrace
#0  0x000055555556b1f6 in GTFParser::get_location_overlap(unsigned int, bool) ()
#1  0x00005555555638c0 in FilterRead::process_mates(int, chain_list const&, Record const*, chain_list const&, Record const*, MatchedRead&, bool) ()
#2  0x0000555555564ba5 in FilterRead::process_read(int, Record*, Record*, int, GIMatchedKmer*, GIMatchedKmer*, chain_list&, chain_list&, chain_list&, chain_list&) ()
#3  0x000055555555afca in map_reads(void*) ()
#4  0x00007ffff7d6dfa3 in start_thread (arg=<optimized out>) at pthread_create.c:486
#5  0x00007ffff797d4cf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

And with debug enabled:

[Switching to Thread 0x7ffd32b39700 (LWP 13873)]
FlatIntervalTree<UniqSeg>::find (this=0x555560d3ccc8, pos=86051865) at src/interval_tree_impl.h:144
144         if (pos < disjoint_intervals[0].spos)
(gdb) backtrace
#0  FlatIntervalTree<UniqSeg>::find (this=0x555560d3ccc8, pos=86051865) at src/interval_tree_impl.h:144
#1  0x00005555555735cd in GTFParser::get_location_overlap (this=0x5555555e1940 <gtf_parser>,
    loc=86051865, use_mask=false) at src/gene_annotation.cpp:524
#2  0x00005555555664bb in FilterRead::pair_chains (this=0x5555555e1ac0 <filter_read>,
    forward_chain=..., reverse_chain=..., mate_pairs=..., forward_paired=0x7ffd32b38a80,
    reverse_paired=0x7ffd32b38a60, saved_type=7) at src/filter.cpp:494
#3  0x0000555555564f69 in FilterRead::process_mates (this=0x5555555e1ac0 <filter_read>, thid=0,
    forward_chain=..., forward_rec=0x5555556198d8, backward_chain=..., backward_rec=0x5555559cade8,
    mr=..., r1_forward=true) at src/filter.cpp:251
#4  0x0000555555564c8f in FilterRead::process_read (this=0x5555555e1ac0 <filter_read>, thid=0,
    current_record1=0x5555556198d8, current_record2=0x5555559cade8, kmer_size=20, fl=0x555555d7ec80,
    bl=0x555555d7ee60, forward_best_chain_r1=..., backward_best_chain_r1=...,
    forward_best_chain_r2=..., backward_best_chain_r2=...) at src/filter.cpp:208
#5  0x000055555555a099 in map_reads (args=0x555555d7c7f0) at src/circminer.cpp:363
#6  0x00007ffff7d6dfa3 in start_thread (arg=<optimized out>) at pthread_create.c:486
#7  0x00007ffff797d4cf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
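
For readers following the trace: frame #0 stops at `disjoint_intervals[0].spos`, and one plausible reading (an assumption, not confirmed anywhere in this thread) is that the interval container was empty at that point, so element 0 did not exist. The sketch below is not CircMiner's code. `Seg` and `find_interval` are hypothetical names, and it only illustrates how a lookup over a sorted, non-overlapping interval list can check for the empty case before indexing. It needs C++17 for `std::optional`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical stand-in for the segment type seen in the backtrace;
// only the start and end coordinates are modeled here.
struct Seg {
    std::uint32_t spos;  // inclusive start
    std::uint32_t epos;  // inclusive end
};

// Returns the index of the interval that contains pos, or std::nullopt.
// Precondition: intervals are sorted by spos and do not overlap.
std::optional<std::size_t> find_interval(const std::vector<Seg>& intervals,
                                         std::uint32_t pos) {
    // Debug-only check of the sortedness precondition.
    assert(std::is_sorted(intervals.begin(), intervals.end(),
                          [](const Seg& a, const Seg& b) { return a.spos < b.spos; }));

    // Guard first: reading element 0 of an empty vector is undefined behaviour.
    if (intervals.empty() || pos < intervals.front().spos) {
        return std::nullopt;
    }

    // Find the first interval that starts strictly after pos ...
    auto it = std::upper_bound(
        intervals.begin(), intervals.end(), pos,
        [](std::uint32_t p, const Seg& s) { return p < s.spos; });
    // ... so the only candidate is the interval just before it.
    // This is safe because pos >= intervals.front().spos, hence it != begin().
    --it;

    if (pos <= it->epos) {
        return static_cast<std::size_t>(it - intervals.begin());
    }
    return std::nullopt;  // pos falls in a gap between intervals
}
```

With a guard like this, a chromosome for which no annotation was loaded would yield "no overlap" instead of a crash. Whether that is the right behaviour for the tool is a design decision for the authors.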

Do you have an idea how I/you can resolve this issue, or whether additional information is needed?

kind regards,

Youri Hoogstrate

yenyilin commented 4 years ago

Hi Youri,

Thanks for your interest in circminer. This is due to differences in attribute formatting between GENCODE and Ensembl. We are working on this part and will update it as soon as possible.
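
To illustrate the kind of difference meant here, the sketch below parses the attribute column (column 9) of a GTF record and normalizes two well-known points where the two sources diverge: GENCODE writes the version into the identifier (`gene_id "ENSG00000223972.5"`) and uses `gene_type`, whereas Ensembl keeps a bare `gene_id` plus a separate `gene_version` and uses `gene_biotype`. This is an assumption-laden illustration, not the fix that went into circminer. All function names are hypothetical, and which attribute actually triggered the crash is not stated in this thread.

```cpp
#include <cassert>
#include <initializer_list>
#include <map>
#include <sstream>
#include <string>

using AttrMap = std::map<std::string, std::string>;

// Parses GTF column 9, e.g.  gene_id "X"; gene_type "Y"; level 2;
// Simplification: splits on ';', so a ';' inside a quoted value is not supported.
AttrMap parse_gtf_attributes(const std::string& column9) {
    AttrMap attrs;
    std::istringstream fields(column9);
    std::string field;
    while (std::getline(fields, field, ';')) {
        const auto key_start = field.find_first_not_of(" \t");
        if (key_start == std::string::npos) continue;  // blank piece
        const auto key_end = field.find_first_of(" \t", key_start);
        if (key_end == std::string::npos) continue;    // key without a value
        const auto val_start = field.find_first_not_of(" \t", key_end);
        if (val_start == std::string::npos) continue;
        const auto val_end = field.find_last_not_of(" \t");
        assert(val_end >= val_start);  // val_start itself is non-blank
        std::string value = field.substr(val_start, val_end - val_start + 1);
        // Strip surrounding quotes; unquoted values (e.g. level 2) are kept as-is.
        if (value.size() >= 2 && value.front() == '"' && value.back() == '"') {
            value = value.substr(1, value.size() - 2);
        }
        // emplace keeps the first occurrence of a repeated key such as "tag".
        attrs.emplace(field.substr(key_start, key_end - key_start), value);
    }
    return attrs;
}

// Drops everything from the first '.' onward, so GENCODE-style and
// Ensembl-style identifiers compare equal.
std::string gene_id_without_version(const std::string& id) {
    const auto dot = id.find('.');
    return dot == std::string::npos ? id : id.substr(0, dot);
}

// Looks up the biotype under either naming convention; empty if absent.
std::string gene_biotype(const AttrMap& attrs) {
    for (const char* key : {"gene_type", "gene_biotype"}) {
        const auto it = attrs.find(key);
        if (it != attrs.end()) return it->second;
    }
    return "";
}
```

Note that GFF3, which the original report also used, encodes attributes as `key=value` pairs separated by `;` and would need a different parser.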

fhach commented 4 years ago

@yhoogstrate The most recent commit should have resolved the issue. Please give it a try and close the issue if it works. We will release a new version on bioconda at a later date.

yhoogstrate commented 4 years ago

Great, @yenyilin @fhach,

I can confirm this patch works.

Any plans on adding CIGAR strings to SAM output so I can load the alignments in a genome browser?