rust-bio / rust-htslib

This library provides HTSlib bindings and a high level Rust API for reading and writing BAM files.
MIT License
302 stars 77 forks source link

Segfault if read.aux() is called with a nonexistent tag #3

Closed benwbooth closed 9 years ago

benwbooth commented 9 years ago

I'm writing a tool to convert bam files to bedgraph/bigwig format using your library (See here), but I'm running into segfaults. The code I'm writing is checking each record for an XS tag to try and determine strandedness. This tag is completely optional, some BAM files won't have it. It seems like rust-htslib is currently segfaulting if you call read.aux() on a tag that isn't present in the record. Here is the output from rust-gdb:

(gdb) run  --bigwig --autostrand ~/analysis/dmel-all-r6.04.bdgp.autostrand.bam --out /data/analysis/nauRNAi/test/BS-2-nauRNAi-16-18-1_S12_L001_R1_001 /data/analysis/nauRNAi/test/BS-2-nauRNAi-16-18-1_S12_L001_R1_001.bam
Starting program: /users/bbooth/src/bam2bedgraph/target/debug/bam2bedgraph --bigwig --autostrand ~/analysis/dmel-all-r6.04.bdgp.autostrand.bam --out /data/analysis/nauRNAi/test/BS-2-nauRNAi-16-18-1_S12_L001_R1_001 /data/analysis/nauRNAi/test/BS-2-nauRNAi-16-18-1_S12_L001_R1_001.bam
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
Running strand detection phase on /data/analysis/nauRNAi/test/BS-2-nauRNAi-16-18-1_S12_L001_R1_001.bam

Program received signal SIGSEGV, Segmentation fault.
0x0000555555603958 in rust_htslib::bam::record::Record::aux (self=0x7fffffffc650, tag=&[u8](len: 2) = {...})
    at /users/bbooth/.multirust/toolchains/nightly/cargo/git/checkouts/rust-htslib-e74b3f238d5ee614/master/src/bam/record.rs:242
242             match *aux {
(gdb) bt
#0  0x0000555555603958 in rust_htslib::bam::record::Record::aux (self=0x7fffffffc650, tag=&[u8](len: 2) = {...})
    at /users/bbooth/.multirust/toolchains/nightly/cargo/git/checkouts/rust-htslib-e74b3f238d5ee614/master/src/bam/record.rs:242
#1  0x0000555555580397 in bam2bedgraph::analyze_bam (options=0x7fffffffdcb8, split_strand="uu", autostrand_pass=true, intervals=0x7fffffffd4e0)
    at src/main.rs:194
#2  0x00005555555b3251 in bam2bedgraph::main () at src/main.rs:488
#3  0x00005555556ecdc9 in rust_try_inner ()
#4  0x00005555556ecdb6 in rust_try ()
#5  0x00005555556e9a69 in rt::lang_start::h9c8d2f74c4ef585di3w ()
#6  0x00005555555c1a0c in main ()

I checked the source code listed in the stack trace:

   /// Get auxiliary data (tags).
   pub fn aux(&self, tag: &[u8]) -> Option<Aux> {
       let aux = unsafe { htslib::bam_aux_get(&self.inner, ffi::CString::new(tag).unwrap().as_ptr() as *mut i8 ) };

       unsafe {
           match *aux {
               b'c'|b'C'|b's'|b'S'|b'i'|b'I' => Some(Aux::Integer(htslib::bam_aux2i(aux))),
               b'f'|b'd' => Some(Aux::Float(htslib::bam_aux2f(aux))),
               b'A' => Some(Aux::Char(htslib::bam_aux2A(aux) as u8)),
               b'Z'|b'H' => {
                   let f = aux.offset(1) as *const i8;
                   let x = ffi::CStr::from_ptr(f).to_bytes();
                   Some(Aux::String(mem::copy_lifetime(self, x)))
               },
               _ => None,
           }
       }
   }

The segfault is happening on the match *aux line. I'm not an expert in rust yet, but it looks like it's dereferencing the aux variable without checking if it's set to null (zero). I think bam_aux_get returns zero if there was no matching tag. Would it be possible to check for a null value returned, then return None from the aux method before dereferencing the aux variable? I think that would fix the segfaults. Thanks!

christopher-schroeder commented 9 years ago

Hi, thanks for your effort to debug and name the problem in detail. We fixed the bug the way you suggested it and returning None if the aux tag is not available. The problem is now fixed in version 0.4.1

Additionally all dependencies were updated to its newest Version - if this is important to anyone.

Best Christo

johanneskoester commented 9 years ago

Hi Ben,thank you very much for reporting and trying rust-htslib. I think I have fixed the bug in the master branch. Can you confirm?Thanks,JohannesOn Jul 2, 2015, at 7:36 PM, Ben Booth notifications@github.com wrote:I'm writing a tool to convert bam files to bedgraph/bigwig format using your library (See here), but I'm running into segfaults. The code I'm writing is checking each record for an XS tag to try and determine strandedness. This tag is completely optional, some BAM files won't have it. It seems like rust-htslib is currently segfaulting if you call read.aux() on a tag that isn't present in the record. Here is the output from rust-gdb:

(gdb) run --bigwig --autostrand ~/analysis/dmel-all-r6.04.bdgp.autostrand.bam --out /data/analysis/nauRNAi/test/BS-2-nauRNAi-16-18-1_S12_L001_R1_001 /data/analysis/nauRNAi/test/BS-2-nauRNAi-16-18-1_S12_L001_R1_001.bam Starting program: /users/bbooth/src/bam2bedgraph/target/debug/bam2bedgraph --bigwig --autostrand ~/analysis/dmel-all-r6.04.bdgp.autostrand.bam --out /data/analysis/nauRNAi/test/BS-2-nauRNAi-16-18-1_S12_L001_R1_001 /data/analysis/nauRNAi/test/BS-2-nauRNAi-16-18-1_S12_L001_R1_001.bam [Thread debugging using libthread_db enabled] Using host libthread_db library "/usr/lib/libthread_db.so.1". Running strand detection phase on /data/analysis/nauRNAi/test/BS-2-nauRNAi-16-18-1_S12_L001_R1_001.bam

Program received signal SIGSEGV, Segmentation fault. 0x0000555555603958 in rust_htslib::bam::record::Record::aux (self=0x7fffffffc650, tag=&[u8](len: 2) = {...}) at /users/bbooth/.multirust/toolchains/nightly/cargo/git/checkouts/rust-htslib-e74b3f238d5ee614/master/src/bam/record.rs:242 242 match *aux { (gdb) bt

0 0x0000555555603958 in rust_htslib::bam::record::Record::aux (self=0x7fffffffc650, tag=&[u8](len: 2) = {...})

at /users/bbooth/.multirust/toolchains/nightly/cargo/git/checkouts/rust-htslib-e74b3f238d5ee614/master/src/bam/record.rs:242

1 0x0000555555580397 in bam2bedgraph::analyze_bam (options=0x7fffffffdcb8, split_strand="uu", autostrand_pass=true, intervals=0x7fffffffd4e0)

at src/main.rs:194

2 0x00005555555b3251 in bam2bedgraph::main () at src/main.rs:488

3 0x00005555556ecdc9 in rust_try_inner ()

4 0x00005555556ecdb6 in rust_try ()

5 0x00005555556e9a69 in rt::lang_start::h9c8d2f74c4ef585di3w ()

6 0x00005555555c1a0c in main ()

I checked the source code listed in the stack trace:

/// Get auxiliary data (tags). pub fn aux(&self, tag: &[u8]) -> Option { let aux = unsafe { htslib::bam_aux_get(&self.inner, ffi::CString::new(tag).unwrap().as_ptr() as *mut i8 ) };

   unsafe {
       match *aux {
           b'c'|b'C'|b's'|b'S'|b'i'|b'I' => Some(Aux::Integer(htslib::bam_aux2i(aux))),
           b'f'|b'd' => Some(Aux::Float(htslib::bam_aux2f(aux))),
           b'A' => Some(Aux::Char(htslib::bam_aux2A(aux) as u8)),
           b'Z'|b'H' => {
               let f = aux.offset(1) as *const i8;
               let x = ffi::CStr::from_ptr(f).to_bytes();
               Some(Aux::String(mem::copy_lifetime(self, x)))
           },
           _ => None,
       }
   }

} The segfault is happening on the match *aux line. I'm not an expert in rust yet, but it looks like it's dereferencing the aux variable without checking if it's set to null (zero). I think bam_aux_get returns zero if there was no matching tag. Would it be possible to check for a null value returned, then return None from the aux method before dereferencing the aux variable? I think that would fix the segfaults. Thanks!—Reply to this email directly or view it on GitHub.

benwbooth commented 9 years ago

It's working now! Thanks!