walaj / SeqLib

C++ htslib/bwa-mem/fermi interface for interrogating sequence data
http://bioinformatics.oxfordjournals.org/content/early/2016/12/21/bioinformatics.btw741.full.pdf+html
Other
133 stars 36 forks source link

Segmentation Fault when running #32

Closed bkohrn closed 5 years ago

bkohrn commented 5 years ago

I am attempting to migrate a program I wrote from using bamtools to using SeqLib. Unfortunately, my C++ isn't that great (I do most of my programing in Python, but would like to use C++ for this purpose). I managed to figure out compiling (I think), but I'm getting a segmentation fault when I call BarRecord::SetQname. The problematic section of code (as determined by getting an output every few lines until the error) is:

SeqLib::BamReader in_bam_file;
SeqLib::BamWriter temp_bam;
in_bam_file.Open(input);
temp_bam.SetHeader(in_bam_file.Header());
//const SeqLib::RefVector references = in_bam_file.GetReferenceData();
int paired_end_count = 1;
temp_bam.Open(prefix + ".temp.bam");
temp_bam.WriteHeader(); 
SeqLib::BamRecord line;
SeqLib::BamRecord temp_read1_entry;
SeqLib::BamRecord temp_bam_entry;
while (in_bam_file.GetNextRecord(line)) {
    if (paired_end_count % 2 == 1) {
        temp_read1_entry = line;
    }
    if (paired_end_count % 2 == 0) {
        std::string tagName;
        SeqLib::BamRecord temp_bam_entry;
        if (temp_read1_entry.Sequence().substr(0,taglen) + temp_read1_entry.Sequence().substr(taglen + spacerlen, loclen) > line.Sequence().substr(0,taglen) + line.Sequence().substr(taglen + spacerlen, loclen)) {
            tagName = temp_read1_entry.Sequence().substr(0,taglen) + \
            temp_read1_entry.Sequence().substr(taglen + spacerlen, loclen) + \
            line.Sequence().substr(0,taglen) + \
            line.Sequence().substr(taglen + spacerlen, loclen) + "#ab";
        }
        else if (temp_read1_entry.Sequence().substr(0,taglen) + temp_read1_entry.Sequence().substr(taglen + spacerlen, loclen) < line.Sequence().substr(0,taglen) + line.Sequence().substr(taglen + spacerlen, loclen)) {
            tagName = line.Sequence().substr(0,taglen) + \
            line.Sequence().substr(taglen + spacerlen, loclen) + \
            temp_read1_entry.Sequence().substr(0,taglen) + \
            temp_read1_entry.Sequence().substr(taglen + spacerlen, loclen) + "#ba";
        }
        else if (temp_read1_entry.Sequence().substr(0,taglen) + temp_read1_entry.Sequence().substr(taglen + spacerlen, loclen) == line.Sequence().substr(0,taglen) + line.Sequence().substr(taglen + spacerlen, loclen)) {
            paired_end_count++;
            continue;
        }
        // Write entries for Read 1
        temp_bam_entry.SetQname(tagName + ":1"); //ERROR OCCURS AT THIS LINE
        temp_bam_entry.SetSequence(temp_read1_entry.Sequence().substr(taglen + spacerlen));
        temp_bam_entry.SetQualities(temp_read1_entry.Qualities().substr(taglen + spacerlen), 33);
        temp_bam_entry.AddZTag("X?", temp_read1_entry.Qname());
        temp_bam.WriteRecord(temp_bam_entry);
        // Write entries for Read 2
        temp_bam_entry.SetQname(tagName + ":2");
        temp_bam_entry.SetSequence(line.Sequence().substr(taglen + spacerlen));
        temp_bam_entry.SetQualities(line.Qualities().substr(taglen + spacerlen), 33);
        temp_bam_entry.AddZTag("X?", line.Qname());
        temp_bam.WriteRecord(temp_bam_entry);
    }
    paired_end_count++;
}

and the error I am getting is "Segmentation fault: 11". I am working on a Apple computer running MacOSX 10.12.6.

Any idea why this is happening/what I can do to fix it?

Brendan

walaj commented 5 years ago

Hi Brendan,

This is a subtle issue that has to do with how a new BamRecord is constructed. The default constructor is just to construct a placeholder that doesn't allocate any memory. To actually assign memory to this object, you need to call init(). e.g.

SeqLib::BamRecord temp_bam_entry;
temp_bam_entry.init();  

Perhaps the default should be to assign memory with the default constructor. But for now, just do the init() call.

bkohrn commented 5 years ago

Thanks for the help; that fixed that problem. My next issue is that one of the next steps in my program requires sorting the file with samtools sort -n, and samtools is telling me I have a truncated BAM file. This happens in the middle of the program; the file I'm writing is supposed to be an unmapped BAM. It looks like it's the correct size, but samtools doesn't like it. Do I need to do something besides temp_bam.Close() to keep the file from being truncated?

Again, thanks for the help, Brendan

bkohrn commented 5 years ago

Never mind, figured it out. I have a new issue, but I'll open a new ticket for it, as it may be a more serious issue that needs to be addressed.