statgen / bamUtil

http://genome.sph.umich.edu/wiki/BamUtil
87 stars 29 forks source link

compare reads ignoring read id #8

Open avilella opened 11 years ago

avilella commented 11 years ago

Hi,

I would like to compare two bam files ignoring the read ids: if two reads contain the same sequence string but different read ids, I would like bam diff to consider them the same, and skip them in the _only1 and _only2 files when running bam diff --in1 bam1.bam --in2 bam2.bam --out my.bam

I have modified the code below, but I would like to know if this should be enough or should be combined with specific parameters when running bam diff.

Thanks in advance.

bam diff matches records by different parameters and also by

ReadName. I modified the code so that it doesnt look at the readname

when doing the comparison

cd /home/avilella/src/bamUtil/src

~/src/bamUtil/src/Diff.cpp

Around line 463

bool Diff::matchingRecs(SamRecord* rec1, SamRecord* rec2) { if((rec1 == NULL) || (rec2 == NULL)) { // one or both of the records is NULL, so return false. return(false); }

// // Have 2 records, so compare them.
// if((SamFlag::getFragmentType(rec1->getFlag()) == 
//     SamFlag::getFragmentType(rec2->getFlag())) &&
//    (strcmp(rec1->getReadName(),
//            rec2->getReadName()) == 0))
// Have 2 records, so compare them.
if(SamFlag::getFragmentType(rec1->getFlag()) == 
    SamFlag::getFragmentType(rec2->getFlag()))
{
    // Same fragment and read name.
    return(true);
}
return(false);

}

mktrost commented 11 years ago

Sorry, for the delayed response. I just got back from vacation traveling overseas. Yes, that is the correct place to make the modification. But in the code you included above you are only checking that the flags are the same. I think you also want to add a check for the same sequence.

Then depending on what you want in your output, you may want to update the getDiffs/writeDiffDiffs/writeBamDiffs methods to check for readName differences.

Let me know if you have any additional questions.
Mary Kate