jstjohn / SeqPrep

Tool for stripping adaptors and/or merging paired reads with overlap into single reads.
MIT License
140 stars 51 forks source link

N when bases differ #12

Closed dakl closed 4 years ago

dakl commented 11 years ago

Hi,

I want to modify SeqPrep so that bases that mismatch when two reads are merged are set to N rather than having the quality of the best base reduced.

I have fill_merged_sequence() modified utils.c to read (snippet):

   if(c1 == c2){
     sqp->merged_seq[j] = c1;
     sqp->merged_qual[j] = match_p33_merge(q1,q2);
   } else {
     sqp->merged_seq[j] = 'N';
     sqp->merged_qual[j] = 1;
   }

and read_merge():

  if(subjseq[i] == queryseq[i-mpos]){
    c = subjseq[i];
    q = match_p33_merge(subjqual[i],queryqual[i-mpos]);
  }else{
    q = '#';
    c = 'N';
  }

I have done a few tests and this seems to do the trick, but I still wanted to check with you so that this indeed covers all scenarios of having detected adapters and not.

Have I missed anything?

cheers, happy midsummer.

Daniel

jstjohn commented 11 years ago

The adapter trimming stuff is separate from the read merging, aside from the forced merge when adapters are found. I don't have time to dig through the code now, but this seems right.

On Jun 20, 2013, at 11:20 AM, dakl notifications@github.com wrote:

Hi,

I want to modify SeqPrep so that bases that mismatch when two reads are merged are set to N rather than having the quality of the best base reduced.

I have fill_merged_sequence() modified utils.c to read (snippet):

if(c1 == c2){ sqp->merged_seq[j] = c1; sqp->merged_qual[j] = match_p33_merge(q1,q2); } else { sqp->merged_seq[j] = 'N'; sqp->merged_qual[j] = 1; } and read_merge():

if(subjseq[i] == queryseq[i-mpos]){ c = subjseq[i]; q = match_p33_merge(subjqual[i],queryqual[i-mpos]); }else{ q = '#'; c = 'N'; } I have done a few tests and this seems to do the trick, but I still wanted to check with you so that this indeed covers all scenarios of having detected adapters and not.

Have I missed anything?

cheers, happy midsummer.

Daniel

— Reply to this email directly or view it on GitHub.

dakl commented 11 years ago

Yeah, exactly. I guess the force merge and "normal" merge are separate too, that's why I had to modify the code in two places? Many thanks for the quick reply.

Daniel

dakl commented 11 years ago

Hi,

I forked and made the changes. It seems to work as expected, even if I have had a hard time finding test files with adapters in them to test the changes when performing a forced merge. Normal merged is tested and works as expected though.

Feel free to add the code to your base if you're interested. https://github.com/dakl/SeqPrep

Thanks for a great tool, Daniel