jgaetel / cutadapt

Automatically exported from code.google.com/p/cutadapt

Trim both reads at same time #82

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
First, thanks for this great app. It's getting near perfect!

I have a suggestion about how cutadapt handles paired-end data:

Processing the two ends sequentially may be the simplest way to adapt your 
script to handle the two files in sync, and it also leaves the flexibility 
to process each side of the sequenced fragments differently.

However, sometimes you just want to remove 3' ligated sequencing adapters from 
the reads of fragments that are shorter than the read length. In those cases, 
when an adapter is found in read1, it should also be present in read2, and 
vice versa.

Sometimes, due to sequencing quality, cutadapt finds the adapter in one read 
but not in the other, leaving a pair with different read lengths.

Would it be possible to add an option that tells cutadapt to automatically 
trim the other read of the pair when the adapter is found in only one of them?

Right now I have to add an extra step to do that. It's not difficult, but it 
is an extra step.

Thanks a lot for your commitment.

Cheers,
Carlos

Original issue reported on code.google.com by ctorr...@googlemail.com on 8 Oct 2014 at 4:46

GoogleCodeExporter commented 9 years ago
Thanks for using cutadapt! Your suggestion sounds reasonable, but I probably 
cannot implement it in exactly that way. I'm wondering how you do the 
trimming: do you just compare the read lengths and remove the extra bases from 
the longer read if the lengths differ? I'd like to keep the promise that an 
adapter is removed only if the error rate with which it was found is below the 
specified threshold.
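
For reference, the length-comparison approach asked about above could look 
roughly like the following. This is only a sketch: the function name and the 
(sequence, qualities) tuple layout are assumptions for illustration, not part 
of cutadapt. Note that it cuts the longer read without checking any error 
rate, which is exactly the concern raised above.

```python
def equalize_pair_lengths(read1, read2):
    """Truncate the longer read of a pair to the length of the shorter one.

    Each read is a hypothetical (sequence, qualities) tuple. No adapter
    alignment is performed here, so bases are removed from the longer read
    without any error-rate check.
    """
    seq1, qual1 = read1
    seq2, qual2 = read2
    n = min(len(seq1), len(seq2))
    return (seq1[:n], qual1[:n]), (seq2[:n], qual2[:n])


# read1 was already trimmed to 8 bases, read2 still has its full 12 bases
r1 = ("ACGTACGT", "IIIIIIII")
r2 = ("TTGCAAGTCCGA", "IIIIIIIIIIII")
print(equalize_pair_lengths(r1, r2))
```

This only makes sense when both reads had the same length before trimming, so 
that the adapter is expected to start at the same position in each read.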

What I could do instead is to add some logic so either both reads or none are 
trimmed. So if the adapter is found only in one of the reads, the adapter is 
not removed at all. Compared to what you suggest, fewer read pairs would be 
trimmed, but then the obvious solution to that would be to increase the maximum 
error rate.
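
The both-or-none rule described above could be sketched as follows. This is 
not cutadapt's code: `find_adapter` is a hypothetical, much simplified matcher 
that counts mismatches only (no indels), and all names and default values are 
assumptions made for illustration.

```python
def find_adapter(seq, adapter, max_error_rate=0.1, min_overlap=3):
    """Return the start index of the first 3' adapter match, or None.

    Simplified stand-in for a real aligner: only mismatches are counted,
    and the adapter may be cut off by the 3' end of the read.
    """
    for start in range(len(seq) - min_overlap + 1):
        overlap = min(len(adapter), len(seq) - start)
        window = seq[start:start + overlap]
        mismatches = sum(a != b for a, b in zip(window, adapter))
        if mismatches <= max_error_rate * overlap:
            return start
    return None


def trim_both_or_none(seq1, seq2, adapter1, adapter2, max_error_rate=0.1):
    """Trim the pair only if the adapter is found in both reads."""
    pos1 = find_adapter(seq1, adapter1, max_error_rate)
    pos2 = find_adapter(seq2, adapter2, max_error_rate)
    if pos1 is None or pos2 is None:
        # Adapter missing in at least one read: leave the pair untouched.
        return seq1, seq2
    return seq1[:pos1], seq2[:pos2]


adapter = "AGATCGGAAGAGC"
read1 = "TTTTTTTT" + adapter
read2_clean = "CCCCCCCC" + adapter
read2_noisy = "CCCCCCCC" + "AGATCTTAAGAGC"  # two mismatches in the adapter

print(trim_both_or_none(read1, read2_clean, adapter, adapter))
print(trim_both_or_none(read1, read2_noisy, adapter, adapter))
```

With the default error rate of 0.1, the noisy pair is left untrimmed. Raising 
`max_error_rate` (to 0.2 in this example) lets the second adapter match as 
well, so that both reads are trimmed.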

I should also point out issue 81, reported just yesterday, where the reporter 
uses a paired-read merging tool to merge the reads prior to adapter trimming. 
I don't know whether that's the way to go.

I'm a bit busy with other things now, so it may take me a while to get this or 
something like it implemented.

Original comment by marcel.m...@tu-dortmund.de on 8 Oct 2014 at 7:58