rrwick / Porechop

adapter trimmer for Oxford Nanopore reads
GNU General Public License v3.0

Adding functionality to strand reads from cDNA sequencing #50

Open cooklab opened 6 years ago

cooklab commented 6 years ago

I am interested in stranding (determining whether the original mRNA came from the + or - DNA strand) all of my reads from MinION cDNA-seq. Running Porechop finds the end adapter on about 70% of my reads, but the default behaviour is then to remove the adapter sequence from the processed file.

Maybe one could hack a solution with the demultiplexing option, by supplying the adapter as the barcode and getting it to split the reads into '+strand-barcode', '-strand-barcode', and 'strand not found'. But I was wondering whether you have any comments on this, or would be interested in adding the option.

I think one would want the output to be a FASTQ, FASTA, etc. file in which the reads are correctly oriented according to the strand they were transcribed from, plus another file for reads whose strand cannot be determined. One could set the stringency by choosing at which end of the read (5', 3', or both) the adapter needs to be found. That is, the 3' polyA adapter is found in most reads, but the 5' end adapter is rarer, I guess because many cDNAs are not truly full length.
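To make the idea concrete, here is a minimal Python sketch of the orientation step. It is not part of Porechop and is only an illustration under several assumptions: `THREE_PRIME_ADAPTER` is a made-up placeholder (not a real ONT adapter sequence), the function names and the `end_size` / `max_mismatches` parameters are hypothetical, and matching tolerates substitutions only. Real nanopore reads contain insertions and deletions, so a proper implementation would need alignment-based matching.

```python
# Hypothetical placeholder sequence, NOT a real ONT adapter.
THREE_PRIME_ADAPTER = "GATCGGAAGAGC"

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def contains_approx(window, pattern, max_mismatches):
    """True if pattern occurs in window with at most max_mismatches substitutions."""
    for start in range(len(window) - len(pattern) + 1):
        candidate = window[start:start + len(pattern)]
        mismatches = sum(a != b for a, b in zip(candidate, pattern))
        if mismatches <= max_mismatches:
            return True
    return False


def orient_read(seq, adapter=THREE_PRIME_ADAPTER, end_size=50, max_mismatches=2):
    """Classify a read by where the 3' adapter is found and orient it.

    Returns (label, sequence). Reads whose reverse-complemented adapter
    sits at the start are flipped; ambiguous reads are returned unchanged.
    """
    head = seq[:end_size]
    tail = seq[-end_size:]
    found_at_end = contains_approx(tail, adapter, max_mismatches)
    found_at_start = contains_approx(head, reverse_complement(adapter), max_mismatches)
    if found_at_end and not found_at_start:
        return "sense", seq
    if found_at_start and not found_at_end:
        return "antisense", reverse_complement(seq)
    return "undetermined", seq
```

Reads labelled `undetermined` would go to the separate file, and requiring the 5' adapter as well would be the stricter setting. Note that this only orients reads relative to the transcript; assigning the + or - genomic strand would still need a mapping step.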

Thanks for your thoughts.

ehhill commented 6 years ago

Hi, I'm curious which cDNA adapter sequences you were using. I'm trying to filter my cDNA reads for full-length (FL) reads using adapter sequences, and I'd like to know how you went about adapter identification with Porechop. Cheers.