Gs at the beginning of the reads

timbitz / Aligater

Software suite for detection/analysis of chimeric RNAs from LIGR-seq data

MIT License


Gs at the beginning of the reads #8

Open fgypas opened 7 years ago

fgypas commented 7 years ago

Hello again. I have a new question, and this time it is about read trimming. I was told that, due to the sample preparation (I think), some Gs are introduced at the beginning of the reads. When I checked, a large portion of the reads do start this way. Should I trim a fixed number of Gs, or trim every leading G from any read that starts with G?

timbitz commented 7 years ago

With our LIGR-Seq data the first thing I do is filter for unique reads including the random barcode, then I trim 5nt off the front and run the rest of the pipeline after that.
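As a rough illustration of the two steps described above (collapse duplicate reads while the random barcode is still attached, then trim 5 nt from the front), here is a minimal Python sketch. It is not Aligater's own code: the function names `unique_reads` and `trim_front`, the `(name, sequence, quality)` tuple layout, and the choice to drop reads that would be empty after trimming are all assumptions made for this example.

```python
from typing import Iterable, Iterator, Tuple

# Hypothetical read representation: (name, sequence, quality string).
Read = Tuple[str, str, str]


def unique_reads(reads: Iterable[Read]) -> Iterator[Read]:
    """Keep the first read seen for each full sequence.

    The sequence still carries its 5' random barcode at this point,
    so identical fragments with different barcodes are kept apart.
    """
    seen = set()
    for name, seq, qual in reads:
        if seq not in seen:
            seen.add(seq)
            yield name, seq, qual


def trim_front(reads: Iterable[Read], n: int = 5) -> Iterator[Read]:
    """Remove the first n bases (and matching qualities) from each read.

    Reads of length n or shorter are dropped (an assumption of this sketch).
    """
    for name, seq, qual in reads:
        if len(seq) > n:
            yield name, seq[n:], qual[n:]


if __name__ == "__main__":
    demo = [
        ("r1", "GGGACTTTT", "IIIIIIIII"),
        ("r2", "GGGACTTTT", "IIIIIIIII"),  # exact duplicate of r1
        ("r3", "ACGTAAAAC", "IIIIIIIII"),
    ]
    for read in trim_front(unique_reads(demo), n=5):
        print(read)
```

The order matters: deduplicating before trimming lets the barcode distinguish PCR duplicates from independent fragments that happen to share the same insert.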

timbitz commented 7 years ago

I don't think it will cause a huge problem though if you leave them in. The alignment steps should be lenient enough to allow non-aligning segments on the 5' and 3' ends of reads.

fgypas commented 7 years ago

5nt when you have Gs or in general?

timbitz commented 7 years ago

Our raw data has a 5' random barcode that I trim in general. Is it possible that your library prep is different from ours? Either way, I don't think it will affect how aligater runs; it is designed to allow soft clipping on the ends.
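One way to see how a given library behaves before choosing between fixed-length and variable trimming is to tally the length of the leading G run across reads. The sketch below is only an illustration; `leading_g_counts` is a hypothetical helper, not part of Aligater, and it assumes the read sequences are already available as uppercase strings.

```python
from collections import Counter
from typing import Iterable


def leading_g_counts(seqs: Iterable[str]) -> Counter:
    """Count reads by the number of consecutive Gs at their 5' end."""
    counts: Counter = Counter()
    for seq in seqs:
        # Length removed by stripping leading Gs equals the run length.
        run = len(seq) - len(seq.lstrip("G"))
        counts[run] += 1
    return counts


if __name__ == "__main__":
    demo = ["GGGACT", "GACT", "ACGT", "GGGTTT"]
    for run, n_reads in sorted(leading_g_counts(demo).items()):
        print(run, n_reads)
```

A distribution concentrated at one run length would be consistent with a fixed-length artifact, while a broad spread would suggest the number of added Gs varies from read to read.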