naobservatory / mgs-workflow

Consider adapter trimming just from 3' end #43

Open mikemc opened 3 months ago

mikemc commented 3 months ago

Per discussion in Twist, @evanfields notes that

"We run with adapters specified as -b, meaning could be 5’ or 3’ end, so when an adapter matches cutadapt gets to decide which way to trim from the adapter."
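
For illustration, the difference between the two modes can be modeled in a few lines of Python. This is a toy sketch that assumes exact, full-length adapter matches; cutadapt itself uses error-tolerant alignment and also handles partial matches, so this is not its actual algorithm. The function names, the example reads, and the adapter sequence are hypothetical and chosen only for the example. The `-b` rule modeled here follows the cutadapt guide: a match with at least one base before it is treated as a 3' adapter, while a match at the very start of the read is treated as a 5' adapter.

```python
# Example adapter only; the pipeline's actual adapter sequences may differ.
ADAPTER = "AGATCGGAAGAGC"


def trim_3prime(read: str, adapter: str) -> str:
    """Toy model of `-a`: remove the adapter and everything after it."""
    i = read.find(adapter)
    return read if i == -1 else read[:i]


def trim_anywhere(read: str, adapter: str) -> str:
    """Toy model of `-b`: the match position decides the trimming direction."""
    i = read.find(adapter)
    if i == -1:
        return read  # no adapter found, read is left unchanged
    if i == 0:
        # Nothing precedes the adapter: treated as a 5' adapter,
        # so the sequence after it is kept.
        return read[len(adapter):]
    # At least one base precedes the adapter: treated as a 3' adapter.
    return read[:i]


# Read-through case: insert, then adapter, then trailing bases.
read_through = "ACGTACGT" + ADAPTER + "TTTT"
# Read that starts with the adapter (for example, an adapter dimer).
starts_with_adapter = ADAPTER + "GGGG"

for read in (read_through, starts_with_adapter):
    print(repr(trim_3prime(read, ADAPTER)), repr(trim_anywhere(read, ADAPTER)))
```

In this model the two modes agree on the read-through case and differ only when the read begins with the adapter: 3'-only trimming leaves an empty read, while the `-b` model keeps the bases that follow the adapter.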

Based on my understanding of Illumina sequencing, and Illumina's recommendation in "Adapter trimming: Why are adapter sequences trimmed from only the 3' ends of reads" (Illumina Knowledge Base), it seems like we should trim only from the 3' end for the paired-end libraries we typically work with.

https://cutadapt.readthedocs.io/en/stable/guide.html#adapter-types
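
As a rough sketch of what the change could look like at the command level, the helper below builds a paired-end cutadapt invocation with either 3'-only adapters (`-a` for read 1, `-A` for read 2) or "anywhere" adapters (`-b` / `-B`). The flags are cutadapt's documented options; the helper name `cutadapt_command`, the file names, and the adapter sequence are hypothetical, and the workflow's real invocation likely passes additional options not shown here.

```python
import shlex


def cutadapt_command(adapter_r1, adapter_r2, in_r1, in_r2, out_r1, out_r2,
                     three_prime_only=True):
    """Build a paired-end cutadapt argument list (hypothetical helper)."""
    # -a/-A: 3' adapters for read 1 / read 2.
    # -b/-B: adapters that may be trimmed as either 5' or 3'.
    fwd_flag, rev_flag = ("-a", "-A") if three_prime_only else ("-b", "-B")
    return [
        "cutadapt",
        fwd_flag, adapter_r1,
        rev_flag, adapter_r2,
        "-o", out_r1,  # trimmed read 1 output
        "-p", out_r2,  # trimmed read 2 output
        in_r1, in_r2,
    ]


# Placeholder adapter and file names, for illustration only.
cmd = cutadapt_command(
    "AGATCGGAAGAGC", "AGATCGGAAGAGC",
    "reads_1.fastq.gz", "reads_2.fastq.gz",
    "trimmed_1.fastq.gz", "trimmed_2.fastq.gz",
)
print(shlex.join(cmd))
```

Passing `three_prime_only=False` reproduces the `-b`/`-B` style described in the quote above, which makes it easy to run both variants on the same input and compare the results.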

willbradshaw commented 1 month ago

Thanks @mikemc, I'll add this to the list for Harmon to look into next quarter.

harmonbhasin commented 2 days ago

Noting that I believe this could be the reason that we see cutadapt overtrimming when benchmarking on a simulated dataset. Will look into this when I get back on adapter trimming.