mat10d commented 1 year ago

I have a 10x experiment where R1 contains the cell barcode and UMI, and R2 contains the cDNA sequence. I want to quality-trim R2 with a sliding-window approach, cutting the read once the average quality within a 10 bp window drops below 28: `SLIDINGWINDOW:10:28`

My initial approach was to run Trimmomatic in single-end mode on R2 alone, and then use cutadapt to filter out any reads of length 0 before mapping:

` java -jar trimmomatic-0.39.jar SE -threads 16 -phred33 ${FASTQ_base}/FASTQ/L458_898_S2_L001_R2_001.fastq.gz ${FASTQ_base}/FASTQ_qa_28_tm/trimmed/trimmed_L458_898_S2_L001_R2_001.fastq.gz SLIDINGWINDOW:10:28

cutadapt -j 0 --minimum-length :1 -o ${FASTQ_base}/FASTQ_qa_28_tm/L458_898_S2_L001_R1_001.fastq.gz -p ${FASTQ_base}/FASTQ_qa_28_tm/L458_898_S2_L001_R2_001.fastq.gz ${FASTQ_base}/FASTQ/L458_898_S2_L001_R1_001.fastq.gz ${FASTQ_base}/FASTQ_qa_28_tm/trimmed/trimmed_L458_898_S2_L001_R2_001.fastq.gz `

However, I notice that Trimmomatic itself removes a small subset of reads (Input Reads: 153435707 Surviving: 150653678 (98.19%) Dropped: 2782029 (1.81%)). R1 and the trimmed R2 therefore no longer contain the same set of reads, which makes the quick paired filtering step with cutadapt impossible.
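
One possible workaround, sketched here under stated assumptions rather than as a recommended method, would be to re-sync R1 against the reads that survive in the trimmed R2 before mapping. The helper name `resync_r1` is hypothetical. The sketch assumes standard 4-line FASTQ records, that the read ID is the first whitespace-delimited token of each header, and that the trimmed R2 is an in-order subset of R1 (reads were only dropped, never reordered).

```shell
# Hypothetical helper: keep only the R1 records whose read ID survives in the trimmed R2.
# Assumptions: 4-line FASTQ records, read ID = first token of the header line,
# and the trimmed R2 lists its reads in the same order as R1.
resync_r1() {
    r1_in=$1        # original, untrimmed R1 (gzipped FASTQ)
    r2_trimmed=$2   # R2 after Trimmomatic SE (gzipped FASTQ)
    r1_out=$3       # output: R1 restricted to surviving reads (gzipped FASTQ)

    # Temporary list of surviving read IDs (one per line); can be large for big runs.
    ids=$(mktemp)
    gzip -cd "$r2_trimmed" | awk 'NR % 4 == 1 { print $1 }' > "$ids"

    # Walk R1 and the ID list in parallel, so no ID table is held in memory.
    gzip -cd "$r1_in" | awk -v ids="$ids" '
        # Load the next surviving ID, or the empty string once the list is exhausted.
        function advance() { if ((getline next_id < ids) <= 0) next_id = "" }
        BEGIN { advance() }
        FNR % 4 == 1 {
            keep = ($1 == next_id)   # header line: does this read survive in R2?
            if (keep) advance()
        }
        keep                         # print all four lines of a kept record
    ' | gzip -c > "$r1_out"

    rm -f "$ids"
}
```

It would be called as `resync_r1 R1.fastq.gz trimmed_R2.fastq.gz synced_R1.fastq.gz`, with placeholder file names. If the dropped reads are exactly those trimmed to length 0, the separate cutadapt length filter may no longer be needed afterwards.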

Should I be using a paired-end approach instead? Or do you have another suggestion for how to tackle this? I don't want to do any read trimming on R1, as it only contains the barcode and UMI. Thanks so much!
