usadellab / Trimmomatic

Overrepresented sequences remain after adapter trimming #39

Open judy-m2 opened 1 year ago

judy-m2 commented 1 year ago

Hello -

I am working on RNA data and I am trying to remove the adapter sequences from my reads. My raw data looks something like this:

[Image: Raw data]

When I run the recommended adapter settings [ILLUMINACLIP:/$EBROOTTRIMMOMATIC/adapters/TruSeq3-PE.fa:2:30:10:2:True], the "Adapter Content" tab on the FastQC report no longer gives a warning, but all the overrepresented sequences are still there.

I tried to adjust the settings of the adapter trimming step, and got some better results, but I still have adapter content in the overrepresented sequences.

I ran Trimmomatic like this:

java -jar $EBROOTTRIMMOMATIC/trimmomatic-0.39.jar PE RawReads/GMCF-1049-DMD-1_S1_L001_R1_001.fastq.gz RawReads/GMCF-1049-DMD-1_S1_L001_R2_001.fastq.gz -trimlog DMD1-logfile.log -baseout trimmedReads_v2/DMD_1.fq ILLUMINACLIP:/$EBROOTTRIMMOMATIC/adapters/TruSeq3-PE.fa:2:40:15:1:True LEADING:3 TRAILING:3 MINLEN:36 HEADCROP:10

and my overrepresented sequences still look like this:

[Image: Screen Shot 2022-09-20 at 3 14 18 PM]
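For reference, here is how I understand the fields of the ILLUMINACLIP step I used, based on the Trimmomatic manual: adapter FASTA, seed mismatches, palindrome clip threshold, simple clip threshold, minAdapterLength, keepBothReads. The snippet below is only a small sketch that splits the step string into those fields so the values are easy to read. The variable names are my own, and the adapter path is shortened for readability.

```shell
# The ILLUMINACLIP step from the command above, with the adapter path
# shortened for readability (the real run used the full path to TruSeq3-PE.fa).
step='ILLUMINACLIP:TruSeq3-PE.fa:2:40:15:1:True'

# Split on ':' into named fields. Variable names are illustrative only;
# the field meanings follow the Trimmomatic manual.
IFS=':' read -r step_name adapter_fasta seed_mismatches palindrome_threshold \
    simple_threshold min_adapter_length keep_both_reads <<EOF
$step
EOF

printf 'adapter FASTA:            %s\n' "$adapter_fasta"
printf 'seed mismatches:          %s\n' "$seed_mismatches"
printf 'palindrome clip threshold: %s\n' "$palindrome_threshold"
printf 'simple clip threshold:    %s\n' "$simple_threshold"
printf 'minAdapterLength:         %s\n' "$min_adapter_length"
printf 'keepBothReads:            %s\n' "$keep_both_reads"
```

Compared with the recommended 2:30:10:2:True, my second run raised the palindrome clip threshold from 30 to 40 and the simple clip threshold from 10 to 15, and lowered minAdapterLength from 2 to 1.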

Now I know that with RNA-seq data you're supposed to get overrepresented sequences, because those come from the highly expressed genes. However, my concern is that the overrepresented sequences are still being identified as adapters. Is this a problem? Should I change the settings on the adapter trimming step again to allow for a higher threshold, or do I run the risk of cutting sequences that I want to keep?

Any advice would be helpful. Thanks.