marcelm / cutadapt

Cutadapt removes adapter sequences from sequencing reads
https://cutadapt.readthedocs.io
MIT License

Need help using cutadapt to trim paired-end FASTQ reads as input for DADA2 #765

Closed: akspat closed this issue 3 months ago

akspat commented 3 months ago

I am using cutadapt for the first time to trim 300 paired-end FASTQ files generated by amplicon sequencing of the V3-V4 region of the 16S rRNA gene. I am running this analysis on a high-performance computing (HPC) cluster, where I made cutadapt available with `module load cutadapt`.

The V3-V4 region of the 16S rRNA gene was amplified using a mixture of the universal bacterial primers 341F1–4 (5′ CCTACGGGNGGCWGCAG 3′) and 785R1–4 (5′ GACTACHVGGGTATCTAATCC 3′), with partial Illumina TruSeq adapter sequences added to the 5′ ends:

F1: ATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT
F2: ATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTgt
F3: ATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTagag
F4: ATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTtagtgt
R1: GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT
R2: GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTa
R3: GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTtct
R4: GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTctgagtg
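To see where the primers end up inside the reads, it helps to strip the shared TruSeq part from each oligo. The sketch below is illustrative only and assumes that the Read 1 and Read 2 sequencing primers anneal to the TruSeq portion ending in ...CGATCT (so each read starts right after it) and that R1 comes from the F-tailed primer. Under those assumptions, what remains at the start of each read is a 0 to 7 nt spacer directly followed by the gene-specific primer. The helper name `read_start` is hypothetical.

```shell
#!/bin/sh
# Illustration only. Assumes each read starts right after the TruSeq
# sequencing-primer site (...CGATCT) and that R1 comes from the F-tailed primer.
FWD_PRIMER=CCTACGGGNGGCWGCAG      # 341F
REV_PRIMER=GACTACHVGGGTATCTAATCC  # 785R
TRUSEQ_F=ATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT   # part shared by F1-F4
TRUSEQ_R=GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT      # part shared by R1-R4

# Hypothetical helper: read_start OLIGO SHARED PRIMER
# Removes the shared TruSeq prefix from the oligo tail and prints spacer + primer.
read_start() {
  spacer=${1#"$2"}
  printf '%s%s\n' "$spacer" "$3"
}

for oligo in ATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT \
             ATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTgt \
             ATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTagag \
             ATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTtagtgt; do
  read_start "$oligo" "$TRUSEQ_F" "$FWD_PRIMER"
done
for oligo in GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT \
             GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTa \
             GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTtct \
             GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTctgagtg; do
  read_start "$oligo" "$TRUSEQ_R" "$REV_PRIMER"
done
```

For F4 this prints `tagtgtCCTACGGGNGGCWGCAG`, and for R4 `ctgagtgGACTACHVGGGTATCTAATCC`: the primer sits near the 5′ end of each read, but not at a fixed offset.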

I followed the cutadapt tutorial (https://cutadapt.readthedocs.io/en/stable/guide.html#paired-end) and used this command from it to trim paired-end reads:

cutadapt -a ADAPTER_FWD -A ADAPTER_REV -o out.1.fastq -p out.2.fastq reads.1.fastq reads.2.fastq

I wrote an sbatch script containing one cutadapt command for each of the 300 samples, putting the forward universal bacterial primer in place of ADAPTER_FWD (the -a option) and the reverse universal bacterial primer in place of ADAPTER_REV (the -A option). Here is an example of what I did:

cutadapt -a CCTACGGGNGGCWGCAG -A GACTACHVGGGTATCTAATCC -o sample1_trimmed_1.fastq -p sample1_trimmed_2.fastq sample1_1.fastq sample1_2.fastq

cutadapt -a CCTACGGGNGGCWGCAG -A GACTACHVGGGTATCTAATCC -o sample2_trimmed_1.fastq -p sample2_trimmed_2.fastq sample2_1.fastq sample2_2.fastq

... and so on, up to 300 samples.

Question 1) Am I using cutadapt correctly? I couldn't work out where to insert the four forward and four reverse partial Illumina TruSeq adapter sequences (F1-F4 and R1-R4) in the commands above, so I only used the forward and reverse universal bacterial primers for trimming.

Question 2) How can I loop over the sample names so that I don't have to write the cutadapt command 300 times in my sbatch script?

Thank you.

marcelm commented 3 months ago

Hi, I’ll try to answer more fully later when I have more time, but please have a look at this section in the documentation for now.

akspat commented 3 months ago

Okay, thank you.

marcelm commented 3 months ago

I think the section in the documentation should work for your case. Let me know if that did not help you enough.

Regarding the Illumina adapters: you can ignore them, because the Illumina adapters would come after the primer sequences. Since you already search for the primer, the adapter coming after it is also removed when the primer is removed.
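The idea that removing a primer also removes whatever follows it can be made concrete. The sketch below is not the command from the linked documentation section; it assumes each read begins with a short spacer followed by its own primer, and that a read running past the end of the amplicon would next contain the reverse complement of the opposite primer and then adapter sequence. Note that -a/-A declare 3′ adapters (the match and everything after it are removed), so for a primer at the very start of a read they would discard almost the whole read; the sketch therefore uses the 5′ options -g/-G for each read's own primer and reserves -a/-A for the reverse-complemented opposite primer. The helpers `revcomp` and `trim_pair` are hypothetical.

```shell
#!/bin/sh
# Sketch under the assumptions stated above; not the command from the cutadapt docs.
FWD=CCTACGGGNGGCWGCAG      # 341F
REV=GACTACHVGGGTATCTAATCC  # 785R

# Reverse-complement an IUPAC DNA sequence: reverse it with awk, then complement
# with tr (N, S and W are their own complements, so tr leaves them unchanged).
revcomp() {
  printf '%s\n' "$1" |
    awk '{ s = ""; for (i = length($0); i > 0; i--) s = s substr($0, i, 1); print s }' |
    tr 'ACGTRYKMBVDH' 'TGCAYRMKVBHD'
}

FWD_RC=$(revcomp "$FWD")   # CTGCWGCCNCCCGTAGG
REV_RC=$(revcomp "$REV")   # GGATTAGATACCCBDGTAGTC

# Hypothetical helper: trim_pair IN_1 IN_2 OUT_1 OUT_2
#   -g / -G  the read's own primer (5' adapter): it and the spacer before it are removed
#   -a / -A  the opposite primer, reverse-complemented (3' adapter): only present on
#            read-through; it and any adapter sequence after it are removed
#   -n 2     allow up to two matches (one 5', one 3') to be removed per read
trim_pair() {
  cutadapt -g "$FWD" -a "$REV_RC" -G "$REV" -A "$FWD_RC" -n 2 \
    -o "$3" -p "$4" "$1" "$2"
}
```

Usage, with the file names from the question: `trim_pair sample1_1.fastq sample1_2.fastq sample1_trimmed_1.fastq sample1_trimmed_2.fastq`.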

Regarding the 300 samples, I just added a section to the documentation about this: https://cutadapt.readthedocs.io/en/latest/recipes.html#many-samples
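A minimal sketch of such a loop for an sbatch script, assuming the input files are named like sample1_1.fastq and sample1_2.fastq in the working directory, as in the question. This is not the recipe from the linked page: the function name `trim_all`, the `trimmed/` output directory, the `#SBATCH` settings and the -g/-G primer options are all assumptions to adapt.

```shell
#!/bin/bash
#SBATCH --job-name=cutadapt_all
#SBATCH --cpus-per-task=4
# The #SBATCH settings above are placeholders; adjust them to your cluster.
set -eu

# Primer options are an assumption: -g/-G treat the primers as 5' adapters.
FWD=CCTACGGGNGGCWGCAG      # 341F
REV=GACTACHVGGGTATCTAATCC  # 785R

# Hypothetical helper: trim every *_1.fastq / *_2.fastq pair in the current directory.
trim_all() {
  mkdir -p trimmed                      # separate output dir, so outputs never match the input glob
  for r1 in *_1.fastq; do
    [ -e "$r1" ] || continue            # no matches: the pattern stays literal, so skip it
    sample=${r1%_1.fastq}               # sample1_1.fastq -> sample1
    r2=${sample}_2.fastq
    cutadapt -j "${SLURM_CPUS_PER_TASK:-1}" -g "$FWD" -G "$REV" \
      -o "trimmed/${sample}_trimmed_1.fastq" \
      -p "trimmed/${sample}_trimmed_2.fastq" \
      "$r1" "$r2"
  done
}

trim_all
```

Writing the outputs to a separate directory matters with this naming scheme: sample1_trimmed_1.fastq would itself match `*_1.fastq` if the script were run again in the same directory. For 300 samples, a SLURM job array (`#SBATCH --array`) with one sample per task is another option.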

Let me know if that is helpful; I’d be happy to explain more if necessary.

akspat commented 3 months ago

Thank you very much. It was really helpful.