marcelm / cutadapt

Cutadapt removes adapter sequences from sequencing reads
https://cutadapt.readthedocs.io
MIT License

Should I use -a or -g when demultiplexing ONT reads with dual barcodes? #799

Open ashleyp1 opened 1 month ago

ashleyp1 commented 1 month ago

cutadapt 4.9

I have 16S amplicon reads, sequenced with ONT, that I am trying to demultiplex. Each sample was PCR barcoded with a 13-base barcode on both ends, so I expect a read to start with a barcode and end with its reverse complement. I put together a FASTA file of all my pairs; some are listed below.

>HL001_FW
ATCCGGTCGGAGA...TCTCCGACCGGAT
>HL002_FW
CTGAGGTGATCAG...CTGATCACCTCAG
>HL003_FW
AGTGTCCTGCTAG...CTAGCAGGACACT
>HL004_FW
ATAAGCAATTCGA...TCGAATTGCTTAT
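
A FASTA file like the one above can be generated rather than typed by hand. The following is a minimal Python sketch, assuming each sample has one barcode and that the 3' partner is exactly its reverse complement, as described in the post. The function names are illustrative and not part of cutadapt.

```python
# Hypothetical helper (not part of cutadapt): builds linked-adapter FASTA
# records of the form BARCODE...REVERSE_COMPLEMENT(BARCODE).

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def build_linked_fasta(barcodes: dict[str, str]) -> str:
    """Return FASTA text with one linked adapter per sample name."""
    records = []
    for name, barcode in barcodes.items():
        records.append(f">{name}\n{barcode}...{reverse_complement(barcode)}\n")
    return "".join(records)


if __name__ == "__main__":
    # Two of the barcodes listed in the post.
    example = {
        "HL001_FW": "ATCCGGTCGGAGA",
        "HL002_FW": "CTGAGGTGATCAG",
    }
    print(build_linked_fasta(example), end="")
```

Redirecting the printed output to barcodes_for_cutadapt.fasta gives a file in the same layout as the records shown above.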

The problem I run into is whether to use the -a or -g flag. Looking through the documentation, I see them used almost interchangeably for linked adapters, but I get different outputs depending on which I use, and I'm not sure which is correct. For reference, I used the commands below:

cutadapt -e 1 -a file:barcodes_for_cutadapt.fasta -o trimmed-{name}.fastq.gz reads.fastq.gz

cutadapt -e 1 -g file:barcodes_for_cutadapt.fasta -o trimmed-{name}.fastq.gz reads.fastq.gz

marcelm commented 1 month ago

The difference between -a and -g for linked adapters lies in which adapters are required to be in the read; see https://cutadapt.readthedocs.io/en/stable/guide.html#linked-override

For -g, both adapters are required. For -a, only anchored adapters are required; non-anchored adapters are optional.

The distinction between required and optional matters only for linked adapters (the ones with the ... in the middle) and determines what happens when one of the constituent adapters is not found.

The rules are like this:

So if you know your reads are long enough that you should see both primers, or if you want to ensure you only have full-length sequences in your demultiplexed output, use -g. If you want to be less strict, use -a.

(You could also make the first adapter required and leave the second one optional by writing this in the FASTA file: ATAAGCAATTCGA;required...TCGAATTGCTTAT.)
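
The required/optional defaults described above can be modeled in a few lines of Python. This is only a sketch of my reading of the reply and of the linked documentation section, not cutadapt's actual code: the function names are hypothetical, the parsing handles only ^, $, ;required and ;optional, and the rule that a linked adapter with no part found never matches is an assumption.

```python
# Illustrative model only; cutadapt's real implementation may differ.

def required_flags(spec: str, option: str) -> tuple[bool, bool]:
    """Return (front_required, back_required) for a linked adapter spec.

    option is "-a" or "-g". Explicit ;required / ;optional parameters
    take precedence over the defaults implied by the option.
    """
    front, back = spec.split("...")

    def is_required(part: str, anchored: bool) -> bool:
        params = part.split(";")[1:]
        if "required" in params:
            return True
        if "optional" in params:
            return False
        # Defaults: -g requires both parts, -a requires only anchored parts.
        return True if option == "-g" else anchored

    front_anchored = front.split(";")[0].startswith("^")
    back_anchored = back.split(";")[0].endswith("$")
    return is_required(front, front_anchored), is_required(back, back_anchored)


def linked_adapter_matches(front_found: bool, back_found: bool,
                           front_required: bool, back_required: bool) -> bool:
    """A linked adapter counts as found only if no required part is missing."""
    if front_required and not front_found:
        return False
    if back_required and not back_found:
        return False
    # Assumption: at least one part must be found for a match.
    return front_found or back_found


if __name__ == "__main__":
    spec = "ATAAGCAATTCGA...TCGAATTGCTTAT"
    for option in ("-a", "-g"):
        flags = required_flags(spec, option)
        # Read that contains the 5' barcode but is truncated before the 3' one.
        print(option, flags, linked_adapter_matches(True, False, *flags))
```

Under this model, a read that contains only the 5' barcode is assigned with -a but left unassigned with -g, which is consistent with the two commands producing different outputs.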

ashleyp1 commented 4 weeks ago

Thanks for the quick answer! That definitely clears things up for me.

I have a follow-up question though, after reading through the documentation more. When demultiplexing, does cutadapt require the complete barcode to be present for it to count? For example, for BARCODE, would it identify and trim BARCODEsequence but not CODEsequence? Basically, I want to make sure that I only keep reads with a complete barcode.

marcelm commented 3 weeks ago

To require the full barcode to be present, use an anchored adapter. You can either add the ^ to each sequence in the FASTA file:

>HL001_FW
^ATCCGGTCGGAGA...TCTCCGACCGGAT

or, as a shortcut, add the ^ before file:, like so: cutadapt -a ^file:barcodes_for_cutadapt.fasta