Closed gmagoon closed 6 years ago
Thanks @gmagoon. Could you please also provide a minimal example dataset to replicate this issue? For example, a read that is trimmed correctly when you specify the barcode sequence, and incorrectly when you don't.
I think I see the issue - InsertAligner is not respecting the 'match_adapter_wildcards' setting. Will be fixed in 1.1.18.
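For readers following along, here is a minimal sketch of what honoring a wildcard setting means when an adapter is compared against a read segment. It is not the Atropos InsertAligner code: the function name `count_mismatches`, its `match_wildcards` parameter, and the toy sequences are all hypothetical and only illustrate the idea.

```python
def count_mismatches(adapter: str, segment: str, match_wildcards: bool) -> int:
    """Count mismatching positions between an adapter and an equally long read segment.

    When match_wildcards is True, an 'N' in the adapter matches any base.
    When it is False, 'N' is compared literally, like any other character.
    """
    if len(adapter) != len(segment):
        raise ValueError("adapter and segment must have the same length")
    mismatches = 0
    for a, s in zip(adapter, segment):
        if match_wildcards and a == "N":
            continue  # wildcard position: any base is acceptable
        if a != s:
            mismatches += 1
    return mismatches


# Toy sequences (hypothetical, not taken from the attached data).
adapter = "ACGTNNNNGGCC"
segment = "ACGTTACAGGCC"

print(count_mismatches(adapter, segment, match_wildcards=True))   # 0
print(count_mismatches(adapter, segment, match_wildcards=False))  # 4
```

If the wildcard setting is ignored, every N position is scored as a mismatch, so an adapter with a run of Ns looks much less similar to the read than it should.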
excellent, thanks John!
Hi John, It seems like the output is the same with v.1.1.18, except for the version number. I'll work on getting a representative example when I get a chance... Greg
Hi @jdidion,
I've attached some FASTQ-format data containing ten read pairs. In these ten read pairs, using a 6-bp N wildcard results in no trimming in v.1.1.18, whereas using the barcode sequence results in trimming. [Note that this case, and possibly the previous one as well, actually used an 8-bp barcode, but that is neither here nor there: the last two bp are specified explicitly in both the wildcard and no-wildcard tests.]
Here's the command I'm running:
$ atropos trim -pe1 LNGU9.atropos.R1.fastq -pe2 LNGU9.atropos.R2.fastq --aligner insert -e 0.029 --insert-match-error-rate 0.058 -o /dev/null -p /dev/null -a AGATCGGAAGAGCACACGTCTGAACTCCAGTCACNNNNNNCGATCTCGTATGCCGTCTTCTGCTTG -A AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT
...and without the wildcard, I'm using -a AGATCGGAAGAGCACACGTCTGAACTCCAGTCACAAACATCGATCTCGTATGCCGTCTTCTGCTTG
Without the wildcard, the removed sequence ranges from 37 to 94 bp.
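As a sanity check on the two `-a` values above, the following sketch confirms that they differ only at the N positions and shows what fraction of the adapter those positions make up. The comparison against the `-e 0.029` value is only illustrative: how Atropos applies that threshold in insert mode is not shown in this thread, so treating the Ns as literal mismatches here is an assumption, and the helper name `differing_positions` is hypothetical.

```python
# The two -a values quoted in this thread.
WILDCARD = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCACNNNNNNCGATCTCGTATGCCGTCTTCTGCTTG"
CONCRETE = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCACAAACATCGATCTCGTATGCCGTCTTCTGCTTG"

ADAPTER_ERROR_RATE = 0.029  # the -e value from the command above


def differing_positions(a: str, b: str) -> list:
    """Return the 0-based positions at which two equally long sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


diff = differing_positions(WILDCARD, CONCRETE)

# True if every difference sits on a wildcard position of the N-containing adapter.
only_wildcards = all(WILDCARD[i] == "N" for i in diff)

# Bases that the wildcard positions stand for in the concrete adapter.
barcode = "".join(CONCRETE[i] for i in diff)

# Error rate if each N were scored as a mismatch (assumption, see above).
literal_error_rate = len(diff) / len(WILDCARD)

print(f"differing positions: {len(diff)}, all wildcards: {only_wildcards}")
print(f"bases under the wildcards: {barcode}")
print(f"literal error rate {literal_error_rate:.3f} exceeds -e: "
      f"{literal_error_rate > ADAPTER_ERROR_RATE}")
```

Under that assumption, the six N positions alone would push the adapter past the allowed error rate, which would be consistent with seeing no trimming when the wildcard is used.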
LNGU9.atropos.R1.fastq.txt
LNGU9.atropos.R2.fastq.txt
I tried adapter removal for 2x151 bp Illumina data using the --aligner insert option, and I'm getting unexpected behavior when using wildcard N bases in the specified adapter sequence. I show here results of two runs on 1 million read pairs, one with wildcard N for the 6-bp variable barcode sequence for adapter 1:
-a AGATCGGAAGAGCACACGTCTGAACTCCAGTCACNNNNNNACATCTCGTATGCCGTCTTCTGCTTG
...and one with the actual 6-bp barcode sequence for the data under consideration:
-a AGATCGGAAGAGCACACGTCTGAACTCCAGTCACGTCCGCACATCTCGTATGCCGTCTTCTGCTTG
Adapter 2 is specified as:
-A AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT
At the end of this post, I include results for each run (with the only difference being the -a sequence). The results from using the wildcard N are unexpected, in the sense that:
When using the default aligner (i.e. --aligner adapter), the results (not shown) are consistent with my expectations. (I'm aware that the two alignment approaches have some fundamental differences.)
Unless I'm overlooking something obvious, this seems like it might be a bug of some sort, but it isn't immediately obvious to me how it is happening. Perhaps I'm misunderstanding something about the role of the specified adapter sequences in atropos when insert alignment is used?
Wildcard N
No wildcard