Closed johnchase closed 4 years ago
@johnchase, this is to the best of my knowledge consistent with QIIME 191.
I took the above code, and ran the first 9 BarcodeSequence
s from the QIIME1 test data for split_libraries_fastq.py
, and the result is comparable to your output with n_normal
of 7 and n_rev_comp
of 0.
Similarly, I pulled a subset from table S1 of Walters et al. which describes the EMP Golay barcodes, and obtained similar results where there are results for n_normal
.
Running the QIIME1 on the test code directly, it is necessary to specify --rev_comp_mapping_barcode
. This parameter is exposed in q2-demux
with --p-rev-comp-mapping-barcodes
. Did that option yield expected results by chance?
Thanks @wasade, this makes sense. Previously for our sequencing runs I was not reverse complementing barcodes:
demuxed = demux.methods.emp_single(seqs,
barcodes=barcodes,
rev_comp_barcodes=False,
rev_comp_mapping_barcodes=False,
)
With the Golay correction I can reverse complement both the mapping and sequence barcodes:
demuxed = demux.methods.emp_single(seqs,
barcodes=barcodes,
rev_comp_barcodes=True,
rev_comp_mapping_barcodes=True,
golay_error_correction=True
)
Would it be worth noting the orientation of the barcodes necessary for the Golay correction? (I don't actually know how, or if that is even defined)
IIRC, this varies from sequencing center to sequencing center... With QIIME1, it was pretty common to suggest RC'ing in split libraries. But, I agree it would be wonderful to help guide users here. Maybe a new exposed API method called suggest-golay-orientation
which generates a report of the user's barcode sequences (both mapping and a subsample of input sequence data), generates a report, and suggests the specific parameter to use?
Running
emp_single
with my data I was receiving errors that no samples were assigned sequences when using thegolay_error_correction=True
flagAfter investigating I believe the issue lies with the complement of the barcode. Here I just wrote a for loop and returned the number of barcodes with 4 or more error corrections (uncorrectable) for both the normal and reverse complements. The normal (or how we maintain our barcodes) are being returned almost exclusively non-valid.
I am not sure if there is a way to account for complement currently, I may have missed it in the code
Where
golay_barcodes
is a list of valid barcodes.