qiime2 / q2-demux

BSD 3-Clause "New" or "Revised" License
0 stars 20 forks source link

Golay error correction assumes barcodes are reverse complemented #99

Closed johnchase closed 4 years ago

johnchase commented 5 years ago

Running emp_single with my data I was receiving errors that no samples were assigned sequences when using the golay_error_correction=True flag

After investigating I believe the issue lies with the complement of the barcode. Here I just wrote a for loop and returned the number of barcodes with 4 or more error corrections (uncorrectable) for both the normal and reverse complements. The normal (or how we maintain our barcodes) are being returned almost exclusively non-valid.

I am not sure if there is a way to account for complement currently, I may have missed it in the code

from q2_demux._ecc import GolayDecoder

def rev_comp(seq):
    seq = (seq
           .replace('A', 't')
           .replace('T', 'a')
           .replace('G', 'c')
           .replace('C', 'g')
           .upper()[::-1])
    return seq

decoder = GolayDecoder()

n_normal = 0
n_rev_comp = 0

for gol_code in golay_barcodes:
    rev_gol_code = rev_comp(gol_code)
    seq, n = decoder.decode(gol_code)
    if n == 4:
        n_normal += 1

    seq, n = decoder.decode(rev_gol_code)
    if n  == 4:
        n_rev_comp += 1  

print(n_normal) 
print(n_rev_comp)

1698
0

Where golay_barcodes is a list of valid barcodes.

wasade commented 5 years ago

@johnchase, this is to the best of my knowledge consistent with QIIME 191.

I took the above code, and ran the first 9 BarcodeSequences from the QIIME1 test data for split_libraries_fastq.py, and the result is comparable to your output with n_normal of 7 and n_rev_comp of 0.

Similarly, I pulled a subset from table S1 of Walters et al. which describes the EMP Golay barcodes, and obtained similar results where there are results for n_normal.

Running the QIIME1 on the test code directly, it is necessary to specify --rev_comp_mapping_barcode. This parameter is exposed in q2-demux with --p-rev-comp-mapping-barcodes. Did that option yield expected results by chance?

johnchase commented 5 years ago

Thanks @wasade, this makes sense. Previously for our sequencing runs I was not reverse complementing barcodes:

demuxed = demux.methods.emp_single(seqs, 
                                    barcodes=barcodes,  
                                    rev_comp_barcodes=False,
                                    rev_comp_mapping_barcodes=False,
                                   )

With the Golay correction I can reverse complement both the mapping and sequence barcodes:

demuxed = demux.methods.emp_single(seqs, 
                                    barcodes=barcodes,  
                                    rev_comp_barcodes=True,
                                    rev_comp_mapping_barcodes=True,
                                    golay_error_correction=True 
                                   )

Would it be worth noting the orientation of the barcodes necessary for the Golay correction? (I don't actually know how, or if that is even defined)

wasade commented 5 years ago

IIRC, this varies from sequencing center to sequencing center... With QIIME1, it was pretty common to suggest RC'ing in split libraries. But, I agree it would be wonderful to help guide users here. Maybe a new exposed API method called suggest-golay-orientation which generates a report of the user's barcode sequences (both mapping and a subsample of input sequence data), generates a report, and suggests the specific parameter to use?