huangyh09 / brie

BRIE: Bayesian Regression for Isoform Estimate in Single Cells
https://brie.readthedocs.io
Apache License 2.0

FastaFile.rev_seq fails to complement lower-case input #10

Closed: s6juncheng closed this issue 5 years ago

s6juncheng commented 6 years ago

Hi Yuanhua,

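there is a potential bug in the FastaFile.rev_seq() function. Lower-case input is reversed but not complemented:

>>> rev_seq("atgc")
'cgta'

(The correct reverse complement would be 'gcat'.)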

Some parts of the reference FASTA file are in lower case (soft-masked repetitive regions), so sequences from those regions come back reversed but not complemented. A possibly better option would be:

def rev_seq(seq):
    # Upper-case the input first so soft-masked (lower-case) bases are complemented too
    complement = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A', 'N': 'N'}
    return "".join([complement[base] for base in seq.upper()[::-1]])
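As an alternative sketch (not BRIE's actual fix), the version below keeps the soft-masking information by complementing lower-case bases to lower-case ones instead of upper-casing everything. It uses Python's built-in `str.maketrans`/`str.translate`; the name `rev_comp_keep_case` is hypothetical and used only for illustration.

```python
# Hypothetical helper, not BRIE's FastaFile.rev_seq: reverse-complements a
# DNA string while keeping soft-masked (lower-case) bases lower-case.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def rev_comp_keep_case(seq: str) -> str:
    """Return the reverse complement of seq, preserving letter case.

    Characters outside ACGTN/acgtn are passed through unchanged
    instead of raising KeyError as a dict lookup would.
    """
    # Complement each base via the translation table, then reverse.
    return seq.translate(_COMPLEMENT)[::-1]


print(rev_comp_keep_case("atgc"))    # gcat
print(rev_comp_keep_case("AAcgTN"))  # NAcgTT
```

Because `str.translate` leaves unmapped characters as they are, IUPAC ambiguity codes such as R or Y would be kept but not complemented; the table would need extending if such codes appear in the reference.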

Best, Jun

huangyh09 commented 6 years ago

Hi Jun,

Many thanks for the kind suggestion. Just fixed this bug.

best, YH