mdshw5 / pyfaidx

Efficient pythonic random access to fasta subsequences
https://pypi.python.org/pypi/pyfaidx

`Sequence.gc` methods that consider IUPAC nucleotide ambiguity #128

Open mdshw5 opened 6 years ago

mdshw5 commented 6 years ago

The existing `Sequence.gc` method purposefully ignores characters other than G and C, and uses the full sequence length as the denominator to produce a "fraction G/C". This has a few benefits:

The downside is that any non-GCAT characters (for example N or other IUPAC ambiguity codes) are still counted in the denominator, which lowers the reported G/C fraction:

https://github.com/mdshw5/pyfaidx/blob/7b4d8d7aceadaa1fde05846e854e6eccdba38b77/pyfaidx/__init__.py#L254-L266
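To make the denominator issue concrete, here is a minimal standalone sketch. It does not use pyfaidx; the function names `gc_total_length`, `gc_acgt_only`, and `gc_iupac_weighted`, and the weighting table, are all hypothetical and only illustrate one possible reading of "considering IUPAC ambiguity". Each ambiguity code is weighted by the fraction of its possible bases that are G or C, which assumes every base a code allows is equally likely.

```python
# Hypothetical helpers, not part of the pyfaidx API.

# Fraction of the bases each IUPAC code can represent that are G or C.
# Assumption: all bases allowed by a code are equally likely.
IUPAC_GC_WEIGHT = {
    "A": 0.0, "T": 0.0, "U": 0.0, "W": 0.0,   # A/T only
    "G": 1.0, "C": 1.0, "S": 1.0,             # G/C only
    "R": 0.5, "Y": 0.5, "K": 0.5, "M": 0.5,   # one of two bases is G or C
    "N": 0.5,                                 # any base
    "B": 2 / 3, "V": 2 / 3,                   # two of three bases are G or C
    "D": 1 / 3, "H": 1 / 3,                   # one of three bases is G or C
}


def gc_total_length(seq):
    """(G + C) / len(seq): every character counts in the denominator."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_acgt_only(seq):
    """(G + C) / (A + C + G + T): ambiguity codes are excluded entirely."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return gc / (gc + at) if (gc + at) else 0.0


def gc_iupac_weighted(seq):
    """Expected G/C fraction, counting only recognised IUPAC codes."""
    weights = [IUPAC_GC_WEIGHT[ch] for ch in seq.upper() if ch in IUPAC_GC_WEIGHT]
    return sum(weights) / len(weights) if weights else 0.0


if __name__ == "__main__":
    example = "ATCGNN"
    print(gc_total_length(example))    # Ns inflate the denominator
    print(gc_acgt_only(example))       # Ns ignored
    print(gc_iupac_weighted(example))  # Ns counted as half G/C
```

For `"ATCGNN"` the first function gives 2/6 while the other two give 0.5, which is the kind of difference that separate methods would make explicit.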

I'd welcome any pull request to implement something like: