modernatx / seqlike

Unified biological sequence manipulation in Python
https://modernatx.github.io/seqlike
Apache License 2.0

deterministic codon map by `codon_table_to_codon_map()` effectively chooses codons arbitrarily #70

Closed ndousis closed 1 year ago

ndousis commented 1 year ago

When applying the function `codon_tables.codon_table_to_codon_map()`, the default (deterministic) behavior of the resulting codon map is to choose the first codon listed in the table for each amino acid. The codon tables available in SeqLike or sourced from Edinburgh Genome Foundry list codons in alphabetical order, so the codon that gets chosen is effectively arbitrary.
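To make the issue concrete, here is a toy illustration rather than SeqLike's actual implementation. It assumes a codon table laid out as amino acid -> {codon: frequency}; the helper `first_codon_map` is a hypothetical stand-in that returns a plain dict, and the frequencies are made up for the example.

```python
# Toy codon table in an assumed amino acid -> {codon: frequency} layout.
# Frequencies are illustrative values, not taken from a real organism.
toy_table = {
    "K": {"AAA": 0.25, "AAG": 0.75},  # codons listed alphabetically
    "F": {"TTC": 0.60, "TTT": 0.40},
}


def first_codon_map(codon_table):
    """Hypothetical stand-in: pick the first codon listed for each amino acid."""
    return {aa: next(iter(codons)) for aa, codons in codon_table.items()}


codon_map = first_codon_map(toy_table)
print(codon_map)
```

With alphabetical ordering, lysine (K) maps to its less frequent codon while phenylalanine (F) happens to map to its more frequent one, so the result depends only on spelling.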

Arguably, the preferred/expected behavior is for the deterministic default to choose the most frequent codon from the table.

I propose that the function always sort the codon table entries from most to least frequent when `deterministic=True`. Alternatively, we could add an argument that controls whether the codon table entries are sorted.
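A minimal sketch of this proposal, assuming the same amino acid -> {codon: frequency} table layout. The function `deterministic_codon_map` and its `sort_codons` argument are hypothetical names used for illustration only and are not part of SeqLike's API.

```python
def sort_by_frequency(codons):
    """Return the codon/frequency entries ordered from most to least frequent."""
    # sorted() is stable, so equally frequent codons keep their original order.
    return dict(sorted(codons.items(), key=lambda item: item[1], reverse=True))


def deterministic_codon_map(codon_table, sort_codons=True):
    """Hypothetical sketch: map each amino acid to a single codon.

    With sort_codons=True the most frequent codon is chosen;
    with sort_codons=False the first listed codon is chosen.
    """
    codon_map = {}
    for aa, codons in codon_table.items():
        ordered = sort_by_frequency(codons) if sort_codons else codons
        codon_map[aa] = next(iter(ordered))
    return codon_map


# Made-up frequencies for illustration only.
example_table = {
    "K": {"AAA": 0.25, "AAG": 0.75},
    "F": {"TTC": 0.60, "TTT": 0.40},
}

sorted_map = deterministic_codon_map(example_table)
unsorted_map = deterministic_codon_map(example_table, sort_codons=False)
```

Sorting inside the function keeps the stored tables untouched, at the cost of a small sort on every call. The optional argument preserves the existing first-codon behavior for anyone who relies on it.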

ndousis commented 1 year ago

A third option would be to pre-sort the available codon tables by codon frequency, so that the deterministic codon map chooses the most frequent codon without any additional sorting.
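The pre-sorting option could look roughly like the sketch below, which reorders a table once so that picking the first entry already yields the most frequent codon. The helper `presort_codon_table` is a hypothetical name, the table layout is assumed to be amino acid -> {codon: frequency}, and the frequencies are invented for the example.

```python
def presort_codon_table(codon_table):
    """Hypothetical helper: return a copy with codons ordered by descending frequency."""
    return {
        aa: dict(sorted(codons.items(), key=lambda item: item[1], reverse=True))
        for aa, codons in codon_table.items()
    }


# Made-up stop codon frequencies, listed alphabetically as in the source tables.
alphabetical_table = {"*": {"TAA": 0.3, "TAG": 0.2, "TGA": 0.5}}

presorted_table = presort_codon_table(alphabetical_table)

# Picking the first entry now returns the most frequent codon.
first_stop_codon = next(iter(presorted_table["*"]))
```

This relies on Python dicts preserving insertion order (guaranteed since Python 3.7), and it only helps if every table that reaches the function has been pre-sorted, including any loaded from an external source.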