Codon Optimization module

Hi all, I noticed that there is no module, class, or function for calculating a codon-optimized sequence.

I would like to help by contributing code for this calculator, but I would need some guidance. I have found some useful tools in BioPython and in a very old SynBio Python library that someone posted on Bitbucket in 2009 (https://bitbucket.org/chapmanb/synbio/src/tip/SynBio/). The code for calculating codon-optimized sequences, which is not usable in its current state, is here (https://bitbucket.org/chapmanb/synbio/src/7b1b3a972b7ed9e6b5bfb081c1c19b4a6b4410c2/SynBio/Codons/Optimize.py?at=default). From looking through this library, there seem to be many scripts that would be really useful for the FORGE project, including barcoded plate tracking and database schemas, but the lack of documentation leaves the library unusable. Aside from emailing the author, which I will do, does anyone have advice on how to go about using this code?

For codon optimization, we would need the following input:

amino acid sequence of target protein
chassis strain (such as E. coli) to host expression of the target gene

From here, we would choose a method for codon optimization. For example, we could use the following codon sampling strategy: codon frequency matching ("codon harmonization"). Roughly, this means looking at how the native mRNA uses codons and mimicking that usage in the target species: a codon that is rare in the native organism should be replaced with a synonymous codon that is rare in the target. The rationale is that some rare codons may help the protein fold properly.
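To make the idea concrete, here is a rough Python sketch of what frequency matching could look like. It is only an illustration under several assumptions: the function name `harmonize` and the table names are hypothetical (they are not part of scikit-bio, BioPython, or the SynBio library), the usage numbers are invented toy values rather than real codon usage data, and the tables cover only three amino acids. It also assumes we have the native coding sequence available, not just the amino acid sequence, since harmonization needs to know which codons the native organism used.

```python
from collections import defaultdict

# Toy tables for illustration only. The frequencies are made up; each value is
# the relative usage of a codon among the synonymous codons for its amino acid.
CODON_TO_AA = {"ATG": "M", "AAA": "K", "AAG": "K", "CTG": "L", "CTA": "L"}
NATIVE_USAGE = {"ATG": 1.0, "AAA": 0.3, "AAG": 0.7, "CTG": 0.2, "CTA": 0.8}
TARGET_USAGE = {"ATG": 1.0, "AAA": 0.75, "AAG": 0.25, "CTG": 0.9, "CTA": 0.1}


def harmonize(native_cds, native_usage, target_usage, codon_to_aa):
    """Hypothetical helper: replace each codon with the synonymous codon whose
    relative usage in the target is closest to the original codon's relative
    usage in the native organism."""
    if len(native_cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")

    # Group the codons available in the target by the amino acid they encode.
    synonyms = defaultdict(list)
    for codon, aa in codon_to_aa.items():
        if codon in target_usage:
            synonyms[aa].append(codon)

    result = []
    for i in range(0, len(native_cds), 3):
        codon = native_cds[i:i + 3]
        aa = codon_to_aa[codon]
        native_freq = native_usage[codon]
        # Pick the closest frequency match; ties are broken alphabetically
        # so the output is deterministic.
        best = min(
            synonyms[aa],
            key=lambda c: (abs(target_usage[c] - native_freq), c),
        )
        result.append(best)
    return "".join(result)


harmonized = harmonize("ATGAAACTG", NATIVE_USAGE, TARGET_USAGE, CODON_TO_AA)
print(harmonized)  # ATGAAGCTA
```

In this toy example, AAA and CTG are the rarer codons in the native table, so they are swapped for AAG and CTA, which are the rarer synonymous codons in the target table. The encoded protein (M, K, L) is unchanged. A real implementation would need full codon usage tables for both organisms and would probably want to handle stop codons and ambiguous bases.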

Any feedback would be helpful.

Thanks, Andrea

scikit-bio / scikit-bio

Codon Optimization module #701