Open AndreaEdwards opened 9 years ago
Hi @AndreaEdwards. Thanks for the question and for your interest in contributing to scikit-bio!
This is definitely useful functionality, but I'm not sure whether it makes more sense for it to live in scikit-bio or in a stand-alone package that you develop and that depends on scikit-bio. It'd be a little easier for us to decide if we had some example code to look at. Would you be interested in starting to work on it and then pointing us at the code? It'd be relatively easy to adapt it for scikit-bio or prepare it as a stand-alone package at that point, so I don't think that would add effort.
Hi all, I noticed that there is no module, class, or function for calculating a codon-optimized sequence.
I would like to help by contributing code for this calculator, but I would need some guidance. I have found some useful tools in BioPython and a very old SynBio Python library that someone posted on Bitbucket in 2009 (https://bitbucket.org/chapmanb/synbio/src/tip/SynBio/). The code for calculating codon-optimized sequences, which is not usable in its current state, is here (https://bitbucket.org/chapmanb/synbio/src/7b1b3a972b7ed9e6b5bfb081c1c19b4a6b4410c2/SynBio/Codons/Optimize.py?at=default). From looking through this library, there seem to be a lot of scripts that would be really useful for the FORGE project, including barcoded plate tracking and database schemas, but the lack of documentation leaves the library unusable. Aside from emailing the author, which I will do, does anyone have any advice on how to go about using this code?
For codon optimization, we would need the following input:
From here, we would choose a method for codon optimization. For example, we could use the following codon sampling strategy: codon frequency matching ("codon harmonization"). Roughly, this means looking at how the native mRNA uses codons and mimicking that usage in the target species: a codon that is rare in the native organism should be replaced with one that is rare in the target. The logic is that some rare codons may help the protein fold properly.
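To make the frequency-matching idea concrete, here is a minimal sketch in plain Python. It is not taken from SynBio, BioPython, or scikit-bio. The function names (`relative_usage`, `harmonize`) and the inputs (a coding sequence plus raw codon counts for the native and target organisms) are assumptions made for illustration, and it assumes the standard genetic code. For each codon it looks up how often the native organism uses that codon relative to its synonyms, then picks the synonymous codon whose relative usage in the target is closest.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def relative_usage(counts):
    """Turn raw codon counts into frequencies relative to synonymous codons.

    `counts` maps codon -> count; codons that are missing count as zero.
    """
    totals = {}
    for codon, aa in CODON_TO_AA.items():
        totals[aa] = totals.get(aa, 0) + counts.get(codon, 0)
    return {
        codon: (counts.get(codon, 0) / totals[aa]) if totals[aa] else 0.0
        for codon, aa in CODON_TO_AA.items()
    }


def harmonize(seq, native_counts, target_counts):
    """Hypothetical helper: recode `seq` so target usage mimics native usage."""
    seq = seq.upper().replace("U", "T")
    if len(seq) % 3:
        raise ValueError("sequence length must be a multiple of 3")
    native = relative_usage(native_counts)
    target = relative_usage(target_counts)
    recoded = []
    for i in range(0, len(seq), 3):
        codon = seq[i:i + 3]
        aa = CODON_TO_AA[codon]
        synonyms = [c for c, a in CODON_TO_AA.items() if a == aa]
        # Choose the synonym whose target frequency is closest to the
        # native frequency of the codon being replaced.
        best = min(synonyms, key=lambda c: abs(target[c] - native[codon]))
        recoded.append(best)
    return "".join(recoded)


# Toy counts (made up for illustration): CTA is rare in the native
# organism, and TTG is the rare leucine codon in the target.
native_counts = {"CTA": 10, "CTG": 90}
target_counts = {"TTA": 90, "TTG": 10}
harmonized = harmonize("ATGCTACTG", native_counts, target_counts)
```

With these toy counts, the rare native codon CTA is replaced by the rare target codon TTG, and the common CTG by the common TTA, while the encoded protein is unchanged. A real implementation would probably need full usage tables and a rule for codons that the target never uses, since this sketch can select a zero-count codon when the native frequency is close to zero.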
Any feedback would be helpful.
Thanks, Andrea