Closed (maltesemike closed this 3 months ago)
Currently, cubar has no option for codon optimization. Do you want to replace each codon in a CDS with the optimal synonymous one? I could consider adding such a function.
However, I have to mention that making every codon optimal does not mean the whole CDS is optimal. Some non-optimal codons are used on purpose, for example, to slow elongation and allow for correct co-translational folding of the nascent peptide chain.
That was my plan, yes. I am trying to introduce a fluorescent protein transgene into our model system, and my fear is that codon usage might differ, so I was hoping to optimise it to suit our system as well as possible.
A tool to optimise codon usage would be really useful, although I had not thought about your point on slowing the rate of elongation. This also makes sense. Is there a way to account for this too?
I am guessing that improving the CAI of a particular CDS to match that of the most highly expressed genes in the genome would be a good start already.
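To make the CAI idea concrete, here is a minimal sketch in plain Python (not cubar's implementation or API). It assumes the usual definition of CAI as the geometric mean of per-codon relative adaptiveness values, where each value is a codon's count in the reference gene set divided by the count of the most used synonymous codon. The function names and the toy counts are hypothetical.

```python
import math
from collections import defaultdict


def relative_adaptiveness(codon_counts, codon_to_aa):
    """Weight of each codon: its count divided by the highest count
    among codons encoding the same amino acid in the reference set."""
    max_per_aa = defaultdict(int)
    for codon, n in codon_counts.items():
        aa = codon_to_aa[codon]
        max_per_aa[aa] = max(max_per_aa[aa], n)
    return {
        codon: n / max_per_aa[codon_to_aa[codon]]
        for codon, n in codon_counts.items()
        if max_per_aa[codon_to_aa[codon]] > 0
    }


def cai(cds, weights):
    """Geometric mean of the weights over the codons of `cds`.
    Codons without a positive weight (e.g. stop codons) are skipped."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    logs = [math.log(weights[c]) for c in codons if weights.get(c, 0) > 0]
    return math.exp(sum(logs) / len(logs)) if logs else float("nan")


# Toy (made-up) codon counts from a hypothetical set of highly expressed genes.
toy_codon_to_aa = {"GCT": "A", "GCC": "A", "AAA": "K", "AAG": "K"}
toy_counts = {"GCT": 10, "GCC": 40, "AAA": 30, "AAG": 10}
toy_weights = relative_adaptiveness(toy_counts, toy_codon_to_aa)

print(cai("GCTAAG", toy_weights))  # only less-used codons
print(cai("GCCAAA", toy_weights))  # only the most-used codons
```

A CDS built only from the most used codon of each amino acid gets the maximum CAI of 1, which is why a fully "optimised" sequence lands at the far right of the CAI distribution.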
Hi, I added a new function called codon_optimize, which replaces each codon with its optimal counterpart. Please try it out; any feedback would be greatly appreciated.
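For readers who want to see what "replace each codon with its optimal counterpart" means in practice, here is a small illustrative sketch in plain Python. It is not the code behind codon_optimize; it assumes the standard genetic code's stop codons, takes "optimal" to mean the most frequent synonymous codon in a reference set, and uses hypothetical function names and made-up counts.

```python
# Stop codons of the standard genetic code (an assumption of this sketch).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def pick_optimal_codons(codon_counts, codon_to_aa):
    """For each amino acid, pick the synonymous codon with the highest
    count in the reference set (ties keep the first codon seen)."""
    best = {}
    for codon, n in codon_counts.items():
        aa = codon_to_aa[codon]
        if aa not in best or n > codon_counts[best[aa]]:
            best[aa] = codon
    return best


def optimize_cds(cds, optimal_by_aa, codon_to_aa):
    """Replace every sense codon with the optimal codon for its amino acid.
    Stop codons and codons missing from the table are copied unchanged,
    so the original stop codon is preserved at the end of the CDS."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("CDS length must be a multiple of 3")
    out = []
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3]
        aa = codon_to_aa.get(codon)
        if codon in STOP_CODONS or aa is None:
            out.append(codon)
        else:
            out.append(optimal_by_aa.get(aa, codon))
    return "".join(out)


# Toy (made-up) reference counts covering three amino acids.
toy_codon_to_aa = {"ATG": "M", "GCT": "A", "GCC": "A", "AAA": "K", "AAG": "K"}
toy_counts = {"ATG": 5, "GCT": 10, "GCC": 40, "AAA": 30, "AAG": 10}
toy_optimal = pick_optimal_codons(toy_counts, toy_codon_to_aa)

print(optimize_cds("ATGGCTAAGTAA", toy_optimal, toy_codon_to_aa))
```

Copying stop codons through unchanged is a deliberate choice in this sketch: they have no synonymous "optimal" sense codon to map to, so looking them up in an amino-acid table would otherwise yield a missing value.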
A tool to optimise codon usage would be really useful, although I had not thought about your point on slowing the rate of elongation. This also makes sense. Is there a way to account for this too?
I am afraid there is no simple rule for this kind of optimization.
Thanks for adding this feature, it is extremely useful! A few things I've noticed:
The optimised CDS does not end with the original stop codon; an "NA" dinucleotide seems to be appended instead.
Regarding the rules for optimising codons: what are they? After inspecting my optimised sequence, I clearly see a large increase in CAI, to a value matching the right-hand side of the CAI bell curve for highly expressed genes. On closer inspection of the optimised sequence, however, the codons are not always changed to the best one from the differential usage analysis (i.e. the heg vs leg analysis). In some cases, they are changed to a more "poorly" used codon (i.e. one with an OR value less than 1). I am guessing this is by design? What is the rule for the change?
Hi, thanks for the feedback.
In addition, I am considering updating this function so that users can determine optimal codons from gene expression data or provide a predefined list of optimal codons. The current function is useful when there is no genome-wide expression data.
Thank you for the great package and easy to follow instructions.
I've managed to run through the tutorial with my own non-model genome now.
Is there a tool to automatically codon optimise a desired sequence based on the optimal codons cubar calculates?