Use friendzymes scripts?

jakebeal commented 3 years ago

Friendzymes has a collection of scripts for pre-synthesis part validation that may be useful for us (communication from @eyesmo):

we’ve got pipeline scripts in a couple of formats now: scripts in GitHub repos (https://github.com/Koeng101/pichia_toolkit, https://github.com/Open-Science-Global/friendzymes-toolkit), as Colab notebooks (https://colab.research.google.com/drive/1LVXVx4dyZ3ot39-v7v0uN7uhrO22CkQO?usp=sharing, https://colab.research.google.com/drive/1fkMj7kAlg8hroa07x-q3AAbmGVW3JY_2?usp=sharing [that second one still needs some work]), and I believe Isaac Guerreiro now has a prototype GitHub Action set up as well, though I don’t know where that is.

isaacguerreir commented 3 years ago

Hi, @jakebeal

It's really good to see that our tools could help other projects like next year's iGEM distribution! I'm finishing some details of the GitHub Action prototype and will share it here in the next few days. As for the other tools, I think we have scripts for:

Auto-annotating genetic parts, returning a GenBank file that marks the problematic regions (helping designers understand what needs to change for synthesis)
Calculating complexity scores using the IDT API
Adding restriction enzyme binding sites and overhangs following a Golden Gate standard
Running a Golden Gate simulation to test whether a group of parts is correctly designed
Codon optimization (using standard and customized codon tables) and CDS correction (changing synonymous codons to remove hairpins, homopolymers and repeated sequences)
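
To make the kind of pre-synthesis check listed above concrete, here is a minimal pure-Python sketch. It is not the Friendzymes implementation: the function names, the thresholds, and the choice of BsaI and BsmBI as the sites to look for are all assumptions made for illustration (the thread does not say which enzyme the distribution will use).

```python
import re

# ASSUMPTION: Type IIS enzymes often used for Golden Gate. The enzyme for the
# distribution is not decided in this thread, so treat this table as a placeholder.
SITES = {"BsaI": "GGTCTC", "BsmBI": "CGTCTC"}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_warnings(seq: str, max_homopolymer: int = 8, gc_bounds=(0.25, 0.65)):
    """Return a list of (kind, position, detail) tuples; never raises on 'bad' parts.

    The thresholds are illustrative defaults, not vendor rules.
    """
    seq = seq.upper()
    warnings = []

    # Homopolymer runs strictly longer than max_homopolymer.
    for match in re.finditer(r"(.)\1{%d,}" % max_homopolymer, seq):
        run = match.group()
        warnings.append(("homopolymer", match.start(), f"{len(run)} x {run[0]}"))

    # Overall GC content outside the allowed window.
    if seq:
        gc = (seq.count("G") + seq.count("C")) / len(seq)
        if not gc_bounds[0] <= gc <= gc_bounds[1]:
            warnings.append(("gc_content", 0, f"GC fraction {gc:.2f}"))

    # Internal restriction sites on either strand (lookahead finds overlaps).
    for enzyme, site in SITES.items():
        for strand, motif in (("+", site), ("-", reverse_complement(site))):
            for match in re.finditer(f"(?={motif})", seq):
                warnings.append(("restriction_site", match.start(), f"{enzyme} ({strand})"))

    return warnings
```

Calling `find_warnings("ATGC" + "A" * 10 + "GGTCTC" + "CATG")` reports the run of ten A's and the forward-strand BsaI site. Returning a list rather than raising keeps the check usable for parts that are problematic on purpose.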

I would like to know which scripts would be of interest to the iGEM Distribution, so that we can adapt what we already have into an easy-to-integrate format. Also, if any of you have other ideas for scripts, feel free to share them with us.

jakebeal commented 3 years ago

@isaacguerreir This all sounds excellent, and I think we'll be able to put some of those to use immediately. We are very receptive to incoming pull requests as well!

In particular, we can likely immediately use:

Annotation of problematic parts
Synthesis is planned to be via Twist, but IDT issues are likely to be good proxies for Twist issues

Golden Gate is planned to be used, but the details aren't yet determined (see #18); getting those nailed down with @vinoo-igem is a prerequisite for starting that checking. We'll also need to check for BioBrick compatibility.

If you look at the automation that's been set up on this project, you'll see that the workflow is to compile part "packages" from Excel and input files, then generate README files that summarize the state of each package, and finally produce a whole-distribution collation. I think we'd want to compute issues at both the package and whole-distribution level, then write that information into the README files. Some parts may be deliberately problematic, so we'll want this to be "warning" rather than "error" behavior.
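
As a rough illustration of that reporting flow, the sketch below renders warnings as Markdown text for a package README and rolls the counts up for the whole distribution. Everything here is hypothetical: the function names, the heading text, and the data shape (package name mapped to a list of part/message pairs) are assumptions, not the project's actual automation.

```python
# ASSUMPTION: warnings arrive as {package name: [(part name, message), ...]}.
# Nothing here raises or exits non-zero, so problems stay warnings, not errors.

def package_section(package: str, warnings: list[tuple[str, str]]) -> str:
    """Render the warnings block for one package README."""
    lines = [f"### Synthesis warnings: {package}", ""]
    if warnings:
        lines += [f"- WARNING ({part}): {message}" for part, message in warnings]
    else:
        lines.append("No issues detected.")
    return "\n".join(lines)


def distribution_section(by_package: dict[str, list[tuple[str, str]]]) -> str:
    """Render a per-package count summary for the whole-distribution README."""
    total = sum(len(found) for found in by_package.values())
    lines = [f"### Synthesis warnings: whole distribution ({total} total)", ""]
    for package in sorted(by_package):
        lines.append(f"- {package}: {len(by_package[package])} warning(s)")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical package and part names, for illustration only.
    example = {
        "Example package A": [("part_1", "internal BsaI site at position 14")],
        "Example package B": [],
    }
    for name, found in example.items():
        print(package_section(name, found), end="\n\n")
    print(distribution_section(example))
```

Keeping the rendering separate from the checks means the same warning list can feed both the per-package README and the distribution-level collation.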

One key difference is that we're working with SBOL3 as the representation, since it allows us to describe relations between parts in ways that GenBank can't, but we can convert to/from GenBank, so that's not necessarily a barrier.

iGEM-Engineering / iGEM-distribution

Use friendzymes scripts? #120