iGEM-Engineering / iGEM-distribution

Repository for collective design of an iGEM DNA distribution
https://igem-distribution.readthedocs.io

Add friendzymes collection #238

Open jcahill opened 2 years ago

jcahill commented 2 years ago

About

This PR is for inclusion of the Friendzymes Collection.

Description

This collection is aimed at expanding what people are able to do with FreeGenes collections and the iGEM distribution, both in terms of genetic assembly and in terms of biomanufacturing. Friendzymes' primary goals are to democratize strain engineering and recombinant protein manufacturing and purification.

For manufacturing, this collection contains an expansion of the FreeGenes Open Yeast Collection, including P. pastoris-optimized target enzymes for recombinant production (such as Eco31I, an IP-free BsaI isoschizomer, and its cognate methyltransferases), additional purification tags, an anti-His tag antibody for protein blotting and quantification, and additional yeast promoters. Further, this collection contains complements to the FreeGenes Bacillus subtilis Secretion Tag Library Plasmids, for recombinant protein production and secretion from B. subtilis. These include B. subtilis promoters, target proteins for production such as Pfu-Sso7d polymerase, and various B. subtilis regulatory elements.

For strain engineering, we include E. coli origins of replication; E. coli, B. subtilis, and P. pastoris selection markers; counterselection markers for E. coli; an origin of transfer for conjugation from E. coli to other bacterial species; homology arm pairs for genomic integration into B. subtilis and P. pastoris; and 5' and 3' recombinase site parts for insertion, deletion, or inversion of synthetic genetic elements. Many of these parts are not elements of a canonical transcription unit and do not have clearly defined part types in the MoClo/uLoop assembly standard. Moreover, inserting some of these parts into a transcription unit would require changing the overhangs on the core promoter, RBS, CDS, and/or terminator parts.

To address this challenge, we designed AllClo (https://docs.google.com/spreadsheets/d/1TICnbGYY96myM7TPXWwBsLvyadgSfmtbVTGsUN5iMI8/edit?usp=sharing), a high-fidelity, backwards-compatible expansion of the MoClo assembly standard. AllClo uses a single 26-overhang set that includes all uLoop overhangs and the vector assembly overhangs used in the Open Yeast Collection; its predicted ligation fidelity in a 26-part assembly is 96%.
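As a rough illustration of the most basic constraint on an overhang set like this one, here is a minimal Python sketch that flags palindromic overhangs, duplicates, and reverse-complement pairs. It is not the fidelity prediction quoted above, which depends on empirical ligation data; it only checks a necessary condition. The function names and the example overhangs are illustrative assumptions and are not taken from the AllClo spreadsheet.

```python
from itertools import combinations

# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def overhang_conflicts(overhangs):
    """List (kind, overhang_a, overhang_b) problems in a set of overhangs.

    Flags palindromes (which can self-ligate), exact duplicates, and pairs
    that are reverse complements of each other (which can cross-ligate).
    """
    normalized = [o.upper() for o in overhangs]
    problems = []
    for o in normalized:
        if reverse_complement(o) == o:
            problems.append(("palindrome", o, o))
    for a, b in combinations(normalized, 2):
        if a == b:
            problems.append(("duplicate", a, b))
        elif reverse_complement(a) == b:
            problems.append(("reverse-complement pair", a, b))
    return problems


# Illustrative overhangs only (hypothetical, not the AllClo set).
print(overhang_conflicts(["GGAG", "TACT", "AATG", "GCTT", "CGCT"]))  # []
print(overhang_conflicts(["GGAG", "CTCC", "GATC"]))
```

The second call reports GATC as a palindrome and GGAG/CTCC as a reverse-complement pair. A set that passes this check can still have poor fidelity, since near-matches also ligate at measurable rates.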

We further designed a set of part-switching linkers that take canonical uLoop transcription unit components as input and output those parts with new 5' and 3' overhangs. These part-switching reactions enable, for instance, insertion of recombination sites 5' to the promoter and/or 3' to the terminator in a TU, or ribozymes 3' to the promoter and 5' to the RBS/start site. In this way, standard uLoop parts can participate in assembly reactions that construct modular vector backbones, composite 5' and 3' UTRs, and multi-tagged CDSs.

The part-switching linkers were designed to work in one of two ways: with an orthogonal, linker-specific Type IIS restriction site (BbsI), or with a conditionally methylatable, idempotent BsaI restriction site (mBsaI). The mBsaI site is suppressed when the linker is cloned inside an E. coli cell expressing HpaII and/or MspI, and becomes active when the part is cloned into an MspI-/HpaII- strain or PCR-amplified to remove the methylation. These parts and this expanded assembly standard have the potential to equip iGEM teams with tools and a framework to manufacture their own enzymatic reagents and perform their own sophisticated modification of strains' genomic background.

Figure: AllClo overview

Technical Notes

  1. SwitchClo linkers may cause some automated checks to fail because they contain Type IIS restriction sites by design.
  2. Parts are all housed under the benchling.com/friendzymes namespace. These are available for individual attachment if the maintainers wish. Some benchling items contain additional documentation in their Description fields.
  3. We have not yet enumerated any items in the Libraries and Composites sub-sheet. We can amend the submission further if the maintainers wish for this tab to contain additional information.
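
To make the first note concrete, the sketch below shows the kind of scan that would flag a linker: it looks for BsaI (GGTCTC) and BbsI (GAAGAC) recognition sequences on both strands of a linear sequence. This is an assumption about what such a check might look like, not the distribution's actual validation code, and the example sequence is made up.

```python
# Recognition sequences of the two Type IIS enzymes named in this PR.
TYPE_IIS_SITES = {"BsaI": "GGTCTC", "BbsI": "GAAGAC"}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_type_iis_sites(sequence: str):
    """Return (enzyme, strand, start) for each recognition site found.

    Positions are 0-based on the top strand. The sequence is treated as
    linear, so sites spanning the origin of a circular plasmid are missed.
    """
    seq = sequence.upper()
    hits = []
    for enzyme, site in TYPE_IIS_SITES.items():
        for strand, motif in (("+", site), ("-", reverse_complement(site))):
            start = seq.find(motif)
            while start != -1:
                hits.append((enzyme, strand, start))
                start = seq.find(motif, start + 1)
    return sorted(hits, key=lambda hit: hit[2])


# Made-up sequence: a forward BsaI site and a reverse-strand BbsI site.
print(find_type_iis_sites("AAGGTCTCATTTGTCTTCAA"))
# [('BsaI', '+', 2), ('BbsI', '-', 12)]
```

A check of this shape would flag every SwitchClo linker, which is why an allow-list for intentional sites may be needed.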

Thanks, Friendzymes Contributors

jakebeal commented 2 years ago

@jcahill Can you please run the workflows on your fork? The automation needs to run the build in order to validate whether this can be integrated.

jcahill commented 2 years ago

@jakebeal Running script regression testing now: https://github.com/friendzymes/iGEM-distribution/actions/runs/1842097864

jakebeal commented 2 years ago

The synchronize.yml workflow is needed too, since that's what validates the constructs (as opposed to the script code).

jcahill commented 2 years ago

After several rounds of trial-and-error with source prefix and ID columns, synchronize.yml continues to fail at SBOL export. We are requesting assistance on how to proceed.

Blocker 1

Build automation rejects non-unique data source IDs, but it's unclear how this value can be meaningful if required to be unique.

Blocker 2

If the workflow logs are to be trusted, URI expansions are not being generated correctly. No combination of the following in the two relevant columns has generated a correct expansion:

Data Source Prefix        | Data Source ID
Prefix from dropdown menu | https?://explicit.url.tld/to/part/ID
Prefix from dropdown menu | PREFIX:ID
Prefix from dropdown menu | ID

That is, all of the following fail:

Data Source Prefix | Data Source ID
iGEM Registry      | http://parts.igem.org/Part:BBa_K1074001
iGEM Registry      | iGEM:BBa_K1074001
iGEM Registry      | BBa_K1074001

Logs

Using the final example from above, the wiki namespace path /Part: is not included in the URI expansion:

Could not export SBOL file for package Friendzymes: An entity with identity "http://parts.igem.org/BBa_K1074001" already exists in document

jakebeal commented 2 years ago

With respect to your blockers, there are two key pieces of information that I think will help you:

  1. Data sources have a "Literal Part" column that distinguishes whether or not there is expected to be a 1:1 correspondence between identifier and sequence. NCBI and iGEM, for example, are both literal-part sources, because if I tell you "NCBI accession FJ859897.1" or "iGEM part BBa_K1074001", that should map to a particular sequence. PubMed, on the other hand, is non-literal. So when you say BBa_K1074001 is EcoOri_ColE1pMB1pBR32, it's a mismatch, because if we retrieve BBa_K1074001, the sequence we find won't be the one that's in your sheet. If you got the sequence by extracting it out of BBa_K1074001, then that would be better to go into the design notes. Right now, the build believes it's finding several conflicting definitions for BBa_K1074001 and is complaining accordingly.
  2. The URI generated (http://parts.igem.org/BBa_K1074001) is the intended one. Since the source material in the iGEM repository isn't in SBOL, we need to convert it into an SBOL object, and this is the name for that object, not the literal URI used to access the SBOL object. (We are working towards an implementation of the packaging approach described in SEP 054). Each import source currently has a special case for how to remap URIs in order to access the import, which is required because there is no standardization across the databases that we import from (lots of future work to be done in generalization of import approaches...)
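
A minimal sketch of the first point, assuming rows of (part name, data source prefix, data source ID): for literal sources, the same identifier attached to more than one part name is reported as a conflict, while non-literal sources may repeat freely. The `LITERAL_SOURCES` table, the function name, the second part name, and the PubMed ID are hypothetical; this is not the actual build code.

```python
from collections import defaultdict

# Hypothetical stand-in for the "Literal Part" column: True means an
# identifier is expected to map to exactly one sequence.
LITERAL_SOURCES = {"iGEM Registry": True, "NCBI": True, "PubMed": False}


def literal_conflicts(rows):
    """Return {(prefix, source_id): [part names]} for reused literal IDs.

    rows is an iterable of (part_name, source_prefix, source_id) tuples.
    Unknown prefixes are treated as non-literal in this sketch.
    """
    names_by_source = defaultdict(set)
    for part_name, prefix, source_id in rows:
        if LITERAL_SOURCES.get(prefix, False):
            names_by_source[(prefix, source_id)].add(part_name)
    return {
        key: sorted(names)
        for key, names in names_by_source.items()
        if len(names) > 1
    }


rows = [
    ("EcoOri_ColE1pMB1pBR32", "iGEM Registry", "BBa_K1074001"),
    # Hypothetical second part that cites the same registry entry.
    ("EcoOri_hypothetical_second_part", "iGEM Registry", "BBa_K1074001"),
    # Non-literal source: repeating a (made-up) ID is not a conflict.
    ("PartA", "PubMed", "0000000"),
    ("PartB", "PubMed", "0000000"),
]
print(literal_conflicts(rows))
```

Only the iGEM Registry entry is reported. Moving a derived-from reference into the design notes, as suggested above, removes the row from this check entirely.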

On a separate note, I would also ask you to consider whether it would be a good idea to split this collection up into more than one package. I see inside of it a number of sub-collections that seem like they might stand on their own, such as the linker subcollection. Most other packages in the distribution are organized around function rather than around source: is that possible to do here, or is this something that needs to be monolithic like the current OpenYeast import from FreeGenes?

jcahill commented 2 years ago

Re: 1 and 2, thanks. We'll revise based on these notes.

Re: the size/scope of the package: We have had some discussion around handling this. I've re-raised the topic with the team in light of your suggestion.

So far, the working model has been to handle the whole collection as a single package. We have judged the risk of confusion and fragmentation from spreading a fairly complex assembly standard across multiple packages to be greater than the risk of concentrating too much material in one place.

Would grouping the natural classes of parts into libraries nested within the package be a suitable middle-ground?

jakebeal commented 2 years ago

Ah, if it's got an alternate assembly standard, then it probably does want to be isolated in a single package right now (and that will be a discussion necessary with iGEM HQ). If we had a full implementation of SEP 054, then sub-packages would be the right answer, but at the moment that's not an option.

eyesmo commented 2 years ago

To clarify, in this collection, all parts that are defined with specified overhangs in uLoop--all promoters, RBSs, CDSs, and terminators--have uLoop overhangs. It is the part types that are not explicitly defined in uLoop--vector backbone subcomponents, recombination sites, ribozymes--where the new overhangs and part definitions come in. So at least for level 0/level 1 assemblies, it's not so much intended to be an alternate assembly standard, as an expansion and extension of the existing iGEM/uLoop assembly standard. Happy to talk more about this on this thread, in Wednesday's meeting or on a call.

jakebeal commented 2 years ago

@eyesmo Yes, I think a discussion on the Wednesday distribution call would likely be a good thing.

jcahill commented 2 years ago

Workflows have run successfully on the fork.

vinoo-igem commented 2 years ago

I think discussing this on our next call will be good! I do want to surface that this is ambitious and will take some effort for review, as this will feed directly into a number of different topics that we need to address this year, primarily what iGEM will be defining as the assembly standard beyond L0 basic parts (which also clearly needs work #236 #214) and vector construction and whether this would constitute testing and/or adoption.

eyesmo commented 2 years ago

> I do want to surface that this is ambitious and will take some effort for review, as this will feed directly into a number of different topics that we need to address this year, primarily what iGEM will be defining as the assembly standard beyond L0 basic parts (which also clearly needs work #236 #214) and vector construction and whether this would constitute testing and/or adoption.

Very much looking forward to this review/discussion! A core desired outcome of mine is to help move the ball forward on these topics for iGEM.