iGEM-Engineering / iGEM-distribution

Repository for collective design of an iGEM DNA distribution
https://igem-distribution.readthedocs.io

Automated checks on DNA sequences and stages, from entry to pre-synthesis release #251

Open vinoo-igem opened 2 years ago

vinoo-igem commented 2 years ago

We have several different types of checks we need to do at different stages, and some of them will need to be repeated. I'll lay out what I did for the first pre-release here.

  1. Flanking insert: checked for D2001 and D2002 (small part padding sequences), and stripped them if present
  2. Flanking insert: checked for flanking BsaI sites, and stripped them if present
  3. Flanking insert: checked for assembly compatibility (BsaI, SapI, BsmBI, BbsI)
  4. Checked fusion sites
  5. Changed status of any that failed the assembly compatibility check to "Not building" in the build tracker
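
As a rough illustration of check 3, here is a minimal sketch of an assembly compatibility scan. It is not the script used for the pre-release. The function names (`find_sites`, `is_assembly_compatible`) and the `(enzyme, strand, index)` result shape are assumptions made for this example. The recognition sequences are the standard ones for these Type IIS enzymes. The scan treats the sequence as linear, so a site spanning the origin of a circular construct would be missed.

```python
# Standard 5'->3' recognition sequences for the Type IIS enzymes in the list.
RECOGNITION_SITES = {
    "BsaI": "GGTCTC",
    "SapI": "GCTCTTC",
    "BsmBI": "CGTCTC",
    "BbsI": "GAAGAC",
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_sites(seq, sites=RECOGNITION_SITES):
    """Return (enzyme, strand, index) for every recognition site in seq.

    strand is "F" when the site reads 5'->3' on the top strand and "R" when
    it sits on the bottom strand. index is the 0-based position on the top
    strand where the match starts (an assumed convention).
    """
    seq = seq.upper()
    hits = []
    for enzyme, site in sites.items():
        for strand, pattern in (("F", site), ("R", reverse_complement(site))):
            start = seq.find(pattern)
            while start != -1:
                hits.append((enzyme, strand, start))
                # Step by one so overlapping matches are still reported.
                start = seq.find(pattern, start + 1)
    return hits


def is_assembly_compatible(seq):
    """A sequence passes when it contains none of the listed sites."""
    return not find_sites(seq)
```

Because the same function works on any sequence string, it could be called on the part, the insert, and the part in backbone without modification.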

This is obviously not the order that we want to do things in! We'd also want some of these checks done multiple times through the process. For example, we'll want to check assembly compatibility at multiple stages (part, insert, part in backbone). For this first run I was only able to strip the flanking sequences because they are static and I had checked them together for assembly compatibility.

eyesmo commented 2 years ago

It's also important to have the option of flagging when a part has a 'forbidden' restriction site (or sites) that is intentionally there as part of the part's function, for instance assembly linkers that flank a TU and can then be cut to expose new overhangs for multi-TU cassette/level 2+ assembly. The simple way to do this would be a flag that turns off the compatibility check, but that might let through forbidden sites that weren't intended or needed for function along with the intentional ones. A better approach might be to allow submitted parts to specify the identities and locations of the forbidden cut sites that are required for part function. The checker could then reference that pre-approved list when it scans for forbidden restriction sites and ignore only the sites on it (e.g. ['BsaI', F, 154] to specify a BsaI binding site on the forward/top strand starting at sequence index 154).
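
A minimal sketch of that pre-approved list idea might look like the following. Everything here is hypothetical: the function name `check_with_allowlist`, the tuple layout `(enzyme, strand, index)`, and the choice of 0-based indexing are assumptions for illustration, since the comment above does not fix a convention. The sketch also reports approved entries that were not found, so that a stale declaration does not go unnoticed.

```python
# Standard 5'->3' recognition sequences for the enzymes being checked.
SITES = {"BsaI": "GGTCTC", "SapI": "GCTCTTC", "BsmBI": "CGTCTC", "BbsI": "GAAGAC"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def find_sites(seq):
    """Return (enzyme, strand, 0-based index) for each site on either strand."""
    seq = seq.upper()
    hits = []
    for enzyme, site in SITES.items():
        bottom = site.translate(_COMPLEMENT)[::-1]  # reverse complement
        for strand, pattern in (("F", site), ("R", bottom)):
            start = seq.find(pattern)
            while start != -1:
                hits.append((enzyme, strand, start))
                start = seq.find(pattern, start + 1)
    return hits


def check_with_allowlist(seq, approved):
    """Split the scan result against a per-part list of intentional sites.

    approved: iterable of (enzyme, strand, index) entries declared by the
    submitter (hypothetical format, 0-based index assumed).
    Returns (unapproved, missing):
      unapproved - sites found in seq that are NOT on the approved list
      missing    - approved entries that were not found in seq
    """
    found = find_sites(seq)
    approved = [tuple(entry) for entry in approved]
    unapproved = [hit for hit in found if hit not in approved]
    missing = [entry for entry in approved if entry not in found]
    return unapproved, missing
```

For example, a part with an intentional BsaI site at index 154 and an unintended BbsI site further along would pass the first and still be flagged for the second:

```python
seq = "A" * 154 + "GGTCTC" + "AAAA" + "GAAGAC"
unapproved, missing = check_with_allowlist(seq, [("BsaI", "F", 154)])
# unapproved holds only the BbsI hit; missing is empty
```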

eyesmo commented 2 years ago

Might also be good to be able to specify custom small part padding sequences, in addition to D2001 and D2002 (those are specific stuffer sequences, right?)
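
One way this could work is to pass the padding sequences in as data instead of hard-coding them. The sketch below assumes each padding is described by a name plus a 5' and a 3' flank, which may not match how D2001 and D2002 are actually used. The sequences in the usage example are made-up placeholders, not the real D2001/D2002 sequences, and `strip_padding` is a hypothetical name.

```python
def strip_padding(insert, paddings):
    """Strip the first matching padding pair from an insert.

    paddings: iterable of (name, five_prime, three_prime) tuples
    (an assumed format). Returns (sequence, name_of_padding_removed),
    with name None when no padding matched.
    """
    seq = insert.upper()
    for name, left, right in paddings:
        left, right = left.upper(), right.upper()
        # Require both flanks and a non-empty part between them.
        has_room = len(seq) > len(left) + len(right)
        if has_room and seq.startswith(left) and seq.endswith(right):
            return seq[len(left):len(seq) - len(right)], name
    return seq, None
```

```python
# Placeholder sequences for illustration only.
paddings = [
    ("custom_pad_1", "AAAACCCC", "GGGGTTTT"),
    ("custom_pad_2", "ACACAC", "TGTGTG"),
]
part, removed = strip_padding("AAAACCCCATGAAATAAGGGGTTTT", paddings)
# part == "ATGAAATAA", removed == "custom_pad_1"
```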