iGEM-Engineering / iGEM-distribution

Repository for collective design of an iGEM DNA distribution
https://igem-distribution.readthedocs.io

Nicer names for build plans #83

Open jakebeal opened 3 years ago

jakebeal commented 3 years ago

The build plans have displayIds like Anderson_Promoters_in_vector_BBa_J23100_pOpen_v4. This has two problems:

  1. It's likely to be too long for Twist (who have a 32-character limit)
  2. It's not very pretty

The displayId for a vector embedding should be just BBa_J23100_pOpen_v4, and each one should also get a name, like "BBa_J23100 in pOpen_v4".

We will also need to check that there are no displayId collisions.
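The length and collision checks could be sketched roughly as follows. This is only an illustration on plain strings: the helper `check_display_ids` and the constant `MAX_DISPLAY_ID_LENGTH` are hypothetical names, not part of the distribution scripts or SBOL-utilities, and the limit is simply the 32-character figure quoted above.

```python
from collections import Counter
from typing import Dict, Iterable, List

MAX_DISPLAY_ID_LENGTH = 32  # assumption: the Twist limit quoted in this issue


def check_display_ids(display_ids: Iterable[str],
                      limit: int = MAX_DISPLAY_ID_LENGTH) -> Dict[str, List[str]]:
    """Hypothetical helper: report displayIds that are too long or duplicated."""
    ids = list(display_ids)
    counts = Counter(ids)
    return {
        # Any displayId longer than the limit, listed once each
        "too_long": sorted({d for d in ids if len(d) > limit}),
        # Any displayId that appears more than once
        "collisions": sorted(d for d, n in counts.items() if n > 1),
    }


report = check_display_ids([
    "Anderson_Promoters_in_vector_BBa_J23100_pOpen_v4",  # current style
    "BBa_J23100_pOpen_v4",                               # proposed style
    "BBa_J23100_pOpen_v4",                               # deliberate duplicate
])
print(report)
```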

noahsprent commented 2 years ago

As far as I can tell from https://github.com/iGEM-Engineering/iGEM-distribution/blob/develop/Anderson%20Promoters/views/package.nt, this is still an issue worth addressing. @jakebeal, I've tried to break it down into a checklist of the steps needed. Does this match your thinking?

noahsprent commented 2 years ago

OK, I've been looking into this today. I'm not entirely sure I'm on top of it, but I believe the build plan names come from the expansion of the CombinatorialDerivation by the SBOL-utilities function expand_derivations, and more specifically, perhaps, from cd_assigment_to_display_id, which just concatenates the display IDs to give the final ID. @jakebeal, are we looking to modify that behaviour, or instead to convert the predictable names that come out of that function into the format given above?

jakebeal commented 2 years ago

That's correct.

We will also want to put an entry in the optional "name" field (which is allowed to have spaces and such) while we're at it.

noahsprent commented 2 years ago

Thanks! Which one of the two approaches do you recommend? Editing the function in SBOL-utilities or an addition to the distro scripts?

jakebeal commented 2 years ago

I would suggest doing it in SBOL utilities. Likely a good choice would be to have an option for "compact names" that selects the new way vs. the old way.
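As a rough sketch of what such an option might look like, here is a stand-in that works on plain strings. The function name `assignment_to_display_id` and the `compact` parameter are hypothetical, and this is not the actual SBOL-utilities signature, which operates on SBOL objects.

```python
from typing import Sequence


def assignment_to_display_id(derivation_id: str,
                             value_ids: Sequence[str],
                             compact: bool = False) -> str:
    """Hypothetical stand-in for the naming step of an expansion.

    derivation_id: displayId of the CombinatorialDerivation
    value_ids: displayIds of the values chosen for each variable
    compact: if True, leave out the derivation prefix (the "new way")
    """
    if compact:
        return "_".join(value_ids)
    # Old way: prefix every name with the derivation's displayId
    return "_".join([derivation_id, *value_ids])


old = assignment_to_display_id("Anderson_Promoters_in_vector",
                               ["BBa_J23100", "pOpen_v4"])
new = assignment_to_display_id("Anderson_Promoters_in_vector",
                               ["BBa_J23100", "pOpen_v4"], compact=True)
print(old)
print(new)
```

Defaulting `compact` to False would keep existing callers on the old naming.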

noahsprent commented 2 years ago

Notes on how this is implemented (mainly for me):

Discussion/questions:

I'm not sure that the behaviour of expand_derivations or cd_assigment_to_display_id is actually wrong, since I don't see how either function could know which part is a backbone/locus and which is an insert.

So to make the change, I think we might need to have the Excel-to-SBOL function rename the "_ins" CD to the name of the backbone/locus, so that the resulting name will be, e.g., "Anderson_Promoters_in_vector_pSB1C5_J23110".

We'll then need to parse this back into "J23110_pSB1C5" in the distribution scripts. But how will we know when to keep the "Short and human friendly name" and when not to? Possibly the name is used only when the user fills it in, and ignored when it is left blank?
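The parsing step might look something like this, assuming the library prefix and the backbone name are both known when the distribution scripts run. The helper `to_insert_backbone_id` is hypothetical and only handles the single-insert case from the example above.

```python
def to_insert_backbone_id(display_id: str, library: str, backbone: str) -> str:
    """Hypothetical helper: turn library_backbone_insert into insert_backbone."""
    prefix = f"{library}_{backbone}_"
    if not display_id.startswith(prefix):
        raise ValueError(f"{display_id!r} does not start with {prefix!r}")
    insert = display_id[len(prefix):]  # whatever follows the backbone
    return f"{insert}_{backbone}"


short_id = to_insert_backbone_id(
    "Anderson_Promoters_in_vector_pSB1C5_J23110",
    library="Anderson_Promoters_in_vector",
    backbone="pSB1C5",
)
print(short_id)
```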

jakebeal commented 2 years ago

Maybe a sufficient shortening would be to name the plasmid library_backbone_variables and the insert library_variables, e.g., "Anderson_Promoters_in_vector_pSB1C5_J23110" and "Anderson_Promoters_in_vector_J23110". That would make them pretty human-friendly.
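A minimal sketch of that naming scheme on plain strings, where `build_plan_display_id` is a hypothetical helper rather than an existing function in SBOL-utilities or the distribution scripts:

```python
from typing import Optional, Sequence


def build_plan_display_id(library: str,
                          variables: Sequence[str],
                          backbone: Optional[str] = None) -> str:
    """Hypothetical helper: library_backbone_variables for a plasmid,
    library_variables for an insert (no backbone given)."""
    parts = [library]
    if backbone is not None:
        parts.append(backbone)
    parts.extend(variables)
    return "_".join(parts)


plasmid_id = build_plan_display_id("Anderson_Promoters_in_vector",
                                   ["J23110"], backbone="pSB1C5")
insert_id = build_plan_display_id("Anderson_Promoters_in_vector", ["J23110"])
print(plasmid_id, len(plasmid_id))
print(insert_id, len(insert_id))
```

The plasmid example comes out at 42 characters and the insert at 35, so for a library name this long the scheme may still need a further check against the 32-character limit.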

noahsprent commented 2 years ago

Ok, I agree. I'll have a think about the best way to implement this.