iGEM-Engineering / iGEM-distribution

Repository for collective design of an iGEM DNA distribution
https://igem-distribution.readthedocs.io

Genbank file outputs linear seqs instead of plasmids #287

Open nickdelkis opened 2 years ago

nickdelkis commented 2 years ago

Distribution.gb sequences are linear instead of circular.

Exporting circular sequences would make them less confusing for teams (although the sequences can also be used for cloning as linear sequences).

jakebeal commented 2 years ago

Is this an actual issue, or a test issue?

nickdelkis commented 2 years ago

It is a real issue. I meant to add context after the webinar and forgot. Adding it now.

nickdelkis commented 2 years ago

Actually, I think this shouldn't be too hard to tackle. I'll look into it, Jake.

jakebeal commented 2 years ago

I suspect that the problem will turn out to be that the SBOL3 to SBOL2 converter isn't handling the SequenceOntology "circular" topology type correctly.

nickdelkis commented 2 years ago

Are you referring to this function?

https://github.com/SynBioDex/SBOL-utilities/blob/cd8f27f96de27aae044296204c67917f5672aa72/sbol_utilities/conversion.py#L169-L201

jakebeal commented 2 years ago

That's right.

SBOL2 and SBOL3 both use SequenceOntology term SO:0000988 (Circular), but identifiers.org has simplified the recommended canonical form from http://identifiers.org/so/SO:0000988 to http://identifiers.org/SO:0000988. With pySBOL3, using tyto to compare terms will find that these are equivalent.

With the old SBOL2 to GenBank converter, however, it's a string comparison, which won't recognize http://identifiers.org/SO:0000988 as the circular type: https://github.com/SynBioDex/libSBOLj/blob/d99912e57e2d734d86576f6cdb6b24ce88cd6583/core2/src/main/java/org/sbolstandard/core2/GenBank.java#L449
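To illustrate the mismatch, here is a minimal Python sketch. It is not tyto and not libSBOLj's Java code. It assumes, as described above, that the old converter does an exact string match against the `/so/` form. The helper names `so_accession`, `is_circular_naive`, and `is_circular_normalized` are hypothetical.

```python
import re
from typing import Optional

# The circular topology term in the two identifiers.org URI forms.
SBOL2_CIRCULAR = "http://identifiers.org/so/SO:0000988"
SBOL3_CIRCULAR = "http://identifiers.org/SO:0000988"

# Hypothetical helper: pull the bare accession (e.g. "SO:0000988") out of
# either URI form so that comparisons ignore the prefix style.
_SO_PATTERN = re.compile(r"^https?://identifiers\.org/(?:so/)?(SO:\d{7})$")


def so_accession(uri: str) -> Optional[str]:
    match = _SO_PATTERN.match(uri)
    return match.group(1) if match else None


def is_circular_naive(type_uri: str) -> bool:
    # Mimics an exact string comparison against the old SBOL2-style URI.
    return type_uri == SBOL2_CIRCULAR


def is_circular_normalized(type_uri: str) -> bool:
    # Compares accessions, so both URI forms are recognized.
    return so_accession(type_uri) == "SO:0000988"


print(is_circular_naive(SBOL3_CIRCULAR))       # False
print(is_circular_normalized(SBOL3_CIRCULAR))  # True
print(is_circular_normalized(SBOL2_CIRCULAR))  # True
```

The naive check is why a circular sequence arriving with the newer URI falls through to the linear default. Within pySBOL3 code, tyto is the more robust way to compare terms.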

I therefore believe this issue can be patched by adding entries for the topology types (circular, linear, single-stranded, and double-stranded) to the type-remapping table in the conversion function you point to.
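The proposed patch amounts to extending a lookup table in the SBOL3-to-SBOL2 direction. Below is a minimal sketch, assuming the table is a plain dict from the newer URI to the older one. The names `TOPOLOGY_TERMS`, `SO_TOPOLOGY_REMAP`, and `remap_types` are hypothetical and are not the actual identifiers in `conversion.py`.

```python
# SequenceOntology topology/strandedness accessions (hypothetical table name).
TOPOLOGY_TERMS = {
    "SO:0000988": "circular",
    "SO:0000987": "linear",
    "SO:0000984": "single-stranded",
    "SO:0000985": "double-stranded",
}

# Map each newer identifiers.org URI to the older "/so/" form that a
# string-comparing SBOL2 consumer expects.
SO_TOPOLOGY_REMAP = {
    f"http://identifiers.org/{acc}": f"http://identifiers.org/so/{acc}"
    for acc in TOPOLOGY_TERMS
}


def remap_types(types):
    """Rewrite topology URIs to the SBOL2 form; other URIs pass through unchanged."""
    return [SO_TOPOLOGY_REMAP.get(t, t) for t in types]


sbol3_types = [
    "https://identifiers.org/SBO:0000251",  # DNA: not a topology term, left as-is
    "http://identifiers.org/SO:0000988",    # circular: rewritten
]
print(remap_types(sbol3_types))
```

Unrecognized URIs pass through unchanged, so the table can grow term by term without affecting other types.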

The longer-term fix will be the direct SBOL3 to GenBank converter that @mohitdmak is working on in Google Summer of Code right now, but a) that's not yet ready, and b) this patch will make sure these test cases are there for him to match.