vsoch / freegenes

FreeGenes BioNode with Django
https://vsoch.github.io/freegenes/
Mozilla Public License 2.0

CompositePart Verify Types #59

Closed vsoch closed 5 years ago

vsoch commented 5 years ago

@Koeng101 Let's discuss what fields we need for a composite part - here is what I have so far:

class CompositePart(models.Model):
    '''a composite part is a virtual representation of a group of parts.
    '''
    uuid = models.UUIDField(primary_key=True, default=uuid.uuid4, editable=False)

    # Why do these fields use time, and others use date (e.g., see ip_check*)
    time_created = models.DateTimeField('date created', auto_now_add=True) 
    time_updated = models.DateTimeField('date modified', auto_now=True)

    name = models.CharField(max_length=250, blank=False)
    description = models.CharField(max_length=500, blank=True, null=True)
    gene_id = models.CharField(max_length=250)
    part_type = models.CharField(max_length=250, choices=PART_TYPE)

    # Sequences - we would want 10K to 100K, can go up to 4 million (but not practical)
    original_sequence = models.TextField(validators=[validate_dna_string], blank=True, null=True)
    optimized_sequence = models.TextField(validators=[validate_dna_string], blank=True, null=True)
    synthesized_sequence = models.TextField(validators=[validate_dna_string], blank=True, null=True)
    full_sequence = models.TextField(validators=[validate_dna_string], blank=True, null=True)

    parts = SortedManyToManyField("main.Part")

    def get_absolute_url(self):
        return reverse('compositepart_details', args=[self.uuid])

    def get_label(self):
        return "compositepart"

    class Meta:
        app_label = 'main'

The ordered m2m is using this module https://github.com/jazzband/django-sortedm2m

Do we need all those fields for sequences, given that it's not a base Part? Let me know what we can trim away, and if there are other fields we need to add. I assume that to search this field we would need at least one sequence, or to filter based on the parts included.
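
The sequence fields above rely on a validate_dna_string validator that is not shown in this thread. As a rough illustration only, here is a hypothetical stand-in in plain Python. It assumes that only A, C, G, and T are allowed (no ambiguity codes such as N) and uses the 4 million upper bound from the code comment as a default length cap. The name validate_dna_sketch and both assumptions are mine, not the project's.

```python
# Hypothetical stand-in for the project's validate_dna_string (not its real code).
DNA_ALPHABET = frozenset("ACGT")  # assumption: no ambiguity codes such as N


def validate_dna_sketch(sequence, max_length=4_000_000):
    """Raise ValueError if the sequence is too long or contains non-ACGT characters.

    The check is case-insensitive. max_length is an assumed cap taken from the
    "can go up to 4 million" comment in the model.
    """
    if len(sequence) > max_length:
        raise ValueError(
            f"sequence has {len(sequence)} bases, limit is {max_length}"
        )
    invalid = set(sequence.upper()) - DNA_ALPHABET
    if invalid:
        raise ValueError(f"invalid characters: {''.join(sorted(invalid))}")
```

In a Django validator the error raised would normally be django.core.exceptions.ValidationError rather than ValueError; ValueError is used here only to keep the sketch free of dependencies.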

Koeng101 commented 5 years ago

I made a few modifications:

class CompositePart(models.Model):
    '''a composite part is a virtual representation of a group of parts.
    '''
    uuid = models.UUIDField(primary_key=True, default=uuid.uuid4, editable=False)

    # Why do these fields use time, and others use date (e.g., see ip_check*)
    ## Great question, I don't know.
    time_created = models.DateTimeField('date created', auto_now_add=True) 
    time_updated = models.DateTimeField('date modified', auto_now=True)

    name = models.CharField(max_length=250, blank=False)
    description = models.CharField(max_length=500, blank=True, null=True)
    composite_id = models.CharField(max_length=250) # to differentiate between composite_ids and gene_ids
    composite_part = models.CharField(max_length=250, choices=COMPOSITE_TYPE) # Composite parts should be defined a little bit differently.

    # Sequences - we would want 10K to 100K, can go up to 4 million (but not practical)
    ## If we could at least go to 1 million, that would be convenient for some DNA I want to design
    sequence = models.TextField(validators=[validate_dna_string], blank=True, null=True)

    # Super library that solves this for us!
    parts = SortedManyToManyField("main.Part")

    def get_absolute_url(self):
        return reverse('compositepart_details', args=[self.uuid])

    def get_label(self):
        return "compositepart"

    class Meta:
        app_label = 'main'

vsoch commented 5 years ago

Got it! Thank you!

vsoch commented 5 years ago

Oops, one more question: what are the COMPOSITE_TYPEs?

vsoch commented 5 years ago

Also - the current database has 2720 samples that are pointing to parts, but you are saying that you want this to point to CompositePart, and also that a Part != a CompositePart (but is a separate table). How do we resolve this? If the model needs to be updated, then we would need to set all of those part references to None and remove the requirement. It's a bit risky and there is always the chance for data loss, but of course I'll test locally first.

Koeng101 commented 5 years ago

I think we should go with a two-step program. First, we build the CompositePart table and then add a Sample.CompositePart link. I will interact with the API to update all samples with new CompositeParts, and once we accomplish that, we delete the Sample.Part link.

How does that sound?

Koeng101 commented 5 years ago

I'll actually have to think for a hot minute on what COMPOSITE_TYPEs there should be.

There are a lot of different strategies, and so I want to wrap my brain around which would be best (should it be "base_part", or "BsaI-AATG.GTCC-BsaI"?)

vsoch commented 5 years ago

For the database, as soon as I write that Sample links to a Composite Part, it will spit out a nasty error that it's wrong (there are already linked parts). So what we would need to do is add the CompositePart model, but not change the Sample.part to be linked to it (leave it as main.Part) and then we would need to:

  1. First create the composite parts
  2. Then delete the old parts, and in doing so, remove the link to samples
  3. Then update the model to point to composite parts
  4. Then add the composite parts to their corresponding samples

Another alternative is to leave sample.part, but add sample.composite_part to link to a composite part. Then we can delete the sample.part associations (when ready), and it's up to you whether you eventually want to delete the sample.part association altogether. It comes down to whether you are okay with having sample.composite_part instead of sample.part.
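
To make the second alternative concrete, here is a minimal plain-Python sketch of the data flow, independent of the Django ORM. It assumes that each existing part is wrapped in a one-element composite part and that samples sharing a part share the composite; that mapping is an assumption for illustration, not something decided in this thread. The dataclasses and the functions backfill_composite_parts and drop_part_links are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional
from uuid import UUID, uuid4


@dataclass
class Part:
    name: str
    uuid: UUID = field(default_factory=uuid4)


@dataclass
class CompositePart:
    name: str
    parts: List[Part]  # ordered, mirroring the sorted many-to-many field
    uuid: UUID = field(default_factory=uuid4)


@dataclass
class Sample:
    name: str
    part: Optional[Part] = None  # existing link, left in place for now
    composite_part: Optional[CompositePart] = None  # new, optional link


def backfill_composite_parts(samples: List[Sample]) -> Dict[UUID, CompositePart]:
    """Step 1: link every sample that has a part to a composite part.

    Assumption: one single-part composite per distinct part, reused across samples.
    """
    by_part: Dict[UUID, CompositePart] = {}
    for sample in samples:
        if sample.part is None or sample.composite_part is not None:
            continue
        composite = by_part.get(sample.part.uuid)
        if composite is None:
            composite = CompositePart(name=sample.part.name, parts=[sample.part])
            by_part[sample.part.uuid] = composite
        sample.composite_part = composite
    return by_part


def drop_part_links(samples: List[Sample]) -> int:
    """Step 2 (when ready): clear sample.part only where composite_part is set."""
    cleared = 0
    for sample in samples:
        if sample.part is not None and sample.composite_part is not None:
            sample.part = None
            cleared += 1
    return cleared
```

Because the new link is optional, existing samples stay valid while the backfill runs, and the old link is only cleared for samples that already have a composite part, which limits the chance of data loss.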

Koeng101 commented 5 years ago

I think the sample.composite_part strategy would work best. Also, I'm pretty happy with having sample.composite_part instead of sample.part, since it makes it clearer where the link points.

vsoch commented 5 years ago

Great! And me too :)

vsoch commented 5 years ago

Okay, I've put in dummy types (it's just a string, so easy to edit). I'm going to leave the issue open for further discussion of the correct types. And our migrations went without a hitch!

Migrations for 'main':
  fg/apps/main/migrations/0003_auto_20190926_1118.py
    - Alter field plate_type on plate
    - Alter field plate_vendor_id on plate
    - Alter field part on sample
    - Create model CompositePart
    - Add field composite_part to sample
root@5030de6c0281:/code# python manage.py migrate
Operations to perform:
  Apply all migrations: admin, auth, authtoken, contenttypes, main, sessions, social_django, taggit, users
Running migrations:
  Applying main.0003_auto_20190926_1118... OK

Thank goodness :) Here is a direct link to the model for reference: https://github.com/vsoch/freegenes/blob/master/fg/apps/main/models/__init__.py#L180

Koeng101 commented 5 years ago

I think CompositePart.composite_part should be CompositePart.composite_type, but looks good other than that!

vsoch commented 5 years ago

fixed.

vsoch commented 5 years ago

hey @Koeng101 we still need to verify the types! :)

vsoch commented 5 years ago

Is this complete and correct?

    COMPOSITE_TYPE = [
        ('base_part', 'base_part'),
    ]

Koeng101 commented 5 years ago

Hmmm, if we could actually have base_part_plasmid instead of base_part, that is all I need to get started.

vsoch commented 5 years ago

Yep, I can definitely change base_part to base_part_plasmid. I just did it in the code I'm working on locally, so I'll close this issue! boum.
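
For reference, the agreed rename can be sketched in plain Python. A Django choices list is a list of (stored value, human-readable label) pairs, so changing base_part to base_part_plasmid also means any rows already stored with the old value need updating. The helpers is_valid_composite_type and rename_composite_type are hypothetical, and the rows here are plain dicts standing in for database records.

```python
# Choices after the rename discussed above: (stored value, label) pairs.
COMPOSITE_TYPE = [
    ("base_part_plasmid", "base_part_plasmid"),
]


def is_valid_composite_type(value, choices=COMPOSITE_TYPE):
    """Return True if value is one of the stored values in the choices list."""
    return value in {stored for stored, _label in choices}


def rename_composite_type(rows, old, new):
    """Rewrite rows whose 'composite_type' equals old; return how many changed.

    rows is a list of dicts, a stand-in for existing database records.
    """
    changed = 0
    for row in rows:
        if row.get("composite_type") == old:
            row["composite_type"] = new
            changed += 1
    return changed
```

Since only dummy types had been entered at this point, the number of rows carrying the old value is likely small, but checking for them avoids records that no longer match any choice.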