Make the codes more generic

joewandy commented 4 years ago

At the moment, the sample metadata stored in the database is specific to FlyMet. It assumes that the following information is provided for each sample: life stage, tissue and mutant. Other datasets won't have this information and will come with different types of metadata. I would like to make the following enhancement to the pipeline so that it is generic enough to load non-FlyMet datasets.

In particular, the following code will be modified:

In the population script

def populate_samples(sample_csv):
    '''
    Give the sample CSV file to populate the samples.
    KMcL: Working but need to consider the filepath.
    '''
    sample_details = np.genfromtxt(sample_csv, delimiter=',', dtype=str)[2:]
    logger.debug("sd_type %s" % sample_details)
    for sample in sample_details:
        # sample = s.split()
        sample_serializer = SampleSerializer(
            data={"name": sample[0], "group": sample[1], "life_stage": sample[2], "tissue": sample[3],
                  "mutant": sample[4]})
        if sample_serializer.is_valid():
            db_sample = sample_serializer.save()
            logger.debug("sample saved %s" % db_sample.name)
        else:
            logger.debug(sample_serializer.errors)

In the model

class Sample(models.Model):
    """
    Model class defining an instance of an experimental Sample including the tissue and life-stage from which it came
    """
    # Here the sample name is unique as this is important for processing FlyMet data
    name = models.CharField(max_length=250, unique=True, blank=False)
    life_stage = models.CharField(max_length=250, blank=False)
    tissue = models.CharField(max_length=250)
    group = models.CharField(max_length=250, blank=True, null=True)
    mutant = models.CharField(max_length=250, blank=True, null=True)

    def __str__(self):
        """
        Method to return a representation of the Sample
        """

        return "Sample " + self.name

and in the serialiser

class SampleSerializer(serializers.ModelSerializer):
    class Meta:
        model = Sample
        fields = ('name','life_stage', 'group','tissue','mutant')

joewandy commented 4 years ago

As discussed on Slack, we will create another model called 'Factor'. It will be linked to the Sample model and used to store the group, life stage, tissue and mutant in the case of FlyMet. For other datasets, we can store different factors, such as timepoints.
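A minimal sketch of the data shape this implies, written in plain Python rather than Django so it can be run on its own. It assumes the sample CSV has a header row with a `name` column and treats every other column as a factor. `FactorRow` and `csv_to_factor_rows` are hypothetical names, not code in the repository, and the sample data is made up for illustration. Each tuple would correspond to one row in a Factor table linked to a Sample.

```python
import csv
import io
from typing import NamedTuple


class FactorRow(NamedTuple):
    """Hypothetical stand-in for one row of a Factor table linked to a Sample."""
    sample_name: str
    factor_type: str  # e.g. "tissue", "life_stage", "timepoint"
    value: str


def csv_to_factor_rows(csv_text, name_column="name"):
    """Turn a sample CSV with a header row into one FactorRow per non-empty cell.

    Assumption: the header row names the factors, so no column is hardcoded
    apart from the one holding the sample name.
    """
    rows = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for record in reader:
        sample_name = record[name_column].strip()
        for column, value in record.items():
            # Skip the name column and any unnamed overflow columns
            if column == name_column or column is None:
                continue
            value = (value or "").strip()
            if value:  # skip factors that are not set for this sample
                rows.append(FactorRow(sample_name, column, value))
    return rows


# Illustrative data only, not taken from the FlyMet sample sheet
flymet_like = "name,group,life_stage,tissue,mutant\ns1,g1,Adult,Head,\n"
other = "name,timepoint\nt1,0h\nt2,24h\n"

flymet_rows = csv_to_factor_rows(flymet_like)
other_rows = csv_to_factor_rows(other)
```

With this shape, a FlyMet sample and a timepoint-based sample go through the same loader, and the set of factor types is driven by the CSV header instead of by the model fields.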

Alternatively, we could store the factors in a JSON dictionary and put it in a column on the sample, but this is not very relational.
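For comparison, a sketch of the JSON alternative using only the standard library. `sample_to_json_row` and `samples_with_factor` are hypothetical helpers, not code in the repository. The first packs every non-name field into one JSON string, which is what would be stored in a single column on the sample. The second shows the cost: finding samples by factor means decoding each blob (or relying on database-specific JSON support) instead of doing a plain join.

```python
import json


def sample_to_json_row(record, name_key="name"):
    """Split a sample record into (name, JSON string of the remaining factors)."""
    factors = {k: v for k, v in record.items() if k != name_key and v}
    # sort_keys gives a stable string regardless of column order
    return record[name_key], json.dumps(factors, sort_keys=True)


def samples_with_factor(rows, factor, value):
    """Filter in Python: each JSON blob must be decoded to inspect a factor."""
    return [name for name, blob in rows if json.loads(blob).get(factor) == value]


# Illustrative records only
rows = [
    sample_to_json_row({"name": "s1", "tissue": "Head", "life_stage": "Adult"}),
    sample_to_json_row({"name": "t1", "timepoint": "24h"}),
]
```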

joewandy commented 3 years ago

The PyMT data can now be loaded successfully through the pipeline. However, many views are not working because of hardcoding in the code (related to issue #60). I will work on that next.

kmcluskey / FlyMet

Make the codes more generic #53