Currently, experimental factors such as life stage, tissue, and mutant are stored as part of the Sample model in the database. However, this means we can't load other datasets with different factors into this pipeline.
To address this, I changed the CSV loading. The first column is always taken as the sample name, and the remaining columns are treated as experimental factors. Each experimental factor is stored in its own Factor model, identified by a name field, a value field, and the sample it's linked to.
```python
class Factor(models.Model):
    sample = models.ForeignKey(Sample, on_delete=models.CASCADE)
    name = models.CharField(max_length=250, blank=False, null=False)
    value = models.CharField(max_length=250, blank=False, null=False)
```
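The column convention can be sketched with the stdlib csv module. This is just an illustration; `parse_sample_csv` is a name I made up, not the actual loader in the PR:

```python
import csv
import io


def parse_sample_csv(fileobj):
    """Hypothetical parser: first column is the sample name, every
    other column becomes a (name, value) experimental factor."""
    reader = csv.reader(fileobj)
    header = next(reader)
    factor_cols = header[1:]  # everything after the sample-name column
    parsed = []
    for row in reader:
        name = row[0]
        factors = dict(zip(factor_cols, row[1:]))
        parsed.append((name, factors))
    return parsed


demo = io.StringIO(
    "sample,group,life_stage,tissue\n"
    "s1,wt,adult,brain\n"
)
print(parse_sample_csv(demo))
# [('s1', {'group': 'wt', 'life_stage': 'adult', 'tissue': 'brain'})]
```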
As an exception, the group column is left as part of the Sample model itself instead of being stored in the Factor model. This is because a lot of existing FlyMet code retrieves the group information in a big loop, e.g. to determine which group a peak's sample belongs to. If we had to do a join each time to get the group for a sample, it would be very slow. We could optimise that code instead, but let's leave that for later. For now, I think it's easier to assume that there's a special factor called group that is always part of the Sample model.
```python
class Sample(models.Model):
    name = models.CharField(max_length=250, unique=True, blank=False)
    group = models.CharField(max_length=250, blank=True, null=True)
```
For compatibility with existing FlyMet code, I've also added three special properties to Sample: life_stage, tissue and mutant. Existing code can use these to get a sample's life_stage, tissue and mutant values, so it won't have to be changed. Eventually I hope to get rid of these properties once the codebase is made flexible enough to support other datasets; see issue #60.
```python
@property
def life_stage(self):  # for flymet compatibility
    return self.get_factor_value('life_stage')

@property
def tissue(self):  # for flymet compatibility
    return self.get_factor_value('tissue')

@property
def mutant(self):  # for flymet compatibility
    return self.get_factor_value('mutant')
```
I also added some helper methods to get samples given their factors, and the other way around. These are now used in several places.
Finally, as discussed on Slack, I got rid of the serialisers (in serializer.py), since they are not really used elsewhere. Instead, I used the standard Django ORM to bulk-insert the Sample and PeakSample objects during database population. This seems faster.
Part of issue #53.