ispyb / ispyb-database-modeling


Pins with multiple samples #20

Open KarlLevik opened 6 years ago

KarlLevik commented 6 years ago

One of our beamlines is keen to start using multi-sample pins, i.e. pins with multiple samples on them. These samples can have different crystal and protein properties.

The question is then, how do we fit this into the existing database schema? We've discussed various options at Diamond, and have landed on the following proposal:

We add a new column to BLSample: multiSamplePosition smallint unsigned. In addition, we will populate the loopType varchar(45) column with a value indicating that it's a multi-sample pin. We then get data like the following in the BLSample table:

blSampleId  containerId  loopType   code  location  multiSamplePosition
1           1            multi-pin  aaa   1         1
2           1            multi-pin  bbb   1         2
3           1            multi-pin  ccc   1         3
4           1            multi-pin  ddd   1         4
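A minimal sketch of the proposed layout, using Python's built-in sqlite3 module as a stand-in for the real ISPyB database (which runs on MySQL/MariaDB). Only the columns in the table above are modelled, and the query is illustrative: it assumes a pin is identified by its (containerId, location) pair and that its samples are read back in multiSamplePosition order.

```python
import sqlite3

# In-memory stand-in for a trimmed-down BLSample table; the real ISPyB
# table has many more columns. SQLite accepts these MySQL-style type names
# but does not enforce UNSIGNED or VARCHAR lengths.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE BLSample (
        blSampleId          INTEGER PRIMARY KEY,
        containerId         INTEGER,
        loopType            VARCHAR(45),
        code                VARCHAR(45),
        location            VARCHAR(45),
        multiSamplePosition SMALLINT UNSIGNED  -- proposed new column
    )
    """
)

# The four example rows: one pin (containerId 1, location 1) carrying four samples.
rows = [
    (1, 1, "multi-pin", "aaa", "1", 1),
    (2, 1, "multi-pin", "bbb", "1", 2),
    (3, 1, "multi-pin", "ccc", "1", 3),
    (4, 1, "multi-pin", "ddd", "1", 4),
]
conn.executemany("INSERT INTO BLSample VALUES (?, ?, ?, ?, ?, ?)", rows)

# All samples mounted on the pin at location 1 of container 1, in pin order.
pin_samples = conn.execute(
    """
    SELECT code, multiSamplePosition
    FROM BLSample
    WHERE containerId = ? AND location = ? AND loopType = 'multi-pin'
    ORDER BY multiSamplePosition
    """,
    (1, "1"),
).fetchall()
print(pin_samples)  # [('aaa', 1), ('bbb', 2), ('ccc', 3), ('ddd', 4)]
```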

Does this make sense? Is there a better solution for this?

jlmuir commented 6 years ago

What does the multiSamplePosition field represent? Is it a position index like the index of a pin in a puck? So the multi-samples are at fixed locations relative to some reference mark or location on the pin, and you can refer to them by multiSamplePosition? I assume multiSamplePosition 1 would be closest to the pin base?

If the answer to the above questions is yes, then it seems reasonable to me. I then have just a novice question: are the terms "location" and "position" part of a standard nomenclature? From the BLSample table, it looks like each sample has a location (i.e., location VARCHAR(45)) and a position (i.e., positionId INT(11)). I don't know what location is; is that a logical path to the sample (e.g., /<sample-changer-dewar>/<puck-position-in-dewar>/<pin-position-in-puck>)? I would think not, because a sample just knows what puck it's in, right (i.e., containerId INT(10))? So I'm probably missing what the sample location means. I think the position is a goniometer XYZ position. Just curious to understand, what was your rationale for choosing multiSamplePosition over, say, multiSampleLocation?

KarlLevik commented 6 years ago

What does the multiSamplePosition field represent? Is it a position index like the index of a pin in a puck? So the multi-samples are at fixed locations relative to some reference mark or location on the pin, and you can refer to them by multiSamplePosition?

Yes!

I assume multiSamplePosition 1 would be closest to the pin base?

I'm not sure we have discussed this, but yes, that sounds reasonable to me.

I don't know what location is; is that a logical path to the sample (e.g., /<sample-changer-dewar>/<puck-position-in-dewar>/<pin-position-in-puck>)?

While the location column is defined as a varchar, it really is an integer. (I suspect this was a mistake or perhaps a deliberate decision to support some long-forgotten use-case.) So location is an index for the BLSample within its Container.
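One practical consequence of storing an integer index in a varchar column is sort order. The sketch below (Python with sqlite3, on a trimmed-down hypothetical table) shows that ordering by the raw column sorts lexicographically, while an explicit cast gives the intended numeric order; in MySQL/MariaDB the equivalent cast would be CAST(location AS UNSIGNED).

```python
import sqlite3

# Trimmed-down BLSample with location stored as text, as in the real schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE BLSample (blSampleId INTEGER PRIMARY KEY, "
    "containerId INTEGER, location VARCHAR(45))"
)
# Sixteen samples in one container, with locations inserted as strings.
conn.executemany(
    "INSERT INTO BLSample (containerId, location) VALUES (?, ?)",
    [(1, str(i)) for i in range(1, 17)],
)

# Ordering by the raw text column sorts lexicographically: '1', '10', '11', ...
as_text = [r[0] for r in conn.execute(
    "SELECT location FROM BLSample WHERE containerId = 1 ORDER BY location")]

# Casting to an integer restores the intended numeric order.
as_int = [r[0] for r in conn.execute(
    "SELECT location FROM BLSample WHERE containerId = 1 "
    "ORDER BY CAST(location AS INTEGER)")]

print(as_text[:4])  # ['1', '10', '11', '12']
print(as_int[:4])   # ['1', '2', '3', '4']
```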

Just curious to understand, what was your rationale for choosing multiSamplePosition over, say, multiSampleLocation?

You're right, this should be 'location' rather than 'position'. So we discussed this briefly again today, and, taking your thoughts into account, it was suggested to name the column subLocation to be more consistent and also make it more generic.

StephMonaco commented 6 years ago

Why not use the BLSubSample table? That would be more powerful, since one could store the x, y, z coordinates and an image of the sample identified from the crystallogenesis LIMS, which could then be used to place this particular sample in the beam by image recognition. Otherwise, we already have a column called NumberOfPositions in the diffraction plan table attached to a BLSample. This describes how many crystals have to be searched in the mesh, but in that case (a more simplistic approach) we don't specify which subsample is where. It is another use case.

drnasmith commented 6 years ago

My understanding is that the BLSubSample table refers to regions or locations within a sample. In this new use case each location on the pin contains a unique sample; potentially there would be no relationship between samples on the same pin.

The diffraction plan table is interesting, but presumably the numberofpositions would still refer to a single sample through the diffractionplan_has_sample table?

As BLSample currently has a location column that refers to its physical index in the sample holder, we need some method to extend this to a second index or location. Adding a new column (instead of interpreting the existing location field in different ways) was seen as having less impact on the data acquisition side.
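The argument for a separate column can be sketched as follows (Python with sqlite3, trimmed-down table, using the subLocation name agreed above). The insert_legacy function is a hypothetical stand-in for existing data-acquisition code that only knows about containerId and location; leaving subLocation NULL for ordinary single-sample pins is an assumption for illustration, not something agreed in this thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Existing (trimmed-down) table before the change.
conn.execute(
    "CREATE TABLE BLSample (blSampleId INTEGER PRIMARY KEY, containerId INTEGER, "
    "loopType VARCHAR(45), code VARCHAR(45), location VARCHAR(45))"
)

# Hypothetical existing acquisition-side insert that knows nothing about sub-locations.
def insert_legacy(container_id, location, code):
    conn.execute(
        "INSERT INTO BLSample (containerId, location, code) VALUES (?, ?, ?)",
        (container_id, location, code),
    )

insert_legacy(1, "1", "single")

# The schema change: a new nullable column; `location` keeps its meaning.
conn.execute("ALTER TABLE BLSample ADD COLUMN subLocation SMALLINT UNSIGNED")

# Legacy code keeps working unchanged; subLocation is simply NULL for its rows.
insert_legacy(1, "2", "single2")

# Multi-sample-aware code sets loopType and subLocation explicitly.
conn.executemany(
    "INSERT INTO BLSample (containerId, location, loopType, code, subLocation) "
    "VALUES (?, ?, ?, ?, ?)",
    [(1, "3", "multi-pin", "aaa", 1), (1, "3", "multi-pin", "bbb", 2)],
)

rows = conn.execute(
    "SELECT location, code, subLocation FROM BLSample ORDER BY blSampleId"
).fetchall()
print(rows)
# [('1', 'single', None), ('2', 'single2', None), ('3', 'aaa', 1), ('3', 'bbb', 2)]
```

Existing queries on (containerId, location) are untouched; only code that cares about multi-sample pins needs to read the new column.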

KarlLevik commented 6 years ago

So @delageniere and @antolinos, are you OK with this (a new column, subLocation, and using the existing column loopType) based on the discussion at the meeting?

delageniere commented 6 years ago

Hi Karl, following the discussion at the meeting, this is OK for me, with these 2 fields: subLocation and loopType. I also agree to assign index 1 to the position closest to the pin base.

KarlLevik commented 6 years ago

Pull request: https://github.com/ispyb/ISPyB/pull/237

KarlLevik commented 6 years ago

I assume multiSamplePosition 1 would be closest to the pin base?

It turns out that, the way we're using this here, 1 is the position closest to the tip of the pin!

rhfogh commented 6 years ago

I have tried talking to some people at IBS and EMBL-Grenoble about the new model and how it would fit their use cases (and to help me understand what is happening). Could you check whether I have got it right?

In terms of objects (which is how I think), I see four:

The new multiSamplePosition field is in addition to BLSubSample (not a replacement for it), so you can still have multiple BLSubSamples within a given BLSample, e.g. within a drop.

The BlSample table is denormalised, since a given SampleHolder ((containerId, location) combination) goes uniquely with a single loopType.

Is this correct (and are my 'object names' sensible)?

KarlLevik commented 6 years ago

@rhfogh I think that sounds mostly correct. Yes, there is a bit of de-normalisation since information about the sample and subsample are effectively stored in the same table, but this is only because we're trying to avoid creating a new table when doing so doesn't necessarily give us any advantages.
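The de-normalisation discussed above means the schema itself does not prevent two samples on the same pin from carrying different loopType values. A sketch of a consistency check (Python with sqlite3, trimmed-down table; the 'standard' loopType value is hypothetical) that lists any (containerId, location) pair with more than one distinct loopType:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE BLSample (blSampleId INTEGER PRIMARY KEY, containerId INTEGER, "
    "location VARCHAR(45), loopType VARCHAR(45), code VARCHAR(45), "
    "subLocation SMALLINT UNSIGNED)"
)
conn.executemany(
    "INSERT INTO BLSample (containerId, location, loopType, code, subLocation) "
    "VALUES (?, ?, ?, ?, ?)",
    [
        (1, "1", "multi-pin", "aaa", 1),
        (1, "1", "multi-pin", "bbb", 2),
        (1, "2", "multi-pin", "ccc", 1),
        (1, "2", "standard", "ddd", 2),  # inconsistent: same pin, different loopType
    ],
)

# loopType is repeated on every sample of a pin, so nothing in the schema
# stops samples on one pin from disagreeing. This query finds such pins.
conflicts = conn.execute(
    """
    SELECT containerId, location, COUNT(DISTINCT loopType) AS nLoopTypes
    FROM BLSample
    GROUP BY containerId, location
    HAVING COUNT(DISTINCT loopType) > 1
    """
).fetchall()
print(conflicts)  # [(1, '2', 2)]
```

A separate pin-level table would remove the need for such a check, which is the trade-off against avoiding a new table.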