ispyb / ispyb-database-modeling

Extend ISPyB for cryo-ET and electron diffraction #73

Open KarlLevik opened 2 years ago

KarlLevik commented 2 years ago

We have designed a cryo-electron tomography data model for ISPyB in collaboration with a team at CNB-CSIC. This work was milestone MS32 in WP3 for iNext Discovery: Extend ISPyB for cryo-ET and electron diffraction.

We would like this to become part of the official ISPyB database schema.

2022-06-16: The below SQL has been updated with some of the changes suggested by @bodinm and @stufisher.

ALTER TABLE Movie
  ADD COLUMN IF NOT EXISTS angle float COMMENT 'unit: degrees relative to perpendicular to beam',
  ADD COLUMN IF NOT EXISTS fluence float COMMENT 'accumulated electron fluence from start to end of acquisition of this movie (commonly, but incorrectly, referred to as ‘dose’)',
  ADD COLUMN IF NOT EXISTS numberOfFrames int(11) unsigned COMMENT 'number of frames per movie. This should be equivalent to the number of MotionCorrectionDrift entries, but the latter is a property of data analysis, whereas the number of frames is an intrinsic property of acquisition.';

CREATE TABLE Tomogram (
  tomogramId int(11) unsigned NOT NULL AUTO_INCREMENT PRIMARY KEY,
  dataCollectionId int(11) unsigned COMMENT 'FK to DataCollection table',
  autoProcProgramId int(10) unsigned COMMENT 'FK, gives processing times/status and software information',
  volumeFile varchar(255) COMMENT '.mrc file representing the reconstructed tomogram volume',
  stackFile varchar(255) NULL DEFAULT NULL COMMENT '.mrc file containing the motion corrected images ordered by angle used as input for the reconstruction',
  sizeX int(11) unsigned COMMENT 'unit: pixels',
  sizeY int(11) unsigned COMMENT 'unit: pixels',
  sizeZ int(11) unsigned COMMENT 'unit: pixels',
  pixelSpacing float COMMENT 'Angstrom/pixel conversion factor',
  residualErrorMean float COMMENT 'Alignment error, unit: nm',
  residualErrorSD float COMMENT 'Standard deviation of the alignment error, unit: nm',
  xAxisCorrection float COMMENT 'X axis angle (etomo), unit: degrees',
  tiltAngleOffset float COMMENT 'tilt Axis offset (etomo), unit: degrees',
  zShift float COMMENT 'shift to center volume in Z (etomo)',
  CONSTRAINT FOREIGN KEY Tomogram_fk_dataCollectionId (dataCollectionId) REFERENCES DataCollection (dataCollectionId) ON DELETE CASCADE ON UPDATE CASCADE,
  CONSTRAINT FOREIGN KEY Tomogram_fk_autoProcProgramId (autoProcProgramId) REFERENCES AutoProcProgram (autoProcProgramId) ON DELETE SET NULL ON UPDATE CASCADE
)
COMMENT 'For storing per-sample, per-position data analysis results (reconstruction)';

CREATE TABLE TiltImageAlignment (
  movieId int(11) unsigned NOT NULL COMMENT 'FK to Movie table',
  tomogramId int(11) unsigned NOT NULL COMMENT 'FK to Tomogram table; tuple (movieId, tomogramId) is unique',
  defocusU float COMMENT 'unit: Angstroms',
  defocusV float COMMENT 'unit: Angstroms',
  psdFile varchar(255),
  resolution float COMMENT 'unit: Angstroms',
  fitQuality float,
  refinedMagnification float NULL DEFAULT NULL COMMENT 'unitless',
  refinedTiltAngle float COMMENT 'unit: degrees',
  refinedTiltAxis float COMMENT 'unit: degrees',
  residualError float COMMENT 'Residual error, unit: nm',
  PRIMARY KEY (movieId, tomogramId),
  CONSTRAINT FOREIGN KEY TiltImageAlignment_fk_movieId (movieId) REFERENCES Movie (movieId) ON DELETE CASCADE ON UPDATE CASCADE,
  CONSTRAINT FOREIGN KEY TiltImageAlignment_fk_tomogramId (tomogramId) REFERENCES Tomogram (tomogramId) ON DELETE CASCADE ON UPDATE CASCADE
)
COMMENT 'For storing per-movie analysis results (reconstruction)';

ALTER TABLE BeamLineSetup
  ADD amplitudeContrast float COMMENT 'Needed for cryo-ET';
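
As a rough illustration of how the proposed tables relate, the following sketch lists the tilt series behind one tomogram, pairing each movie's nominal angle with its refined alignment values. It uses only column names from the DDL above; the `tomogramId` value is a placeholder, and the query has not been run against a real ISPyB instance.

```sql
-- Sketch only: tilt images contributing to one tomogram, ordered by nominal angle.
-- Column names are taken from the proposed DDL; tomogramId = 1 is a placeholder.
SELECT m.movieId,
       m.angle AS nominalTiltAngle,   -- degrees, as recorded at acquisition
       tia.refinedTiltAngle,          -- degrees, after alignment
       tia.refinedTiltAxis,
       tia.residualError              -- nm
FROM TiltImageAlignment tia
  JOIN Movie m ON m.movieId = tia.movieId
WHERE tia.tomogramId = 1
ORDER BY m.angle;
```

Because the primary key of TiltImageAlignment is (movieId, tomogramId), the same movie can appear in more than one reconstruction without duplicating the Movie row.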

bodinm commented 2 years ago

[minor] In the Tomogram table, I would prefer to use tomogramId instead of tomogramID, to follow the existing DB naming convention.

stufisher commented 2 years ago

numberOfFrames is already in DataCollection as numberOfPasses (see https://user-images.githubusercontent.com/4463399/29261785-a6daf4f8-80d1-11e7-9671-9e15fa070d54.png [https://github.com/ispyb/ispyb-database-modeling/issues/14])

stufisher commented 2 years ago

Re naming

XaxisCorrection -> xAxisCorrection
Zshift -> zShift

for consistency?

Can we get some background on how this works?

KarlLevik commented 2 years ago

Thank you both for the feedback! That all sounds good.

Background here: https://hackmd.io/RCfRD-FaTCagTxeTwVHIaA?view

The text from the above HackMD document was used in the aforementioned iNext Discovery report.

KarlLevik commented 2 years ago

@stufisher Just checking, are we sure that DC.numberOfPasses and the proposed Movie.numberOfFrames are equivalent? I'm just thinking that there could be multiple Movies in a DC (unless this relationship is meant to be 1 to 1?), each with a different number of frames?

stufisher commented 2 years ago

I think we had agreed:

DataCollection -> Movie (many)
numberOfFrames -> no. of movies
numberOfPasses -> no. of frames per movie

But yes, you are right: this assumes that each movie has the same number of frames (I think this is the case, but can they be different?)

Who knows what was actually implemented, though! Maybe have a quick browse of the production DB?

KarlLevik commented 2 years ago

Data in our prod database definitely confirms that there are multiple Movies per DC. (One random DC I looked at had 14600 Movies!)

Apparently, we stopped populating DC.numberOfPasses for cryoEM in 2019.

Anecdotally, looking at the number of MotionCorrectionDrift rows per Movie they do seem to be the same within a DataCollection.
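
A check like this could be made systematic with a query along the following lines. This is a sketch, not the query that was actually run: it assumes that MotionCorrectionDrift links to MotionCorrection via `motionCorrectionId`, that MotionCorrection links to Movie via `movieId`, and that Movie carries `dataCollectionId`. Those column names should be verified against the deployed schema.

```sql
-- Sketch only: list DataCollections whose Movies do NOT all have the same
-- number of MotionCorrectionDrift rows. Join columns are assumptions.
SELECT per_run.dataCollectionId,
       COUNT(DISTINCT per_run.movieId) AS movies,
       MIN(per_run.driftRows) AS minDriftRows,
       MAX(per_run.driftRows) AS maxDriftRows
FROM (
  -- One row per motion-correction run, so reprocessed movies are not double-counted
  SELECT m.dataCollectionId, m.movieId, mc.motionCorrectionId,
         COUNT(*) AS driftRows
  FROM Movie m
    JOIN MotionCorrection mc ON mc.movieId = m.movieId
    JOIN MotionCorrectionDrift mcd ON mcd.motionCorrectionId = mc.motionCorrectionId
  GROUP BY m.dataCollectionId, m.movieId, mc.motionCorrectionId
) AS per_run
GROUP BY per_run.dataCollectionId
HAVING MIN(per_run.driftRows) <> MAX(per_run.driftRows);
```

An empty result would support the assumption that the frame count is constant within a DataCollection; any returned rows are the exceptions worth inspecting.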

olofsvensson commented 2 years ago

@KarlLevik, I got feedback from our CryoEM scientist and they are happy with what you suggest. So for me it's ok to go ahead and implement your suggestion.

@stufisher, I had a look at our production DB and this is what we use:

DataCollection -> Movie (many)
DataCollection.numberOfImages: no. of frames per movie

We don't store no of movies, it's simply the number of movie entries for a given data collection id.
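
Deriving the movie count this way might look like the sketch below. It assumes Movie has a `dataCollectionId` column referencing DataCollection, which is implied by the discussion but should be checked against the schema in use.

```sql
-- Sketch only: number of movies per data collection, derived rather than stored.
-- Assumes Movie.dataCollectionId references DataCollection.dataCollectionId.
SELECT dataCollectionId,
       COUNT(*) AS numberOfMovies
FROM Movie
GROUP BY dataCollectionId;
```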