ispyb / ispyb-database-modeling


Tables for processing jobs #19

Open KarlLevik opened 6 years ago

KarlLevik commented 6 years ago

We would like to add some tables to keep track of processing jobs. Jobs can be triggered by users of the ISPyB web application or by other sources, including automatically while a data collection is in progress.

Here's what the tables for this might look like:

CREATE TABLE ProcessingJob (
  processingJobId int(11) unsigned AUTO_INCREMENT PRIMARY KEY,
  dataCollectionId int(11) unsigned,
  displayName varchar(80) COMMENT 'xia2, fast_dp, dimple, etc',
  comments varchar(255) COMMENT 'For users to annotate the job and see the motivation for the job' ,
  recordTimestamp timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'When job was submitted',
  recipe varchar(50) COMMENT 'What we want to run (xia, dimple, etc).',
  automatic boolean COMMENT 'Whether this processing job was triggered automatically or not',
  CONSTRAINT ProcessingJob_ibfk1 FOREIGN KEY (dataCollectionId) REFERENCES DataCollection(dataCollectionId)
) COMMENT 'From this we get both job times and lag times';

CREATE TABLE ProcessingJobParameter (
  processingJobParameterId int(11) unsigned AUTO_INCREMENT PRIMARY KEY,
  processingJobId int(11) unsigned,
  parameterKey varchar(80) COMMENT 'E.g. resolution, spacegroup, pipeline',
  parameterValue varchar(255),
  CONSTRAINT ProcessingJobParameter_ibfk1 FOREIGN KEY (processingJobId) REFERENCES ProcessingJob(processingJobId)
);

CREATE TABLE ProcessingJobImageSweep (
  processingJobImageSweepId int(11) unsigned AUTO_INCREMENT PRIMARY KEY,
  processingJobId int(11) unsigned,
  dataCollectionId int(11) unsigned,
  startImage mediumint unsigned,
  endImage mediumint unsigned,
  CONSTRAINT ProcessingJobImageSweep_ibfk1 FOREIGN KEY (processingJobId) REFERENCES ProcessingJob(processingJobId),
  CONSTRAINT ProcessingJobImageSweep_ibfk2 FOREIGN KEY (dataCollectionId) REFERENCES DataCollection(dataCollectionId)
) COMMENT 'This allows multiple sweeps per processing job for multi-xia2';

ALTER TABLE AutoProcProgram
   ADD processingJobId int(11) unsigned COMMENT 'Which processing job triggered this auto processing',
   ADD CONSTRAINT AutoProcProgram_FK2 FOREIGN KEY (processingJobId) REFERENCES ProcessingJob(processingJobId);

We had originally named these tables Reprocessing, but after some thought concluded ProcessingJob was better since these jobs could be re-processing as well as the initial processing. Also, the "Job" suffix makes it clearer what we're talking about.
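
To illustrate the table comment "From this we get both job times and lag times", here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL. It keeps only the columns needed for the calculation, and it assumes that AutoProcProgram carries processingStartTime and processingEndTime columns; those two column names, the helper job_timings and the sample values are assumptions for illustration, not part of the proposal above.

```python
import sqlite3

# SQLite stand-in for the MySQL DDL above, reduced to the columns used here.
# ASSUMPTION: AutoProcProgram has processingStartTime / processingEndTime.
SCHEMA = """
CREATE TABLE ProcessingJob (
  processingJobId INTEGER PRIMARY KEY AUTOINCREMENT,
  dataCollectionId INTEGER,
  displayName TEXT,
  recordTimestamp TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
  recipe TEXT,
  automatic INTEGER
);
CREATE TABLE AutoProcProgram (
  autoProcProgramId INTEGER PRIMARY KEY AUTOINCREMENT,
  processingJobId INTEGER REFERENCES ProcessingJob(processingJobId),
  processingStartTime TEXT,
  processingEndTime TEXT
);
"""


def job_timings(conn, processing_job_id):
    """Return (lag_seconds, run_seconds) for each program started by a job.

    Lag is submission to start; run time is start to end.
    """
    return conn.execute(
        """
        SELECT
          CAST(strftime('%s', app.processingStartTime) AS INTEGER)
            - CAST(strftime('%s', pj.recordTimestamp) AS INTEGER),
          CAST(strftime('%s', app.processingEndTime) AS INTEGER)
            - CAST(strftime('%s', app.processingStartTime) AS INTEGER)
        FROM ProcessingJob pj
        JOIN AutoProcProgram app ON app.processingJobId = pj.processingJobId
        WHERE pj.processingJobId = ?
        ORDER BY app.autoProcProgramId
        """,
        (processing_job_id,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Hypothetical job submitted at 12:00:00 ...
job_id = conn.execute(
    "INSERT INTO ProcessingJob"
    " (dataCollectionId, displayName, recordTimestamp, recipe, automatic)"
    " VALUES (?, ?, ?, ?, ?)",
    (1001, "xia2", "2018-01-01 12:00:00", "xia2", 1),
).lastrowid

# ... whose program started 30 s later and ran for 5 minutes.
conn.execute(
    "INSERT INTO AutoProcProgram"
    " (processingJobId, processingStartTime, processingEndTime)"
    " VALUES (?, ?, ?)",
    (job_id, "2018-01-01 12:00:30", "2018-01-01 12:05:30"),
)

timings = job_timings(conn, job_id)
print(timings)
```

In MySQL the same differences could presumably be computed with TIMESTAMPDIFF instead of strftime; the join on AutoProcProgram.processingJobId is the part that depends on the ALTER TABLE above.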

antolinos commented 6 years ago

Hi @KarlLevik,

I think we should dedicate a whole session to discussing our vision for processing and reprocessing in ISPyB. There is a clear need for reprocessing data, but I am not sure how it should be implemented, given that ISPyB is supposed to be multi-technique. For instance:

  1. ProcessingJob is attached to a single data collection. What happens if we want to merge two data collections?
  2. Are the ProcessingJobParameter rows input or output?
  3. Is ProcessingJobImageSweep like ProcessingJob but with two input parameters (startImage and endImage)?

stufisher commented 6 years ago

Sorry, we should provide a bit more scope for these:

  1. ProcessingJob.dataCollectionId is the data collection to which the results of the reprocessing are attached.
  2. These are input parameters, e.g. "I want to cut my data to 1.5 Å".
  3. Originally each ProcessingJob had to have at least one ProcessingJobImageSweep, which tells the processing which images to use. If a job has multiple sweeps, you can integrate multiple crystals. A sweep is no longer required, so these tables can be used for any kind of processing in any discipline.
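
The three answers above can be sketched together as one job submission. This is a minimal illustration using Python's sqlite3 as a stand-in for MySQL; the helper submit_job, the data collection ids 1001 and 1002 and the image ranges are hypothetical. The results attach to ProcessingJob.dataCollectionId, the resolution cut-off is stored as an input parameter, and two sweeps from different data collections feed the same job.

```python
import sqlite3

# SQLite stand-in for the proposed MySQL tables (types simplified).
SCHEMA = """
CREATE TABLE ProcessingJob (
  processingJobId INTEGER PRIMARY KEY AUTOINCREMENT,
  dataCollectionId INTEGER,
  displayName TEXT,
  recipe TEXT,
  automatic INTEGER
);
CREATE TABLE ProcessingJobParameter (
  processingJobParameterId INTEGER PRIMARY KEY AUTOINCREMENT,
  processingJobId INTEGER REFERENCES ProcessingJob(processingJobId),
  parameterKey TEXT,
  parameterValue TEXT
);
CREATE TABLE ProcessingJobImageSweep (
  processingJobImageSweepId INTEGER PRIMARY KEY AUTOINCREMENT,
  processingJobId INTEGER REFERENCES ProcessingJob(processingJobId),
  dataCollectionId INTEGER,
  startImage INTEGER,
  endImage INTEGER
);
"""


def submit_job(conn, data_collection_id, recipe, parameters=None, sweeps=()):
    """Insert one user-triggered job with input parameters and optional sweeps.

    sweeps is an iterable of (dataCollectionId, startImage, endImage).
    """
    with conn:  # one transaction: commit on success, roll back on error
        job_id = conn.execute(
            "INSERT INTO ProcessingJob"
            " (dataCollectionId, displayName, recipe, automatic)"
            " VALUES (?, ?, ?, 0)",
            (data_collection_id, recipe, recipe),
        ).lastrowid
        conn.executemany(
            "INSERT INTO ProcessingJobParameter"
            " (processingJobId, parameterKey, parameterValue) VALUES (?, ?, ?)",
            [(job_id, key, str(value)) for key, value in (parameters or {}).items()],
        )
        conn.executemany(
            "INSERT INTO ProcessingJobImageSweep"
            " (processingJobId, dataCollectionId, startImage, endImage)"
            " VALUES (?, ?, ?, ?)",
            [(job_id, dc_id, start, end) for dc_id, start, end in sweeps],
        )
    return job_id


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Results attach to data collection 1001; images come from 1001 and 1002.
job_id = submit_job(
    conn,
    1001,
    "xia2",
    parameters={"resolution": 1.5},
    sweeps=[(1001, 1, 900), (1002, 1, 450)],
)

sweeps = conn.execute(
    "SELECT dataCollectionId, startImage, endImage FROM ProcessingJobImageSweep"
    " WHERE processingJobId = ? ORDER BY processingJobImageSweepId",
    (job_id,),
).fetchall()
params = dict(
    conn.execute(
        "SELECT parameterKey, parameterValue FROM ProcessingJobParameter"
        " WHERE processingJobId = ?",
        (job_id,),
    ).fetchall()
)
print(sweeps, params)
```

Calling submit_job without sweeps inserts only the job and its parameters, which matches the point that a sweep is no longer required.
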
stufisher commented 6 years ago

ProcessingJobParameter is a key-value table, so it is not tied to a single technique. These tables should be generic enough to be used for other techniques, and I deliberately designed them so that they could be used in EM too; see my original EM data model. We use workflowid + reprocessingjobparameter.
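
As a rough sketch of why a key-value table stays technique-neutral, the snippet below stores parameters for an MX job and for an EM job in the same two tables and reads both back with the same query. It uses Python's sqlite3 as a stand-in for MySQL. The helpers add_job and job_parameters, the recipe name "em_pipeline" and the EM keys pixel_size and dose_per_frame are made up for illustration; the MX keys follow the column comment in the proposal (resolution, spacegroup).

```python
import sqlite3

# SQLite stand-in for the two proposed tables involved (types simplified).
SCHEMA = """
CREATE TABLE ProcessingJob (
  processingJobId INTEGER PRIMARY KEY AUTOINCREMENT,
  dataCollectionId INTEGER,
  recipe TEXT
);
CREATE TABLE ProcessingJobParameter (
  processingJobParameterId INTEGER PRIMARY KEY AUTOINCREMENT,
  processingJobId INTEGER REFERENCES ProcessingJob(processingJobId),
  parameterKey TEXT,
  parameterValue TEXT
);
"""


def add_job(conn, data_collection_id, recipe, parameters):
    """Insert a job plus one ProcessingJobParameter row per key."""
    with conn:
        job_id = conn.execute(
            "INSERT INTO ProcessingJob (dataCollectionId, recipe) VALUES (?, ?)",
            (data_collection_id, recipe),
        ).lastrowid
        conn.executemany(
            "INSERT INTO ProcessingJobParameter"
            " (processingJobId, parameterKey, parameterValue) VALUES (?, ?, ?)",
            [(job_id, key, str(value)) for key, value in parameters.items()],
        )
    return job_id


def job_parameters(conn, job_id):
    """Collect a job's key-value rows into a dict (values stay strings)."""
    rows = conn.execute(
        "SELECT parameterKey, parameterValue FROM ProcessingJobParameter"
        " WHERE processingJobId = ? ORDER BY processingJobParameterId",
        (job_id,),
    ).fetchall()
    return dict(rows)


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Same tables, different techniques: only the keys differ.
mx_job = add_job(conn, 1001, "xia2", {"resolution": 1.5, "spacegroup": "P212121"})
em_job = add_job(conn, 2001, "em_pipeline", {"pixel_size": 1.06, "dose_per_frame": 1.2})

mx_params = job_parameters(conn, mx_job)
em_params = job_parameters(conn, em_job)
print(mx_params)
print(em_params)
```

One trade-off of this design: every value is stored as a string and nothing in the proposed DDL prevents the same key appearing twice for one job, so the consumer has to convert types and decide how to treat duplicates (the dict above keeps the last row).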