Possibly implement some concept of a "Job Sequence" for data processing jobs to run in one go.
The main sequence will start at Sensor QC and run through to Data Reduction QC. We pass a map of Objects between the jobs, and each following job knows what needs to be in the map for it to do its thing. If anything is missing, the job will fail.
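Something like this, as a minimal sketch (JobSequence, SequencedJob and requiredKeys() are illustrative names, not existing classes):

```java
// Hypothetical sketch only: JobSequence, SequencedJob and requiredKeys()
// are illustrative names, not existing QuinCe classes.
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

interface SequencedJob {
  Set<String> requiredKeys();                  // entries this job needs in the map
  void run(Map<String, Object> transferData) throws Exception;
}

class JobSequence {
  private final List<SequencedJob> jobs;

  JobSequence(List<SequencedJob> jobs) {
    this.jobs = jobs;
  }

  void run() throws Exception {
    Map<String, Object> transferData = new HashMap<>();
    for (SequencedJob job : jobs) {
      // Fail fast if an earlier job didn't leave what this one needs
      for (String key : job.requiredKeys()) {
        if (!transferData.containsKey(key)) {
          throw new IllegalStateException("Missing transfer data: " + key);
        }
      }
      job.run(transferData);                   // job may add entries for later jobs
    }
  }
}
```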
Note that jobs that are not the first entry in a sequence cannot be requeued.
Note that the sensor offsets job restarts the calculations at Data Reduction, which is in the middle of the standard job flow.
Be careful with SearchableSensorValuesList:
- if it has built groups for grouped measurements, they will become invalid if the QC changes (a possible guard is sketched below).
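A sketch of one possible guard, assuming a hypothetical QC version counter (none of these names are real QuinCe API): groups remember the QC version they were built against and are rebuilt when it has moved on.

```java
// Sketch, assuming a hypothetical QC version counter (not real QuinCe API).
import java.util.List;

class GroupCacheSketch {
  private long qcVersion = 0L;              // bumped on every QC change
  private long groupsVersion = -1L;         // version the cached groups were built at
  private List<List<Long>> cachedGroups;    // placeholder group structure

  void qcChanged() {
    qcVersion++;                            // implicitly invalidates cached groups
  }

  List<List<Long>> getGroups() {
    if (cachedGroups == null || groupsVersion != qcVersion) {
      cachedGroups = buildGroups();         // rebuild from current QC state
      groupsVersion = qcVersion;
    }
    return cachedGroups;
  }

  private List<List<Long>> buildGroups() {
    return List.of();                       // real grouping logic goes here
  }
}
```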
Testing stats:
Original: Recalculate dataset: 2m 50s, 10.8 GB
DatasetSensorValues operates in different modes (All, Ignore Flushing, Ignore Internal Calibrations). This will need to be built into the class because the different jobs need different modes.
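As a rough sketch, the modes could be an enum carrying a filter; SensorValueView and its flag accessors below are assumptions standing in for the real SensorValue API.

```java
// Sketch of the modes as an enum with a filter; SensorValueView and its
// flag accessors are assumptions, not the real SensorValue API.
import java.util.function.Predicate;

interface SensorValueView {
  boolean isFlushing();
  boolean isInternalCalibration();
}

enum SensorValuesMode {
  ALL,
  IGNORE_FLUSHING,
  IGNORE_INTERNAL_CALIBRATIONS;

  // Which values are visible in this mode
  Predicate<SensorValueView> filter() {
    switch (this) {
      case IGNORE_FLUSHING:
        return v -> !v.isFlushing();
      case IGNORE_INTERNAL_CALIBRATIONS:
        return v -> !v.isInternalCalibration();
      default:
        return v -> true;
    }
  }
}
```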
1am thought: Maybe we just pass a list of all SensorValues between jobs, and build a new DatasetSensorValues object from it at the start of each job, which filters them? Not sure how well that will work with memory usage for all the data structures in a DatasetSensorValues object. But it might not be too bad, since we won't be duplicating SensorValue objects. Try it and see.
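A sketch of that idea, reusing the SensorValueView/SensorValuesMode stand-ins from the sketch above. Filtering copies references only, so the SensorValue objects themselves are never duplicated; the cost is just the new container and its indexes.

```java
// Sketch: each job builds its own filtered view from the shared list.
import java.util.List;
import java.util.stream.Collectors;

class DatasetSensorValuesSketch {
  private final List<SensorValueView> visibleValues;

  DatasetSensorValuesSketch(List<SensorValueView> allValues, SensorValuesMode mode) {
    this.visibleValues = allValues.stream()
      .filter(mode.filter())
      .collect(Collectors.toList());      // new container, shared elements
  }

  List<SensorValueView> getValues() {
    return visibleValues;
  }
}
```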
At the top of each job, see if the TreeSet<SensorValue> exists in the transfer data, and if not, load it. We may need a new method in DataSetDataDB for this.
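That check could be a small generic helper; the DataSetDataDB call in the usage comment below is the hypothetical new method, not an existing one.

```java
// Sketch of the top-of-job check as a generic get-or-load helper.
import java.util.Map;
import java.util.function.Supplier;

class TransferDataHelper {
  @SuppressWarnings("unchecked")
  static <T> T getOrLoad(Map<String, Object> transferData, String key,
      Supplier<T> loader) {
    T value = (T) transferData.get(key);
    if (value == null) {
      value = loader.get();                 // e.g. load from the database
      transferData.put(key, value);         // cache for the rest of the sequence
    }
    return value;
  }
}

// Usage at the top of a job (hypothetical DataSetDataDB method):
// TreeSet<SensorValue> sensorValues = TransferDataHelper.getOrLoad(
//     transferData, "sensorValues",
//     () -> DataSetDataDB.getSensorValuesForDataset(conn, datasetId));
```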
Since the jobs are typically I/O bound at this point, can we pass data from one job to the next in the chain? e.g. AutoQCJob and DataReductionJob both need SensorValues. (It's 2:30am and I can't remember the exact requirements, but you get the idea.)
This might mean that a dataset's processing needs to be finished in one go instead of jumping between datasets, to prevent a ton of RAM being allocated while switching between datasets if there are loads queued at once.