davidslac opened this issue 7 years ago
From correspondence with Barbara through help@hdfgroup.org:
I did hear back from the developer regarding SWMR and VDS...
He read the linked article, and says that for the "variable detector rates" scenario, VDS supports fill values for unmapped parts of the dataset, so you do not need to map unused spots to a NULL dataset.
The "messy" data implementation is similar to using a region reference dataset to point to elements in the source datasets, which will not perform well. There is more programming work that needs to be done to read through the reference dataset. However, while we could implement something to make this scheme more transparent, it will never perform well.
Unlimited VDS mappings can either be regularly sized blocks or a single extensible block.
VDS SWMR - Ok for Round Robin
It looks like the new HDF5 1.10 virtual dataset feature will work for the simple cases described below. We would like to use VDS and SWMR for DAQ data. With SWMR, you have to create all of the file's objects before switching into SWMR mode; after that, a writer can only extend and write to existing datasets.
For our LCLS II DAQ operation, I'm imagining that the DAQ writer processes create the RAW files and append to them through SWMR, while the master process gets a chance to take a look at the datasets in the RAW files and then sets up the VDS mapping in a master file.
(Note: I don't have a working prototype of this yet; I'm still chasing down bugs.)
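For reference, here is a minimal sketch of the SWMR pattern I have in mind for the RAW files (illustration only; file names, dataset names, and sizes are made up): the writer creates everything up front, switches into SWMR mode, and from then on only appends, while a reader refreshes to pick up new rows.

```python
# Minimal SWMR sketch with h5py -- names and sizes are made up for illustration.
import h5py
import numpy as np

# Writer side: create all objects first, then switch into SWMR mode.
f = h5py.File("raw_0.h5", "w", libver="latest")
dset = f.create_dataset("detectorA", shape=(0,), maxshape=(None,),
                        dtype="f8", chunks=(1024,))
f.swmr_mode = True                 # from here on we can only extend/write datasets

for shot in range(10):             # pretend data arriving shot by shot
    dset.resize((dset.shape[0] + 1,))
    dset[-1] = np.random.random()
    dset.flush()                   # make the new row visible to SWMR readers

# Reader side (normally a separate process):
r = h5py.File("raw_0.h5", "r", libver="latest", swmr=True)
d = r["detectorA"]
d.refresh()                        # pick up rows the writer has flushed
print(d.shape, d[:5])
```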
The issue I'm seeing is that, during live data taking, we don't know ahead of time what kind of view we want to set up, that is, the mapping between the source datasets and the master dataset.
This is the kind of data that the current VDS/SWMR features look like they will support. That is, we have two different detectors that are distributed across two files in a predictable, round-robin fashion. Since we know how all future data will look, we can create a nice, time-aligned master view, sketched below.
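Here is my own rough illustration of what I mean (shot numbers and file names are made up): shots alternate between the two RAW files, and the master view interleaves them so that row n corresponds to shot n.

```
RAW files (round robin by shot):
  file:0  /detectorA, /detectorB  rows 0,1,2,...  <- shots 0, 2, 4, ...
  file:1  /detectorA, /detectorB  rows 0,1,2,...  <- shots 1, 3, 5, ...

master view, per detector (row n == shot n):
  row 0 <- file:0 row 0
  row 1 <- file:1 row 0
  row 2 <- file:0 row 1
  row 3 <- file:1 row 1
  ...
```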
This will involve a relatively small number of HDF5 function calls that map the entire (future) datasets of the RAW files onto strided (stride=2) selections of the master virtual view.
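For example, a rough h5py sketch of that mapping might look like the following (file names, dataset names, and the fixed shot count are all assumptions on my part; a real DAQ would presumably use unlimited mappings instead of a fixed n_shots):

```python
import h5py
import numpy as np

n_shots = 1000                     # made-up total; real DAQ data keeps growing
n_files = 2                        # RAW files written round robin

layout = h5py.VirtualLayout(shape=(n_shots,), dtype="f8")
for i in range(n_files):
    # RAW file i holds every n_files-th shot of detectorA, starting at shot i.
    src = h5py.VirtualSource(f"raw_{i}.h5", "detectorA",
                             shape=(n_shots // n_files,))
    layout[i::n_files] = src       # strided selection of the master view

with h5py.File("master.h5", "w", libver="latest") as f:
    f.create_virtual_dataset("detectorA", layout, fillvalue=np.nan)
```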
Aligning Varying Detector Rates
We'd like to align the datasets, that is, we'd like row n of the master view for each detector to correspond to the same shot.
So far, this looks like it will work fine, but one challenge to aligning is that detectors can fire at different rates, meaning, for example, that detectorA is recorded on every shot but detectorB only on every other shot. In that case the RAW files look something like the sketch below.
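A hypothetical pattern (shot numbers made up for illustration):

```
detectorA: fires on shots 0, 1, 2, 3, 4, 5, ...
detectorB: fires on shots 0,    2,    4,    ...
```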
We'd like to create an efficient master view in which rows [1,3,5,...] of detectorB are mapped to a 'null' value somewhere. I think we would do this by either mapping those rows to some kind of NULL dataset, or by leaving them unmapped so they come back as the fill value (which, per the reply quoted above, is the supported approach).
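Continuing the earlier h5py sketch (still with made-up names and sizes), the fill-value approach would look something like this: only the even master rows are mapped, and the odd rows simply read back as the fill value.

```python
import h5py
import numpy as np

n_shots = 1000
layout = h5py.VirtualLayout(shape=(n_shots,), dtype="f8")

# detectorB only fires on the even shots, which in the round-robin picture
# above all land in raw_0.h5 (an assumption for this illustration).
srcB = h5py.VirtualSource("raw_0.h5", "detectorB", shape=(n_shots // 2,))
layout[0::2] = srcB                # even master rows come from detectorB

with h5py.File("master.h5", "a", libver="latest") as f:
    # odd rows are never mapped; reads of those rows return the fill value
    f.create_virtual_dataset("detectorB", layout, fillvalue=np.nan)
```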
Messy Data
The problem is messy live data: the DAQ will sometimes have to drop one or both detectors from a given shot, leaving us with data like the hypothetical pattern sketched below.
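For example, a made-up dropped-shot pattern, just to illustrate the kind of irregularity I mean:

```
file:0  /detectorA  <- shots 0, 2, 6, 8, ...   (shot 4 was dropped)
file:1  /detectorA  <- shots 1, 3, 5, 9, ...   (shot 7 was dropped)
file:0  /detectorB  <- shots 0, 6, ...
file:1  /detectorB  <- shots 3, 5, ...
```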
Given data like that, the master process will want to update the VDS mapping while it is looking at the RAW files; that is, ideally we can build a nice view of the data as we record it into the RAW files.
That is, for detectorA, we will want a master-view mapping something like the sketch below.
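Continuing the made-up pattern above, the desired mapping for detectorA would be something like:

```
master row 0  <- file:0 /detectorA row 0   (shot 0)
master row 1  <- file:1 /detectorA row 0   (shot 1)
master row 2  <- file:0 /detectorA row 1   (shot 2)
master row 3  <- file:1 /detectorA row 1   (shot 3)
master row 4  <- unmapped (shot 4 dropped -> fill value)
master row 5  <- file:1 /detectorA row 2   (shot 5)
...
```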
In other words, we won't be following a stride pattern, and we won't know which rows of the master virtual view are mapped to which rows of the file:0 and file:1 datasets until we read them.
Dataset vs. Metadata VDS Implementation
Ultimately, I think we are looking for a 'dataset' implementation of the VDS. I know the current one is a 'metadata' implementation, meaning you map everything out ahead of time, flush the metadata cache, and then it should work through SWMR.
You could imagine implementing the messy VDS described above by writing datasets in the master file with information like 'detectorA, time=3: use file 0, row 1'. An application that knows this schema can then chase through to the correct location in file 0, file 1, etc. Ideally, I think, this would be a feature of HDF5.
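As a rough sketch of what I mean (again, all names and the record layout are my own made-up illustration): the master file carries an ordinary, SWMR-extendable mapping dataset, and a reader that knows the schema chases the indirection itself.

```python
import h5py
import numpy as np

# One mapping record per shot: made-up schema for illustration.
map_dtype = np.dtype([("shot", "i8"),    # global shot / time index
                      ("file", "i2"),    # which RAW file (0 or 1)
                      ("row",  "i8")])   # row within that file's dataset

# Master process: append a mapping record as each shot is discovered.
with h5py.File("master.h5", "w", libver="latest") as f:
    m = f.create_dataset("detectorA_map", shape=(0,), maxshape=(None,),
                         dtype=map_dtype, chunks=(1024,))
    rec = np.array([(3, 0, 1)], dtype=map_dtype)   # e.g. shot 3 -> file 0, row 1
    m.resize((m.shape[0] + 1,))
    m[-1:] = rec

# Reader: resolve each shot through the mapping dataset
# (assumes the referenced raw_*.h5 files already exist, e.g. from the sketch above).
with h5py.File("master.h5", "r") as f:
    for entry in f["detectorA_map"][:]:
        with h5py.File(f"raw_{entry['file']}.h5", "r", swmr=True) as src:
            print(entry["shot"], src["detectorA"][entry["row"]])
```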