slaclab / lc2-hdf5-110

Investigate hdf5 1.10 features like SWMR and virtual dataset for LCLS II
Apache License 2.0

SWMR + round robin VDS, simplest way to wait for valid entries? #12

Open davidslac opened 7 years ago

davidslac commented 7 years ago

Let's say we are using SWMR on a virtual dataset (VDS) that is a round robin of two source datasets in two different files. So we have:

fileA/dset
fileB/dset

for the source datasets, and

master/dset

We'll assume even entries in master/dset come from fileA and odd entries from fileB.

Now there is a program reading master/dset. How does it know which entries in master/dset are valid? I have been having it call H5Drefresh and H5LDget_dset_dims, but it just failed an integrity check. I think it read the fill value. If fileA is at entry 200 but fileB is only at entry 100, those 200 entries from fileA will make master/dset look like it is 400 long. If the reader then goes for odd entry 399, that entry won't be there, since fileB is only at 100, so the reader will get the fill value.

So the question is: what is the right programming model for SWMR + round robin VDS with streaming data? Relatedly, should there be a new feature in HDF5 to support a more convenient model? That is, should H5Drefresh and H5LDget_dset_dims on a round robin VDS report a length based on the minimum of the sources?

It would be nice to insulate the reader from details of the VDS, like all the separate source files, but maybe this is not possible. Right now it looks like someone, either the reader or the program making the master, has to keep an eye on the lengths of the individual source files. The master could communicate this in an ad-hoc way, i.e., maintain a separate dataset recording how far the VDS has been filled out.

davidslac commented 7 years ago

A reader can open the dataset with a dataset access property list whose virtual view is set to "first missing"; that way it will only get complete entries. The details are something like

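  hid_t access_id = NONNEG( H5Pcreate(H5P_DATASET_ACCESS) );
  /* only expose entries before the first missing source entry */
  NONNEG( H5Pset_virtual_view(access_id, H5D_VDS_FIRST_MISSING) );
  hid_t dset_id = NONNEG( H5Dopen2(parent, name, access_id) );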

Still trying to get this to work.