slaclab / lc2-hdf5-110

Investigate hdf5 1.10 features like SWMR and virtual dataset for LCLS II
Apache License 2.0

refresh for swmr is slow #9

Open davidslac opened 7 years ago

davidslac commented 7 years ago

I implemented a reader that reads from the master, going through external datasets and VDSs, and it was really slow. The slowness is from calling H5Drefresh. The reader must call this once per dataset, and we'd like to be able to have 1000 datasets for all the machine data.
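To make the per-dataset cost concrete, here is a minimal Python sketch of one polling pass. It assumes each dataset handle exposes a `refresh()` method, as h5py's `Dataset` does for files opened in SWMR mode. The helper `time_refresh_pass` and the stand-in class `FakeDataset` are hypothetical names used only for illustration; this is not the reader from this repo.

```python
import time


def time_refresh_pass(datasets):
    """Call refresh() once on every dataset handle and return elapsed seconds."""
    start = time.perf_counter()
    for dset in datasets:
        # One refresh call per dataset, analogous to one H5Drefresh per dataset.
        dset.refresh()
    return time.perf_counter() - start


class FakeDataset:
    """Hypothetical stand-in for a SWMR dataset handle; it only counts calls."""

    def __init__(self):
        self.refresh_count = 0

    def refresh(self):
        self.refresh_count += 1


# 1000 handles, matching the number of datasets we would like to support.
datasets = [FakeDataset() for _ in range(1000)]
elapsed = time_refresh_pass(datasets)
```

With real handles in place of `FakeDataset`, dividing the number of datasets by `elapsed` would give a refreshes-per-second figure.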

I pulled it out in a small example

https://github.com/slaclab/lc2-hdf5-110/tree/master/questions/refresh

and still get only about 2500 refreshes per second, even for really easy datasets (no external datasets, no VDSs, only one chunk, i.e., minimal metadata). That means if you are polling the data through SWMR every second for 1000 datasets, about 0.4 s of each second, nearly half your time, goes to refreshing your view.
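The arithmetic behind that estimate can be sketched in a few lines of Python. The function name `refresh_fraction` is hypothetical; the inputs are the numbers quoted in this comment (1000 datasets, roughly 2500 refreshes per second, polling once per second).

```python
def refresh_fraction(n_datasets, refreshes_per_second, poll_period_s):
    """Fraction of each polling period spent refreshing every dataset once."""
    refresh_time_s = n_datasets / refreshes_per_second
    return refresh_time_s / poll_period_s


# Numbers from this issue: 1000 datasets, ~2500 refreshes/s, 1 s polling period.
fraction = refresh_fraction(1000, 2500.0, 1.0)
```

Since the refresh time scales linearly with the number of datasets, at this rate 2500 datasets would consume the entire one-second polling period.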

The discussion with the HDF5 developers has led to a possible new feature for 1.10.1, a cache snapshot. From their email:

Right now, cache images are only written at file close and read in at file open, but I think you have a very nice usecase for bringing a variation of the idea to incremental flush->refresh cycles (at the file level, not the dataset though). That would allow a writer to make a snapshot of its metadata cache in the file with one I/O and a reader could ingest the entire set of changes in only one I/O operation also.