stcorp / coda

The Common Data Access toolset
http://stcorp.github.io/coda/doc/html/index.html
BSD 3-Clause "New" or "Revised" License

Native HDF5 backend #5

Open svniemeijer opened 8 years ago

svniemeijer commented 8 years ago

Instead of using the HDF5 library, create our own implementation to read HDF5 files. This would allow much faster access to the data, since there would be no need to work with dataspaces, vlen APIs, etc.
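As a rough illustration of the very first step a native reader would take, the sketch below locates the HDF5 superblock without the HDF5 library. It relies on the file format's rule that the 8-byte signature sits at offset 0 or at 512, 1024, 2048, and so on (doubling), followed directly by the superblock version byte. The function name `find_superblock` is hypothetical and not part of CODA; this is Python for brevity, not a proposal for the actual C implementation.

```python
import io

# 8-byte format signature that starts every HDF5 superblock.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def find_superblock(stream):
    """Return (offset, version) of the superblock, or None if none is found.

    Hypothetical helper: probes offsets 0, 512, 1024, 2048, ... and reads
    the version byte that directly follows the signature.
    """
    size = stream.seek(0, io.SEEK_END)
    offset = 0
    while offset + len(HDF5_SIGNATURE) + 1 <= size:
        stream.seek(offset)
        header = stream.read(len(HDF5_SIGNATURE) + 1)
        if header[:8] == HDF5_SIGNATURE:
            return offset, header[8]
        # Candidate offsets are 0, then 512 and every doubling after that.
        offset = 512 if offset == 0 else offset * 2
    return None
```

Everything after this point (object headers, B-trees, heaps) is where the real work lies; the sketch only shows that detection itself needs no library support.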

The tricky part will be dealing with compressed data, but this should be similar to how we currently handle this with the CDF backend for zipped data.
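A minimal sketch of the compressed case, assuming a chunk stored with the deflate filter is a standard zlib stream and that the chunk's raw bytes and expected uncompressed size have already been obtained from the chunk index and dataset metadata (that lookup is not shown). `inflate_chunk` is a hypothetical name, and other filters such as shuffle or szip are not covered.

```python
import zlib


def inflate_chunk(raw, expected_size):
    """Decompress one deflate-filtered chunk and verify its size.

    Assumption: `raw` holds exactly the stored bytes of a single chunk and
    `expected_size` is the chunk size in bytes derived from the metadata.
    """
    data = zlib.decompress(raw)
    if len(data) != expected_size:
        raise ValueError(
            f"chunk inflated to {len(data)} bytes, expected {expected_size}"
        )
    return data
```

The size check is there because a mismatch usually points at a wrongly parsed chunk index rather than at corrupt data, which is worth detecting early.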

svniemeijer commented 7 years ago

Using partial array reads with the HDF5 backend currently imposes a penalty due to the use of H5Sselect_hyperslab(). This penalty can be considerable. There have been test cases where a partial array read was actually slower than just reading the full dataset and then taking a slice from it in memory. Currently, a hyperslab selection may need to be much smaller than the full dataset before any performance gain shows up.

A native backend in CODA should therefore also make sure that partial dataset reads are, in most circumstances, actually faster than reading the full dataset.
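One way a native backend could keep partial reads cheap, sketched here for the simplest case only: a dataset with contiguous (non-chunked, uncompressed) row-major storage. A block selection then maps to a small number of byte ranges that can be read with plain seeks, and trailing dimensions that are selected in full merge into a single run. The names `contiguous_runs` and `read_selection` are hypothetical, and `data_offset` is assumed to be the file offset of the dataset's raw data as found in its object header.

```python
import io
from itertools import product


def contiguous_runs(shape, start, count, item_size):
    """Yield (byte_offset, byte_length) runs for a block selection.

    Offsets are relative to the start of the dataset's raw data and assume
    row-major order with at least one dimension.
    """
    ndim = len(shape)
    # Element strides: strides[i] is the number of elements per step in dim i.
    strides = [1] * ndim
    for i in range(ndim - 2, -1, -1):
        strides[i] = strides[i + 1] * shape[i + 1]
    # Skip trailing dimensions that are selected in full; they form one run
    # together with the first dimension (from the back) that is not.
    k = ndim - 1
    while k > 0 and start[k] == 0 and count[k] == shape[k]:
        k -= 1
    run_length = count[k] * strides[k] * item_size
    outer = [range(start[i], start[i] + count[i]) for i in range(k)]
    for index in product(*outer):
        elements = sum(i * s for i, s in zip(index, strides))
        elements += start[k] * strides[k]
        yield elements * item_size, run_length


def read_selection(stream, data_offset, shape, start, count, item_size):
    """Read a block selection with one seek and one read per run."""
    parts = []
    for offset, length in contiguous_runs(shape, start, count, item_size):
        stream.seek(data_offset + offset)
        parts.append(stream.read(length))
    return b"".join(parts)
```

With this mapping, selecting a range along the slowest dimension costs a single read, so it should not be slower than reading the full dataset. Chunked or compressed datasets would need an extra step that maps the selection onto the affected chunks first.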