This project is a pure Java implementation for accessing HDF5 files. It is written from the file format specification and does not use any HDF Group code; it is not a wrapper around the C libraries. The file format specification is available from the HDF Group here. More information on the format is available on Wikipedia. I presented a webinar about jHDF for the HDF Group, which is available on YouTube; the example code used and slides can be found here.
The intention is to make a clean Java API to access HDF5 data. Currently, reading is very well-supported and writing supports limited use cases. For progress see the change log. Java 8, 11, 17 and 21 are officially supported.
Here is an example of reading a dataset with jHDF (see ReadDataset.java):
try (HdfFile hdfFile = new HdfFile(Paths.get("/path/to/file.hdf5"))) {
    Dataset dataset = hdfFile.getDatasetByPath("/path/to/dataset");
    // data will be a Java array with the dimensions of the HDF5 dataset
    Object data = dataset.getData();
}
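Because `getData()` returns a plain `Object`, the caller usually needs to find out the shape and element type of the array before casting it. The following is a minimal sketch of one way to do that using only JDK reflection. The class `ArrayShape` and its methods are hypothetical helpers for illustration and are not part of jHDF, and the `int[2][3]` array stands in for what a 2 x 3 integer dataset might return. It assumes a rectangular array, since only the first element of each level is inspected.

```java
import java.lang.reflect.Array;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ArrayShape {

    /** Returns the length of each dimension of a (possibly nested) rectangular Java array. */
    public static int[] dimensions(Object data) {
        List<Integer> dims = new ArrayList<>();
        Object current = data;
        // Descend through the first element of each level until a non-array is reached.
        while (current != null && current.getClass().isArray()) {
            int length = Array.getLength(current);
            dims.add(length);
            current = length > 0 ? Array.get(current, 0) : null;
        }
        return dims.stream().mapToInt(Integer::intValue).toArray();
    }

    /** Returns the innermost element type, for example int for an int[][]. */
    public static Class<?> elementType(Object data) {
        Class<?> type = data.getClass();
        while (type.isArray()) {
            type = type.getComponentType();
        }
        return type;
    }

    public static void main(String[] args) {
        // Stand-in for the Object returned by dataset.getData() (assumption for illustration).
        Object data = new int[2][3];

        System.out.println(Arrays.toString(dimensions(data))); // prints [2, 3]
        System.out.println(elementType(data));                 // prints int

        // Once the shape and type are known, the cast is safe.
        if (data instanceof int[][]) {
            int[][] matrix = (int[][]) data;
            System.out.println(matrix[1].length);              // prints 3
        }
    }
}
```

Checking the type before casting avoids a `ClassCastException` when the dataset's type or rank is not known in advance.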
For an example of traversing the tree inside an HDF5 file see PrintTree.java.
An example of writing a file jhdf.hdf5 containing a group group, with two datasets ints and doubles:
try (WritableHdfFile hdfFile = HdfFile.write(Paths.get("jhdf.hdf5"))) {
    WritableGroup group = hdfFile.putGroup("group");
    group.putDataset("ints", new int[] {1, 2, 3, 4});
    group.putDataset("doubles", new double[] {1.0, 2.0, 3.0, 4.0});
}
See WriteHdf5.java for a more complete example. Note: writing files is still an early feature with many more functions to be added.
For more examples, see the package io.jhdf.examples.
ByteBuffers to allow for custom reading logic, or integration with other libraries.
MappedByteBuffers which should provide fast file access. In addition, when accessing chunked datasets the library is parallelized to take advantage of modern CPUs. jHDF will also allow parallel reading of multiple datasets or multiple files. I have seen cases where jHDF is significantly faster than the C libraries, but as with all performance issues, it is case-specific, so you will need to run your own tests on the cases you care about. If you do run tests, please post the results so everyone can benefit. Here are some results I am aware of:
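To illustrate the kind of memory-mapped access mentioned above, the sketch below maps a small temporary file with Java NIO and decodes little-endian doubles from the resulting buffer. It uses only JDK classes and is not jHDF's internal code. The class name `MappedReadSketch`, the helper `readDoubles`, and the file layout (a plain sequence of doubles with no header) are assumptions made for illustration.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

public class MappedReadSketch {

    /** Maps the whole file read-only and decodes its contents as little-endian doubles. */
    public static double[] readDoubles(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            // The mapping stays valid after the channel is closed.
            MappedByteBuffer mapped = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            mapped.order(ByteOrder.LITTLE_ENDIAN);
            double[] values = new double[mapped.remaining() / Double.BYTES];
            // asDoubleBuffer() views the bytes as doubles using the buffer's byte order.
            mapped.asDoubleBuffer().get(values);
            return values;
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("mapped-sketch", ".bin");
        file.toFile().deleteOnExit();

        // Write three little-endian doubles so there is something to map.
        ByteBuffer out = ByteBuffer.allocate(3 * Double.BYTES).order(ByteOrder.LITTLE_ENDIAN);
        out.putDouble(1.0).putDouble(2.0).putDouble(3.0);
        Files.write(file, out.array());

        System.out.println(Arrays.toString(readDoubles(file))); // prints [1.0, 2.0, 3.0]
    }
}
```

The same decoding step works on any `ByteBuffer`, so custom reading logic can be written once and applied to buffers from different sources.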
jHDF does not yet support a feature you need. If this is the case you should receive an UnsupportedHdfException; open an issue and support can be added. For scheduling, the features which will allow the most files to be read/written are prioritized. If you really want to use a new feature, feel free to work on it and open a PR; any help is much appreciated.
Integer.MAX_VALUE elements). This issue would also be addressed by slicing.
Mostly it's a challenge: HDF5 is a fairly complex file format with lots of flexibility, and writing a library to access it is interesting. Also, as HDF5 is a widely used file format for storing scientific, engineering, and commercial data, it seems like a good idea to be able to access HDF5 files with more than one library. In particular, JVM languages are among the most widely used, so having a native HDF5 implementation seems useful.
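The Integer.MAX_VALUE limit mentioned above comes from Java arrays being indexed by `int`. As a rough sketch, a caller could check whether a dataset's dimensions fit in a single Java array before attempting to read it. The class `DatasetSizeCheck` and the method `fitsInJavaArray` are hypothetical names used for illustration and are not part of jHDF; the dimensions are supplied by hand here rather than read from a file.

```java
public class DatasetSizeCheck {

    /**
     * Returns true if the total number of elements implied by the dimensions
     * is at most Integer.MAX_VALUE. In practice some JVMs cap array sizes
     * slightly below this value.
     */
    public static boolean fitsInJavaArray(long[] dimensions) {
        for (long dim : dimensions) {
            if (dim < 0) {
                throw new IllegalArgumentException("Negative dimension: " + dim);
            }
            if (dim == 0) {
                return true; // any zero-length dimension means the dataset is empty
            }
        }
        long total = 1;
        for (long dim : dimensions) {
            // Compare before multiplying so the running product never overflows.
            if (total > Integer.MAX_VALUE / dim) {
                return false;
            }
            total *= dim;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(fitsInJavaArray(new long[] {1000, 1000}));   // prints true
        System.out.println(fitsInJavaArray(new long[] {50000, 50000})); // prints false
    }
}
```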
Inside the jhdf directory, run ./gradlew build (./gradlew.bat build on Windows). This will run the build and tests, fetching dependencies.
Import jhdf into your IDE.
Run ./gradlew check to run the build and tests.
To see other available Gradle tasks, run ./gradlew tasks
If you have read this far, please consider starring this repo. If you are using jHDF in a commercial product, please consider making a donation. Thanks!