pelahi / VELOCIraptor-STF

Galaxy/(sub)Halo finder for N-body simulations
MIT License
19 stars 26 forks source link

Building the SWIFT interface without HDF5 C++ #32

Closed jchelly closed 5 years ago

jchelly commented 5 years ago

Is your feature request related to a problem? Please describe.

When building the SWIFT interface we need to link to the same HDF5 installation that SWIFT is using. We need HDF5 with MPI support (for SWIFT) and the C++ interface (for Velociraptor). This combination is officially unsupported so it's not tested by the HDF5 developers and we usually wont be able to use pre-existing installations at HPC centres.

Describe the solution you'd like

We could make it possible to build the SWIFT interface without HDF5 C++ support. There's no need to modify the HDF5 snapshot reading routines because they could just be disabled if we're running within SWIFT. We would only need to modify the code that writes out the group information. Probably that could be done by making a small class or function for writing datasets via the C interface then using that instead of the C++ interface. At the moment there's a lot of duplication of HDF5 calls in io.c so it might be an improvement to readability too.

Describe alternatives you've considered

The SWIFT interface does build with HDF5 disabled entirely but then we're stuck with binary output.

MatthieuSchaller commented 5 years ago

If I may chip in, that would also enable to use the parallel-hdf5 library for i/o, which could considerably speed up the use of VR on SWIFT snapshots for post-processing.

pelahi commented 5 years ago

I think the option is to write a wrapper function for writing to HDF5 and that wrapper function uses the C API but is easy enough to alter and enable the C++ API.

jchelly commented 5 years ago

@pelahi - exactly. Looking at io.cxx there are a lot of hdf5 calls but they are almost all just writing an array to a new dataset, so if we have a function to do that then that the output side is mostly taken care of. Parallel input as @MatthieuSchaller suggests would be more work. Maybe that should be another feature request.

jchelly commented 5 years ago

I wrote a bit of code to do HDF5 output from HBT that might be useful for this. I'll take a look at adapting it.

rtobar commented 5 years ago

There's also the hdf5 input/output classes we use in shark (https://github.com/ICRAR/shark/tree/master/include/hdf5 and https://github.com/ICRAR/shark/tree/master/src/hdf5). They are built on top of the HDF5 C++ bindings, but they don't expose that detail to the caller, who only issues simple calls (e.g., https://github.com/ICRAR/shark/blob/master/src/galaxy_writer.cpp#L96-L161 and others in the same file).

MatthieuSchaller commented 5 years ago

@rtobar that would not directly solve the issue as far as I understand it as the bindings will still require a C++-enabled build of HDF5, which directly conflicts with the use of parallel-hdf5 in other parts of the code.

rtobar commented 5 years ago

Yes, sorry for the confusion. When I said that the shark classes don't "expose that detail" I was actually VERY implicitly implying (my bad!) that their internal usage of the hdf5 lib could be rewritten to use the C API instead, while maintaining the ease of use for callers. So yes, not fully "just plug and play", but I thought it would be useful to get some ideas.

jchelly commented 5 years ago

I've uploaded a branch with an incomplete attempt at the type of changes I had in mind: https://github.com/jchelly/VELOCIraptor-STF/tree/swift_hdf5_nocxx . This adds a H5OutputFile class with write_dataset and write_attribute methods implemented using the HDF5 C API and I've converted some of the calls in io.cxx.

pelahi commented 5 years ago

VR updated to use HDF C API.