root-project / rootbench

Collection of benchmarks and performance monitoring applications
GNU Lesser General Public License v2.1

Import header-only library to read/write simple HEP data in HDF5 #251

Closed jalopezg-git closed 1 year ago

jalopezg-git commented 2 years ago

This pull request integrates into rootbench a header-only library to read/write simple HEP data in HDF5.
Having this at hand here would allow comparison benchmarks against HDF5 to be added in the future.

The library stores simple HEP data in HDF5 while hiding the details of data representation. It currently supports two different column models, which the example below builds as simple_struct_compound and simple_struct_fnal.

THIS CODE IS FOR BENCHMARK PURPOSES ONLY (ORIGINALLY FOR ACAT 2021); DO NOT USE IN PRODUCTION.

This is the state after ACAT 2021; however, in this or a follow-up PR we need to:

Example

The Makefile in the examples/ directory builds two versions of simple_struct.cxx: simple_struct_compound and simple_struct_fnal. Both generate the same output; however, the internal HDF5 representation is fundamentally different, as shown below.

For more complex examples, see gen_lhcb_h5.cc, gen_cms_h5.cc, lhcb_h5.cc, and cms_10br_h5.cc (not in this PR).

$ # For ./simple_struct_compound
$ h5dump simple_struct.h5
HDF5 "simple_struct.h5" {
GROUP "/" {
   DATASET "Foo" {
      DATATYPE  H5T_COMPOUND {
         H5T_STD_I32LE "i";
         H5T_IEEE_F32LE "f";
         H5T_VLEN { H5T_COMPOUND {
            H5T_IEEE_F32LE "f1";
            H5T_IEEE_F32LE "f2";
         }} "c";
      }
      DATASPACE  SIMPLE { ( 4 ) / ( H5S_UNLIMITED ) }
      DATA {
      (0): {
            0,
            1.2345,
            ({
                  1.1,
                  1.2
               }, {
                  2.1,
                  2.2
               }, {
                  3.1,
                  3.2
               })
         },
...
$ # For ./simple_struct_fnal
$ h5dump simple_struct.h5
HDF5 "simple_struct.h5" {
GROUP "/" {
   ATTRIBUTE "$Metadata" {
      DATATYPE  H5T_STD_U64LE
      DATASPACE  SIMPLE { ( 2 ) / ( 2 ) }
      DATA {
      (0): 4, 4
      }
   }
   GROUP "c" {
      DATASET "Event ID" {
         DATATYPE  H5T_STD_U64LE
         DATASPACE  SIMPLE { ( 12 ) / ( H5S_UNLIMITED ) }
         DATA {
         (0): 0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3
         }
      }
      DATASET "f1" {
         DATATYPE  H5T_IEEE_F32LE
         DATASPACE  SIMPLE { ( 12 ) / ( H5S_UNLIMITED ) }
         DATA {
         (0): 1.1, 2.1, 3.1, 1.1, 2.1, 3.1, 1.1, 2.1, 3.1, 1.1, 2.1, 3.1
         }
      }
      DATASET "f2" {
         DATATYPE  H5T_IEEE_F32LE
         DATASPACE  SIMPLE { ( 12 ) / ( H5S_UNLIMITED ) }
         DATA {
         (0): 1.2, 2.2, 3.2, 1.2, 2.2, 3.2, 1.2, 2.2, 3.2, 1.2, 2.2, 3.2
         }
      }
   }
   DATASET "f" {
      DATATYPE  H5T_IEEE_F32LE
      DATASPACE  SIMPLE { ( 4 ) / ( H5S_UNLIMITED ) }
      DATA {
      (0): 1.2345, 2.2345, 3.2345, 4.2345
      }
   }
   DATASET "i" {
      DATATYPE  H5T_STD_I32LE
      DATASPACE  SIMPLE { ( 4 ) / ( H5S_UNLIMITED ) }
      DATA {
      (0): 0, 1, 2, 3
      }
   }
}
}
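
The difference between the two dumps can be sketched in plain C++ without any HDF5 calls. The snippet below is only an illustration of the layout, not the library's implementation: the names Foo, Bar, Columns, Flatten, and CollectionOf are hypothetical and were chosen to mirror the dataset names above. It assumes that, in the column-wise variant, each element of the inner collection "c" is tagged with the index of the entry it belongs to, which is what the "Event ID" dataset in the second dump suggests.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical row types mirroring the compound "Foo" dataset.
struct Bar {
   float f1;
   float f2;
};

struct Foo {
   std::int32_t i;
   float f;
   std::vector<Bar> c; // variable-length inner collection
};

// Hypothetical in-memory image of the column-wise layout: one array per
// top-level member, plus one set of arrays for the inner collection.
struct Columns {
   std::vector<std::int32_t> i;
   std::vector<float> f;
   std::vector<std::uint64_t> c_event_id; // corresponds to "c/Event ID"
   std::vector<float> c_f1;               // corresponds to "c/f1"
   std::vector<float> c_f2;               // corresponds to "c/f2"
};

// Split row-wise entries into columns. Every inner element records the
// index of the entry it came from (assumed meaning of "Event ID").
Columns Flatten(const std::vector<Foo> &rows)
{
   Columns cols;
   for (std::size_t ev = 0; ev < rows.size(); ++ev) {
      cols.i.push_back(rows[ev].i);
      cols.f.push_back(rows[ev].f);
      for (const Bar &b : rows[ev].c) {
         cols.c_event_id.push_back(ev);
         cols.c_f1.push_back(b.f1);
         cols.c_f2.push_back(b.f2);
      }
   }
   return cols;
}

// Rebuild the inner collection of one entry by scanning the index column.
std::vector<Bar> CollectionOf(const Columns &cols, std::uint64_t ev)
{
   std::vector<Bar> out;
   for (std::size_t k = 0; k < cols.c_event_id.size(); ++k) {
      if (cols.c_event_id[k] == ev)
         out.push_back({cols.c_f1[k], cols.c_f2[k]});
   }
   return out;
}
```

Calling Flatten on four entries that each hold three inner elements yields 4-element "i" and "f" columns and 12-element inner columns, matching the dataspace sizes in the second dump. The linear scan in CollectionOf is kept for brevity; a real reader would more likely exploit the fact that the index column is sorted.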
jalopezg-git commented 2 years ago

I'm wondering if we are committed to running third-party benchmarks regularly in rootbench. Please let me know if I missed a discussion on this topic. Otherwise, I suggest reopening the PR in the iotools repository.

Those would probably not run regularly. @Axel-Naumann suggested before ACAT that this might be the proper place to merge this; I believe he was not thinking of running these benchmarks regularly, but only of preserving this code in the long term. I don't have a strong opinion on that, though.