Allow save_result_array(s) methods to use h5 dataset compression filters

Original report (archived issue) by David Meyer (Bitbucket: dihm, GitHub: dihm).

We've recently had an experiment get up to speed and start producing a prodigious amount of data. While researching how to deal with it, I stumbled onto HDF5 dataset compression. It trades some read/write performance for a moderate reduction in file size, and it is essentially transparent in use, provided the compression filter is specified when the dataset is first created.
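The following minimal h5py sketch illustrates the "specify it at creation time" point. It assumes h5py and NumPy are installed. The file names, the dataset name `trace`, and gzip level 4 are arbitrary choices for illustration. It writes the same array with and without a gzip filter, then reads the compressed copy back without any extra arguments.

```python
import os
import tempfile

import h5py
import numpy as np

# Highly compressible example data: one million float64 zeros (~8 MB raw).
data = np.zeros(1_000_000)

with tempfile.TemporaryDirectory() as tmp:
    raw_path = os.path.join(tmp, "raw.h5")
    gz_path = os.path.join(tmp, "gzip.h5")

    # Default create_dataset: contiguous storage, no filters.
    with h5py.File(raw_path, "w") as f:
        f.create_dataset("trace", data=data)

    # Filters are fixed here, when the dataset is created. Filtered datasets
    # must be chunked, so h5py picks a chunk shape automatically.
    with h5py.File(gz_path, "w") as f:
        f.create_dataset("trace", data=data, compression="gzip",
                         compression_opts=4, shuffle=True)

    with h5py.File(gz_path, "r") as f:
        dset = f["trace"]
        compression = dset.compression        # filter name, e.g. 'gzip'
        level = dset.compression_opts         # gzip level
        chunked = dset.chunks is not None
        # Reading needs no extra arguments: HDF5 decompresses on the fly.
        roundtrip_ok = np.array_equal(dset[()], data)

    raw_size = os.path.getsize(raw_path)
    gz_size = os.path.getsize(gz_path)

print(compression, level, chunked, roundtrip_ok)  # gzip 4 True True
print(f"uncompressed: {raw_size} bytes, gzip: {gz_size} bytes")
```

The achievable ratio depends heavily on the data. An array of zeros compresses extremely well, while noisy camera frames may shrink much less. h5py also ships the `lzf` filter (`compression="lzf"`), which is typically faster than gzip but compresses less.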

It would be nice if large summary datasets produced in lyse could be stored compressed from the outset. Because of the performance cost, we don't want compression to be the default, so it should be configurable on each individual save.

My initial thought was to simply pass keyword arguments through to h5py's create_dataset function in both save_result_array and save_result_arrays. Thoughts?

```diff
@@ -209,7 +209,8 @@
                 _updated_data[self.h5_path] = {}
             _updated_data[self.h5_path][str(self.group), name] = value

-    def save_result_array(self, name, data, group=None, overwrite=True, keep_attrs=False):
+    def save_result_array(self, name, data, group=None,
+                          overwrite=True, keep_attrs=False, **compress_args):
         if self.no_write:
             raise Exception('This run is read-only. '
                             'You can\'t save results to runs through a '
@@ -233,7 +234,7 @@
                 else:
                     raise Exception('Dataset %s exists. Use overwrite=True to overwrite.' % 
                                      group + '/' + name)
-            h5_file[group].create_dataset(name, data=data)
+            h5_file[group].create_dataset(name, data=data, **compress_args)
             for key, val in attrs.items():
                 h5_file[group][name].attrs[key] = val

@@ -264,11 +265,11 @@
                 self.save_result(name, value[0], **kwargs)
                 self.save_result('u_' + name, value[1], **kwargs)

-    def save_result_arrays(self, *args):
+    def save_result_arrays(self, *args, **compress_args):
         names = args[::2]
         values = args[1::2]
         for name, value in zip(names, values):
-            self.save_result_array(name, value)
+            self.save_result_array(name, value, **compress_args)

     def get_image(self,orientation,label,image):
         with h5py.File(self.h5_path) as h5_file:
```
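Because lyse's `Run` needs a real shot file, you can try the forwarding pattern outside lyse with a standalone sketch. The two functions below are hypothetical stand-ins that mirror the patched methods, but they take an open `h5py.File` and a group name directly. Only h5py's `require_group`, `create_dataset`, and its `compression`/`compression_opts` keywords come from a real API. The function names, the `results` group, and the data are assumptions for illustration.

```python
import os
import tempfile

import h5py
import numpy as np


def save_result_array(h5_file, group, name, data, overwrite=True, **compress_args):
    """Hypothetical stand-in for Run.save_result_array: any extra keyword
    arguments (compression, compression_opts, shuffle, chunks, ...) are
    forwarded unchanged to h5py's create_dataset."""
    grp = h5_file.require_group(group)
    if name in grp:
        if not overwrite:
            raise Exception('Dataset %s exists. Use overwrite=True to overwrite.'
                            % (group + '/' + name))
        del grp[name]
    grp.create_dataset(name, data=data, **compress_args)


def save_result_arrays(h5_file, group, *args, **compress_args):
    """Hypothetical stand-in for Run.save_result_arrays: the same filter
    settings are applied to every (name, value) pair."""
    names = args[::2]
    values = args[1::2]
    for name, value in zip(names, values):
        save_result_array(h5_file, group, name, value, **compress_args)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "shot.h5")
    with h5py.File(path, "w") as f:
        # Compressed: the filter keywords pass straight through.
        save_result_arrays(f, "results", "x", np.arange(10_000),
                           "y", np.zeros(10_000),
                           compression="gzip", compression_opts=4)
        # No extra keywords: the default (uncompressed) behaviour is unchanged.
        save_result_array(f, "results", "z", np.ones(100))
        filters = {k: f["results"][k].compression for k in ("x", "y", "z")}

print(filters)  # {'x': 'gzip', 'y': 'gzip', 'z': None}
```

With the patch applied, the equivalent lyse call would look like `run.save_result_arrays('x', x, 'y', y, compression='gzip')`. All arrays in one `save_result_arrays` call share the same `**compress_args`, so per-array filter choices would need separate `save_result_array` calls.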

Archived as issue #44 in labscript-suite-temp-2/lyse.