ilastik / ilastik4ij

ImageJ plugins to run ilastik workflows
MIT License

Import/export improvements #11

Closed wolny closed 5 years ago

wolny commented 5 years ago

This PR tackles the following issues:

Performance issues: We found a problem that boils down to the current HDF5 export code being very inefficient. The following steps reproduce it:

  1. Generate a random 3D HDF5 dataset of about 100 MB in Python, e.g. uint16 with shape (200, 500, 500) in zyx order.

  2. Import the dataset using our plugin in Fiji: Plugins > ilastik > Import HDF5. On my machine it takes 2.8 sec:

    [INFO] Found dataset '/raw' of type 'uint16'
    [INFO] Constructing output image of shape (512, 512, 1, 209, 1). Axis order: 'XYCZT'
    [INFO] Loading HDF5 dataset took: 2819
    [INFO] Done loading HDF5 file!

    ... yeah, the axis order XYCZT is a bit weird, but let's ignore it for now.

  3. Export the dataset you just loaded with our plugin in Fiji: Plugins > ilastik > Export HDF5. It will be exported as a 5D stack with the axis order tzyxc, where t and c are singleton dimensions.

  4. Now import the tzyxc dataset you've just exported with our plugin. This time it takes 18.2 sec, more than 6x slower! To make sure this performance degradation was not caused by the axis expansion 3D -> 5D, one can simply expand the original 3D dataset in Python and save it as a new 5D 'test' dataset, e.g.

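    import h5py
    import numpy as np

    # h5_file: path to the 3D test file from step 1
    with h5py.File(h5_file, 'r+') as f:
        dset = f['data'][...]            # read the whole 3D volume (zyx)
        dset = np.expand_dims(dset, 0)   # prepend a singleton t axis
        dset = np.expand_dims(dset, -1)  # append a singleton c axis -> tzyxc
        f.create_dataset('test', data=dset)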

    Reading this dataset with our plugin takes ~2.8 sec, as it should. tl;dr: we're writing the H5 file in a really inefficient way and we need to fix that!
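The Python side of these reproduction steps can be scripted roughly as follows. This is a minimal sketch, not part of the plugin: the function names `make_test_file`, `add_5d_copy` and `time_read` are hypothetical, and the dataset names 'data' and 'test' are taken from the snippet in step 4. Timing with h5py measures only the raw read in Python, so it is a sanity check rather than a substitute for the Fiji import timings.

```python
import time

import h5py
import numpy as np


def make_test_file(path, shape=(200, 500, 500), seed=0):
    """Write a random uint16 zyx volume to `path` as dataset 'data'."""
    rng = np.random.default_rng(seed)
    # The upper bound is exclusive, so values fall in [0, 65535).
    volume = rng.integers(0, np.iinfo(np.uint16).max, size=shape, dtype=np.uint16)
    with h5py.File(path, "w") as f:
        f.create_dataset("data", data=volume)
    return volume


def add_5d_copy(path, src="data", dst="test"):
    """Store a tzyxc copy of `src` with singleton t and c axes."""
    with h5py.File(path, "r+") as f:
        volume = f[src][...]
        # np.newaxis in front adds t, np.newaxis at the end adds c.
        f.create_dataset(dst, data=volume[np.newaxis, ..., np.newaxis])


def time_read(path, name):
    """Return (seconds, shape) for reading the whole dataset into memory."""
    start = time.perf_counter()
    with h5py.File(path, "r") as f:
        data = f[name][...]
    return time.perf_counter() - start, data.shape
```

With the default shape, `make_test_file("test.h5")` writes about 100 MB (200 * 500 * 500 values at 2 bytes each). Calling `add_5d_copy("test.h5")` and then `time_read("test.h5", "data")` and `time_read("test.h5", "test")` gives a rough check that the 3D -> 5D expansion alone does not change read time.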

Update: I've completely rewritten the Hdf5DataSetWriter using the JHDF5 API (http://svnsis.ethz.ch/doc/hdf5/hdf5-14.12/), which is somewhat higher level, so one does not have to deal with all the low-level nuances of ncsa.hdf.hdf5lib. Here are the results for the same 3D test dataset described above:

OLD API:

NEW API:

Write speedup: 18x :tada:
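To compare files produced by the old and new writer, it can help to look at how each dataset is stored on disk. The sketch below lists shape, dtype, chunk layout and compression for every dataset in a file using h5py. The helper name `describe_datasets` is hypothetical, and the idea that storage layout is what differs between the two writers is an assumption; this PR does not state the root cause.

```python
import h5py


def describe_datasets(path):
    """Return {dataset_name: storage info} for every dataset in an HDF5 file."""
    info = {}

    def visit(name, obj):
        # visititems calls this for groups and datasets; keep datasets only.
        if isinstance(obj, h5py.Dataset):
            info[name] = {
                "shape": obj.shape,
                "dtype": str(obj.dtype),
                "chunks": obj.chunks,            # None means contiguous storage
                "compression": obj.compression,  # None means uncompressed
            }

    with h5py.File(path, "r") as f:
        f.visititems(visit)
    return info
```

Running `describe_datasets` on a file exported before and after this change and comparing the two results shows whether the writers differ in chunking or compression.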