qxcv / joint-regressor

Regressing joints for fun and profit
Apache License 2.0

Writing data is horribly slow #17

Closed qxcv closed 8 years ago

qxcv commented 8 years ago

get_stacks() isn't even the slow part any more; the time now goes into writing to the HDF5 file. Writing seems to take much longer than it did before I had smaller subposes, so it could have something to do with all of the extra datasets which the HDF5 file needs to handle (maybe there's some shuffling going on?). The time to write also seems to increase rapidly with file size, which is worrying.

Some ideas for improving performance:

qxcv commented 8 years ago

This is fixed for now (and I didn't even have to resort to crazy parallel writing hacks!). The problem was entirely in the number of calls to HDF5 wrapper routines (especially h5info), so batching a whole lot of data up at once and calling store3hdf5 once per batch made things much faster.
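For anyone hitting the same thing, here is a rough sketch of the batching idea in Python. It is not the code from this repository: `BatchedWriter`, `write_batch` and `batch_size` are hypothetical names, and the writer callback only records batch sizes as a stand-in for whatever routine actually touches the HDF5 file. The point is just that the expensive call runs once per batch rather than once per sample.

```python
from typing import Callable, List, Sequence


class BatchedWriter:
    """Buffer samples and pass them to the writer one batch at a time."""

    def __init__(self,
                 write_batch: Callable[[List[Sequence[float]]], None],
                 batch_size: int) -> None:
        if batch_size < 1:
            raise ValueError("batch_size must be at least 1")
        # write_batch is a hypothetical stand-in for the expensive
        # "append to the HDF5 file" routine.
        self._write_batch = write_batch
        self._batch_size = batch_size
        self._buffer: List[Sequence[float]] = []

    def add(self, sample: Sequence[float]) -> None:
        """Queue one sample; write only when the buffer is full."""
        self._buffer.append(sample)
        if len(self._buffer) >= self._batch_size:
            self.flush()

    def flush(self) -> None:
        """Write whatever is buffered (no-op if the buffer is empty)."""
        if self._buffer:
            self._write_batch(self._buffer)
            self._buffer = []


# Stand-in writer: records how many samples each call received.
calls: List[int] = []
writer = BatchedWriter(lambda batch: calls.append(len(batch)), batch_size=100)

for i in range(250):
    writer.add([float(i)])
writer.flush()  # write the final partial batch
```

With 250 samples and a batch size of 100, the writer callback runs three times instead of 250. The final `flush()` matters, since otherwise the last partial batch would never be written.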

I expect that this could become a problem again in the future, as it seems that the HDF5 C library has a bug where continually re-opening a file makes access much slower, but at least now it will take far longer for that problem to manifest itself.