thunder-project / thunder

scalable analysis of images and time series
http://thunder-project.org
Apache License 2.0

Slow saving/loading of binary files #358

Open boazmohar opened 8 years ago

boazmohar commented 8 years ago

We are dealing with a long but small volumetric data set (~50k-100k time points, ~200 KB per time point). Currently, saving this images object with tobinary() produces a large number of small files, which is slow to read, write, and especially delete with our storage back-end. I suggest adding a parameter that groups n time points per file, to reduce the number of files written to disk (a rough sketch of the idea follows the list below). A few points to think about are:

  1. What grouping to use: would a list work, or would we need to stack the n-dimensional data along an (n+1)-th dimension?
  2. What would the equivalent series implementation be: a grouping factor for each axis?
  3. How to retrieve the original images or series object: change the conf.json file to include this parameter, add a similar parameter to frombinary(), or maybe both?
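Something like the following sketch captures what I have in mind, written against plain numpy rather than Thunder's internals. The helper names (`write_grouped`, `read_grouped`, `group_size`) and the file layout are just illustrative, not existing Thunder API:

```python
# Minimal sketch of the grouping idea: stack group_size time points along a
# new leading axis and write each stack as one binary file, then split the
# stacks back into individual time points on load.
# All names and the file layout here are hypothetical, not Thunder API.
import json
import os

import numpy as np


def write_grouped(frames, path, group_size=100):
    """Write an iterable of equally-shaped volumes as grouped .bin files."""
    frames = list(frames)
    os.makedirs(path, exist_ok=True)
    dims = frames[0].shape
    for i in range(0, len(frames), group_size):
        stack = np.stack(frames[i:i + group_size])      # shape (k, x, y, z)
        stack.tofile(os.path.join(path, 'group-%05d.bin' % (i // group_size)))
    # record the grouping so a frombinary-style reader can undo it
    conf = {'dims': list(dims), 'dtype': str(frames[0].dtype),
            'group_size': group_size, 'ntimepoints': len(frames)}
    with open(os.path.join(path, 'conf.json'), 'w') as f:
        json.dump(conf, f)


def read_grouped(path):
    """Read grouped .bin files back into a list of single time points."""
    with open(os.path.join(path, 'conf.json')) as f:
        conf = json.load(f)
    dims, dtype = tuple(conf['dims']), np.dtype(conf['dtype'])
    frames = []
    for name in sorted(f for f in os.listdir(path) if f.endswith('.bin')):
        stack = np.fromfile(os.path.join(path, name), dtype=dtype)
        stack = stack.reshape((-1,) + dims)             # (k, x, y, z)
        frames.extend(stack)                            # split back into time points
    return frames[:conf['ntimepoints']]
```

With ~50k time points and group_size=100, this would turn ~50k tiny files into ~500 larger ones, which should be much friendlier to our storage back-end. The same grouping factor could live in conf.json and/or be passed to frombinary() as discussed in point 3.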

@freeman-lab, @jwittenbach, I would like to hear what you think before starting to play around with this.