thunder-project / thunder

scalable analysis of images and time series
http://thunder-project.org
Apache License 2.0

series.toimages() #353

Closed boazmohar closed 8 years ago

boazmohar commented 8 years ago

It seems that images.toseries() and series.toimages() now differ in their default chunk-size behavior. Shouldn't they be the same? @freeman-lab @jwittenbach

jwittenbach commented 8 years ago

@boazmohar series.toimages() doesn't actually use chunking.

The standard assumption is that the # of pixels per image is much larger than the number of time points. Chunking only really helps when the non-distributed dimension is so large that the number of groupings we need to do for the transpose is prohibitive. So chunking was really only meant to speed up the images-to-series conversion, but not vice versa. That said:

  1. Chunking couldn't hurt the series-to-images transformation.
  2. The assumption that space is much larger than time need not always hold.

So it might be worth it to have Series.toimages go through Blocks in the same manner that Images.toseries does.

jwittenbach commented 8 years ago

Investigated this further with @boazmohar offline. images.toseries uses the intermediate Blocks object to determine an optimal block size for the transpose, whereas series.toimages relies only on Bolt's default chunk-size setting. For datasets where the number of time points is of the same order as or greater than the number of pixels, this makes series.toimages inefficient and results in terrible partitioning in the resulting Images object. The best solution seems to be to make series.toimages as close a mirror of images.toseries as possible.
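
To make the grouping argument concrete, here is a minimal NumPy-only sketch of a chunked transpose. It is not Thunder's or Bolt's actual implementation: the function name `chunked_transpose`, the `chunk` parameter, and the record counter are all hypothetical and only illustrate the idea. Each record (an image, or a series) is cut into chunks, the chunks are grouped by their offset, and the groups are reassembled into the transposed layout. The number of pieces that have to be grouped shrinks as the chunk grows, and the same routine works in both directions.

```python
from collections import defaultdict

import numpy as np


def chunked_transpose(records, chunk):
    """Transpose a (n_records, n_values) array by grouping chunks.

    Hypothetical illustration only; `chunk` is the number of values
    taken from each record per grouped piece.
    Returns the (n_values, n_records) array and the number of pieces
    that had to be grouped.
    """
    n_records, n_values = records.shape
    groups = defaultdict(list)
    n_pieces = 0
    # "Map" step: every record emits one piece per chunk, keyed by offset.
    for r in range(n_records):
        for start in range(0, n_values, chunk):
            groups[start].append((r, records[r, start:start + chunk]))
            n_pieces += 1
    # "Group" step: pieces sharing an offset are stacked along the
    # record axis, which yields the transposed layout.
    out = np.empty((n_values, n_records), dtype=records.dtype)
    for start, pieces in groups.items():
        for r, block in pieces:
            out[start:start + len(block), r] = block
    return out, n_pieces


# 4 time points, 6 pixels per (flattened) image.
images = np.arange(4 * 6).reshape(4, 6)

# images -> series, one pixel per piece versus three pixels per piece.
series, n_plain = chunked_transpose(images, chunk=1)
_, n_chunked = chunked_transpose(images, chunk=3)

# series -> images through the same routine, chunking along time.
back, n_back = chunked_transpose(series, chunk=2)

print(n_plain, n_chunked, n_back)
```

In this toy setup the chunk size plays the same role in both directions, which is the sense in which series.toimages could mirror images.toseries. How the chunk size is actually chosen (for example by Blocks) is outside what this sketch covers.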