thunder-project / thunder

scalable analysis of images and time series
http://thunder-project.org
Apache License 2.0

make _ordered = True the default for images.fromList() #338

Closed d-v-b closed 8 years ago

d-v-b commented 8 years ago

When creating an images object using the fromList method, the underlying bolt.array object appears to have _ordered == False. As a result, any RDD operation on that array requires a sortByKey, which is wasted work when the RDD is already ordered in the first place. The default value for _ordered should probably be True in this case.
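To illustrate the cost, here is a minimal pure-Python sketch of the pattern described above. It is not thunder's or bolt's actual code: `KeyedRecords`, `from_list`, and `sort_count` are hypothetical names made up for this example, and a plain list sort stands in for Spark's sortByKey. It assumes that an object flagged as unordered re-sorts on every operation, as reported.

```python
class KeyedRecords:
    """Toy stand-in for a keyed distributed array (hypothetical, not bolt's class)."""

    def __init__(self, records, ordered=False):
        self._records = list(records)   # list of (key, value) pairs
        self._ordered = ordered         # plays the role of the _ordered flag
        self.sort_count = 0             # counts how many sorts were triggered

    def values(self):
        """Return values in key order, sorting only if the flag says we must."""
        records = self._records
        if not self._ordered:
            # Stand-in for sortByKey: paid on every call while the flag is False.
            records = sorted(records, key=lambda kv: kv[0])
            self.sort_count += 1
        return [value for _, value in records]


def from_list(items, ordered=True):
    """Build records keyed by list position, so they are in key order already."""
    return KeyedRecords(enumerate(items), ordered=ordered)


frames = ["frame0", "frame1", "frame2"]

pessimistic = from_list(frames, ordered=False)
pessimistic.values()
pessimistic.values()
print(pessimistic.sort_count)   # two operations, two redundant sorts

optimistic = from_list(frames, ordered=True)
optimistic.values()
print(optimistic.sort_count)    # no sorts needed
```

Because keys built from list positions are already in order, both objects return the same values; the only difference is the redundant sorting, which is why defaulting the flag to True is safe in this construction path.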

freeman-lab commented 8 years ago

@d-v-b great idea! Was just discussing with @jwittenbach that this alone might be responsible for a few of the performance regressions we've been seeing.

freeman-lab commented 8 years ago

@jwittenbach is doing some large-scale testing now to confirm

jwittenbach commented 8 years ago

Closed by #339

d-v-b commented 8 years ago

great, thanks!