openconnectome / FlashR

Apache License 2.0

array-oriented programming #21

Closed zheng-da closed 8 years ago

zheng-da commented 8 years ago

I'm not sure if this is the right place for this discussion, but I would like to raise this issue.

Array-oriented programming is critical both for R to achieve reasonably good performance and for FlashR to scale to large datasets. It means that each operation runs on an entire vector or matrix. If an algorithm is written to access individual elements of a vector or matrix, there is nothing FlashR can do to parallelize it or improve its I/O performance. Take reliability.R, for example (I don't fully understand the implementation): the code seems to access columns, or even elements, individually. I think there should be a way to write it in an array-oriented fashion.
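To make the distinction concrete, here is a minimal sketch in base R. It is a hypothetical example and is not taken from reliability.R: it assumes the task is to compute squared Euclidean distances between all pairs of columns of a matrix, and the function names are made up for illustration. Whether each base function used here is also available for FlashR's own matrix types is an assumption that would need checking.

```r
# Hypothetical example (not from reliability.R): squared Euclidean
# distances between all pairs of columns of a numeric matrix.

# Element-oriented version: handles one pair of columns at a time,
# so the work is hidden inside an R-level double loop.
pairwise_sqdist_loop <- function(x) {
  n <- ncol(x)
  d <- matrix(0, n, n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      d[i, j] <- sum((x[, i] - x[, j])^2)
    }
  }
  d
}

# Array-oriented version: a few operations on whole matrices.
# Uses the identity ||a - b||^2 = ||a||^2 + ||b||^2 - 2 * sum(a * b).
pairwise_sqdist_array <- function(x) {
  sq <- colSums(x^2)   # squared norm of every column at once
  g <- crossprod(x)    # t(x) %*% x: all column inner products at once
  outer(sq, sq, "+") - 2 * g
}

set.seed(1)
x <- matrix(rnorm(50 * 8), nrow = 50)

# Both versions agree up to floating-point tolerance.
stopifnot(isTRUE(all.equal(pairwise_sqdist_loop(x), pairwise_sqdist_array(x))))
```

The second version expresses the whole computation as a column reduction, a matrix product, and an element-wise combination, which is the kind of structure a matrix engine can parallelize or stream from disk. The first version gives it only scalar-sized pieces to work with.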

jovo commented 8 years ago

agreed!

(In reply to Da Zheng's comment of Wed, Nov 11, 2015, 11:25 AM.)