rust-ndarray / ndarray

ndarray: an N-dimensional array with array views, multidimensional slicing, and efficient operations
https://docs.rs/ndarray/
Apache License 2.0

Meta Issue: Support for parallelized/blocked algorithms #89

Closed: kernelmachine closed this issue 3 years ago

kernelmachine commented 8 years ago

What are your thoughts on implementing something similar to http://dask.pydata.org/en/latest/ on top of ndarrays? I suspect parallelized computations on submatrices should be pretty natural to do in the Rust framework, and it seems you've already created sub-array view functions. Do you agree?

(Community Edits below)


Actionable sub-issues:

bluss commented 8 years ago

The goal is absolutely to be able to support a project like that. Iterators already provide chunking via inner_iter, outer_iter, axis_iter, axis_chunks_iter and their mut counterparts. We also want to add a few more split_at-like interfaces to support easy chunking like this.
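
For illustration, a minimal sketch of that chunking, written against a recent ndarray release (where axis_chunks_iter and split_at keep these names):

use ndarray::{Array2, Axis};

fn main() {
    let a = Array2::<f64>::zeros((8, 6));
    // Visit the array two rows at a time.
    for chunk in a.axis_chunks_iter(Axis(0), 2) {
        assert_eq!(chunk.shape(), &[2, 6]);
    }
    // split_at divides one view into two disjoint views, which is the
    // building block for recursive chunking.
    let (top, bottom) = a.view().split_at(Axis(0), 4);
    assert_eq!(top.shape(), bottom.shape());
}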

bluss commented 8 years ago

Integrating with https://github.com/nikomatsakis/rayon would be pretty exciting too.

kernelmachine commented 8 years ago

Yup, that's exactly my thought! Would love to work on this, if you're interested in collaborating.


kernelmachine commented 8 years ago

Also on the subject of integrations, I've been writing a crate that wraps LAPACK/BLAS with high-level, easy-to-use functions, inspired by the hmatrix library in Haskell. The focus is on compile-time, descriptive error checking, enumerated matrix types, and an easy interface. I wrote my own (simple) matrix representation for the project, but it actually seems much better to build the crate on top of ndarray.

How actively are you working on the BLAS integration that I see in the docs? Would love to exchange notes.

bluss commented 8 years ago

Not very actively, but it's the thing I must solve now. Not sure if ndarray wants to continue with rblas or use more raw BLAS bindings.

One problem is specialization, i.e. how to dispatch to BLAS for the element types f32 and f64 while still supporting other array element types. Rust will gain specialization down the line, but as things look now, we can do some dispatch using Any instead. Which is fine, it just adds that Any bound.

bluss commented 8 years ago

Note that Any allows static (compile time) dispatch on the element type.
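
A minimal sketch of this pattern (not ndarray's actual code): the TypeId comparison below is a compile-time constant, so the optimizer can drop the dead branch entirely.

use std::any::TypeId;

fn dot<A>(xs: &[A], ys: &[A]) -> A
where
    A: 'static + Copy + Default + std::ops::Add<Output = A> + std::ops::Mul<Output = A>,
{
    if TypeId::of::<A>() == TypeId::of::<f64>() {
        // The check above proves A == f64, so reinterpreting is sound.
        // A real implementation would call a BLAS routine (e.g. ddot) here.
        let xs = unsafe { std::slice::from_raw_parts(xs.as_ptr() as *const f64, xs.len()) };
        let ys = unsafe { std::slice::from_raw_parts(ys.as_ptr() as *const f64, ys.len()) };
        let sum: f64 = xs.iter().zip(ys).map(|(a, b)| a * b).sum();
        return unsafe { std::mem::transmute_copy(&sum) };
    }
    // Generic fallback for every other element type.
    xs.iter().zip(ys).fold(A::default(), |acc, (&a, &b)| acc + a * b)
}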

bluss commented 8 years ago

As a high-level library, ndarray bears the strain of supporting a much more general data layout than BLAS does, so we must always have both the optimized code and the fallback code present for everything.

bluss commented 8 years ago

More splitting coming up #94

kernelmachine commented 8 years ago

Awesome. I'll look into rayon integration via these split_at functions.

Yeah, regarding the BLAS float issue, an Any bound was my solution as well. In the initialization of the matrix I just tried to cast any ints to floats, and returned an error otherwise.

bluss commented 8 years ago

Can you make this issue more concrete? Ndarray will not aim to develop or host a project similar to Dask, but we can make sure such a project can be built with ndarray.

More low level methods have been exposed since this issue was reported (See 0.4.2 release).

Maybe more concrete issues can be filed for missing functionality.

kernelmachine commented 8 years ago

Sure. I think this issue comes down to an integration between ndarray and rayon. We should be able to apply basic parallelized computations to an array of subviews, then aggregate/reduce. This interface could be generic, or we could focus on a few specialized computations, like elementwise operations or selections.
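
For reference, this is roughly what that integration looks like in today's ndarray with the rayon feature enabled (none of this existed at the time of the comment): subviews fan out to rayon's thread pool and the results are reduced.

use ndarray::{Array2, Axis};
use ndarray::parallel::prelude::*;

fn main() {
    let a = Array2::from_elem((128, 16), 1.0f64);
    // Sum each outer-axis subview in parallel, then reduce the partial sums.
    let total: f64 = a
        .axis_iter(Axis(0))
        .into_par_iter()
        .map(|row| row.sum())
        .sum();
    assert_eq!(total, 128.0 * 16.0);
}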

bluss commented 8 years ago

Yeah.

Here's a very basic experiment with that (only elementwise ops)

https://github.com/bluss/rust-ndarray-experimental/blob/master/src/lib.rs

  1. One important thing is of course to split along whichever axis has the greatest stride (see the sketch below).
  2. There was a significant discovery here related to the just-merged unstable specialization feature. You can seamlessly special-case thread-safe vs. non-thread-safe operations, and use rayon only when the operation is thread safe! (Sync / Send as appropriate)
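
A minimal sketch of that splitting strategy, assuming ndarray and rayon as direct dependencies (par_scale and SEQ_CUTOFF are illustrative names, not ndarray API):

use ndarray::{ArrayViewMut2, Axis};

fn par_scale(mut v: ArrayViewMut2<'_, f64>, factor: f64) {
    // Below this size, forking costs more than the work itself.
    const SEQ_CUTOFF: usize = 4096;
    if v.len() <= SEQ_CUTOFF {
        v.map_inplace(|x| *x *= factor);
        return;
    }
    // Prefer the axis with the greater stride so each half covers a
    // compact region of memory; fall back if that axis cannot be split.
    let axis = if v.stride_of(Axis(0)).abs() >= v.stride_of(Axis(1)).abs() {
        Axis(0)
    } else {
        Axis(1)
    };
    let axis = if v.len_of(axis) >= 2 { axis } else { Axis(1 - axis.index()) };
    let mid = v.len_of(axis) / 2;
    let (left, right) = v.split_at(axis, mid);
    rayon::join(|| par_scale(left, factor), || par_scale(right, factor));
}
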
bluss commented 7 years ago

We need to break this down into specific sub-issues so that we can get each piece done in turn.

I'm editing the first comment of this issue. This is a good thing: it means that you, @pegasos1, and I can both edit the same task list.

kernelmachine commented 7 years ago

We just need to implement the parallel iterator trait, right? Beyond tests and such, what else is there?

bluss commented 7 years ago

Parallel map is a bit tricky (the Array::map(f) -> Array case), but I have a work in progress for that.

bluss commented 7 years ago

There's also the question of interface. You have championed the parallel wrapper for array types before, I think.

With parallel wrappers it could be something like:

use ndarray::parallel::par;
par(&mut array).map_inplace(|x| *x = x.exp());

or parallel array view types

array.par_view_mut().map_inplace(|x| *x = x.exp());

We could use wrapper/builder types for the closure instead:

use ndarray::parallel::par;
array.map_inplace(par(|x| *x = x.exp()));

or separate methods:

array.par_map_inplace(|x| *x = x.exp());

What would be possible with specialization is to transparently parallelize regular Array::map_inplace calls, but that is too magical; I don't think we want that.

iduartgomez commented 7 years ago

On a more general note, are there any plans to eventually provide opt-in GPU computation? Maybe using https://github.com/arrayfire/arrayfire-rust ?

bluss commented 7 years ago

There is no explicit plan one way or the other.

Ndarray's design (explicit views, direct access to data) dictates that it's an in-memory data structure, so it could only integrate with gpu computation by allowing conversion to a more restricted format (like Arrayfire), or implementing operations using such a conversion before and after.

frjnn commented 3 years ago

Parallel Iter for AxisChunksIter

and

Parallel support for Array::map -> Array

should be checked off. @bluss @jturner314

bluss commented 3 years ago

I guess everything here is done as of current master. Zip::par_map_collect could be sufficient to satisfy the Array::map item; do you agree, @frjnn?
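
For reference, a minimal sketch of that suggestion (requires ndarray's rayon feature):

use ndarray::{Array2, Zip};

fn main() {
    let a = Array2::from_elem((4, 4), 2.0f64);
    // The parallel Array::map(f) -> Array case, expressed through Zip.
    let squared = Zip::from(&a).par_map_collect(|&x| x * x);
    assert_eq!(squared[[0, 0]], 4.0);
}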

frjnn commented 3 years ago

I agree

bluss commented 3 years ago

All the actionable points have been completed, so we can celebrate by closing. However, I think there is a lot more to do before we approach the original appeal of the issue text, and a new issue is welcome for that. 🙂