rayon-rs / rayon

Rayon: A data parallelism library for Rust
Apache License 2.0
10.98k stars 497 forks source link

Best way for strided reduce #219

Open botev opened 7 years ago

botev commented 7 years ago

What would be the best way to implement strided reductions. E.g. assume we have an array of 5k elements and we want a result of 100 elements, where each element i is equal to the sum of i + 100p elements for all possible p.

Also can rayon work in synchrony with Futures? E.g. since currently rayon parallel iterators halt, if with futures we launch more than one parallel iterator would that brake anything?

nikomatsakis commented 7 years ago

@botev it should be fine for futures to launch parallel iterators. They should all compose with relative ease. =)

Regarding strided reductions, can you elaborate a bit more on what you want?

botev commented 7 years ago

So consider that I have a matrix, which however is backed by a normal array, and let's assume it is column major. Now I want to calculate the sum of each row, not column, which means that the elements in the sum of a specific row are non consecutive addresses in the memory array. I was hoping there is some way of doing this easily with rayon.

cuviper commented 7 years ago

Perhaps use par_chunks to iterate over the columns, then reduce with vector addition.

botev commented 7 years ago

@cuviper is it possible to reuse somehow the splitting strategy from rayon's reduce for the splitting in to the chunks (e.g. the sum reduce to not go sequentially, but smth like ((a+b)+(c+d))+((e+f)+(g+h))?

cuviper commented 7 years ago

Yes, par_chunks does the actual splitting, then reduce will re-associate similar to what you wrote. In order to create the intermediate results, you may actually need to fold first into new vectors, then reduce those.

bluss commented 7 years ago

@botev ndarray-parallel can do this. You can use its array functionality without allocating or copying your data (you can make an ArrayView of it in place), shape it to (n, 100) and then the column sums is like the example that the docs already have for axis_iter: https://docs.rs/ndarray-parallel/0.2.0/ndarray_parallel/

bluss commented 7 years ago

However, the problem would benefit most from SIMD parallelism and since the data is contiguous in another axis than the axis is summed along, you'd use ndarray's fold_axis https://docs.rs/ndarray/0.8.0/ndarray/struct.ArrayBase.html#method.fold_axis instead. I don't have a thread parallel version of that. (By the way, the sum method will attempt to pick the better of map_axis or fold_axis depending on the array's layout and chosen axis to sum).