Closed: jipolanco closed this pull request 2 years ago
Merging #40 (30474ed) into master (aaa806b) will increase coverage by 0.02%. The diff coverage is 100.00%.
```diff
@@            Coverage Diff             @@
##           master      #40      +/-   ##
==========================================
+ Coverage   97.15%   97.17%   +0.02%
==========================================
  Files          17       18       +1
  Lines         983     1026      +43
==========================================
+ Hits          955      997      +42
- Misses         28       29       +1
```
| Impacted Files | Coverage Δ | |
|---|---|---|
| src/Transpositions/Transpositions.jl | 98.09% <100.00%> (+0.30%) | :arrow_up: |
| src/gather.jl | 100.00% <100.00%> (ø) | |
| src/random.jl | 100.00% <100.00%> (ø) | |
| src/arrays.jl | 95.14% <0.00%> (-0.98%) | :arrow_down: |
Continue to review the full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Powered by Codecov. Last update aaa806b...30474ed.
For GPU arrays, transpositions and other operations are now performed completely on the GPU (as far as I can tell...), avoiding slow scalar indexing.
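As a rough illustration of the general idea (this is a sketch, not code from this PR, and the function names are hypothetical), the difference between scalar indexing and whole-array operations on GPU arrays looks like this:

```julia
# Illustrative sketch only; names are hypothetical.
# Indexing individual elements of a GPU array from the host ("scalar indexing")
# forces a device round-trip per element and is very slow, while whole-array
# operations dispatch to a single GPU kernel.

# Slow pattern: element-by-element transposition via scalar indexing.
function transpose_scalar!(dst::AbstractMatrix, src::AbstractMatrix)
    for j in axes(src, 2), i in axes(src, 1)
        dst[j, i] = src[i, j]
    end
    return dst
end

# Fast pattern: a single whole-array operation, which GPU array backends
# implement as a kernel, avoiding host-side scalar indexing entirely.
transpose_whole!(dst::AbstractMatrix, src::AbstractMatrix) =
    permutedims!(dst, src, (2, 1))
```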
Well, for now this has just been tested with the reference implementation of GPUArrays.jl (`JLArray`), which runs on the CPU. It would be nice to test things with `CuArray`s. For that, one just needs to add `CuArray` to the list of array types tested in `test/array_types.jl`. @corentin-dev let me know if you can try that out.

For now I have no idea how the transposition of GPU arrays actually performs, and it would be nice to have some benchmarks. There are still some things that can be improved. In particular, when using dimension permutations (enabled by default in PencilFFTs), there are some additional allocations that should be taken care of.
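Adding `CuArray` to the tested types might look roughly like the sketch below. The variable name and structure are assumptions for illustration, not the actual contents of `test/array_types.jl`:

```julia
# Hypothetical sketch of test/array_types.jl; names are assumptions.
using GPUArrays  # JLArray is GPUArrays.jl's CPU-based reference implementation

# Array types that the test suite iterates over.
const TESTED_ARRAY_TYPES = Any[Array, GPUArrays.JLArray]

# To also exercise the CUDA path, one could append CuArray when a GPU
# is actually available on the test machine:
# using CUDA
# CUDA.functional() && push!(TESTED_ARRAY_TYPES, CUDA.CuArray)
```

The conditional `push!` keeps the test suite runnable on machines without a GPU.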
This PR closes #21 (but can be reopened if stuff is missing).