The CGO17 benchmarks MM NVIDIA and NBody NVIDIA contain nested reductions with distributed private arrays. Currently, the per-thread size of these arrays is larger than it needs to be.
I suspect the bug lies in AdjustArraySizesForAllocations, but this is only a guess.