omlins / ParallelStencil.jl

Package for writing high-level code for parallel high-performance stencil computations that can be deployed on both GPUs and CPUs
BSD 3-Clause "New" or "Revised" License
311 stars 31 forks

Make shared memory allocation robust for compilation throughout all CUDA/AMDGPU versions #98

Closed by omlins 1 year ago

omlins commented 1 year ago

Also adjust the allocator unit tests for CUDA.