omlins / ParallelStencil.jl

Package for writing high-level code for parallel high-performance stencil computations that can be deployed on both GPUs and CPUs
BSD 3-Clause "New" or "Revised" License

AMDGPU v0.5.0 compat #100

Closed luraess closed 1 year ago

luraess commented 1 year ago

Since AMDGPU v0.5.0, gridsize represents the number of "workgroups" (or blocks in CUDA) and no longer "workitems * workgroups" (or threads * blocks in CUDA), as HIP is used for kernel launches instead of HSA.

This means that the gridsize kernel launch parameter in existing AMDGPU code needs to be adapted from gridsize = nx to gridsize = cld(nx, groupsize).
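The conversion can be sketched in plain Julia, with no GPU required. This is only an illustration of the arithmetic: the helper name `workgroups` and the values of `nx` and `groupsize` are assumptions chosen for the example, not part of AMDGPU.jl or ParallelStencil.jl.

```julia
# Hypothetical helper (not an AMDGPU.jl or ParallelStencil.jl function):
# number of workgroups needed to cover nx workitems.
workgroups(nx::Integer, groupsize::Integer) = cld(nx, groupsize)

nx        = 1000   # problem size; before v0.5.0 this was passed as gridsize
groupsize = 256    # workitems per workgroup (threads per block in CUDA terms)

# From v0.5.0 on, gridsize counts workgroups, so use ceiling division.
gridsize = workgroups(nx, groupsize)

# Total workitems launched; this can exceed nx because cld rounds up.
launched = gridsize * groupsize

println("gridsize = ", gridsize, ", launched workitems = ", launched)
```

Because cld rounds up, more workitems than nx may be launched whenever nx is not a multiple of groupsize, so a kernel launched this way would typically need a bounds check on its index.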

Also, queues and signals are now abstracted into streams.

luraess commented 1 year ago

Will be fixed in #107