xcompact3d / x3d2

https://xcompact3d.github.io/x3d2
BSD 3-Clause "New" or "Revised" License
3 stars 4 forks source link

Reorder kernels in CUDA backend fail with SZ=64 #51

Open semi-h opened 6 months ago

semi-h commented 6 months ago

This is due to us going above the maxiumum thread count per block when we set SZ=64. Some of the reorder kernels require a 2D thread with SZ*SZ threads. We need to fix these kernels so that when SZ=64 or above these kernels work on tiles of 32 by 32.