omlins / ParallelStencil.jl

Package for writing high-level code for parallel high-performance stencil computations that can be deployed on both GPUs and CPUs
BSD 3-Clause "New" or "Revised" License
322 stars 38 forks

Make kernel launch parameter computation performance-negligible (also for small problems) #39

Closed omlins closed 3 years ago

omlins commented 3 years ago

@luraess recently observed that, for small problems, the computation of the kernel launch parameters (when using @parallel <kernel-to-launch>) had a non-negligible performance cost. This PR fixes that performance issue.
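
For context, here is a minimal sketch of what deriving launch parameters from the array size can look like. It is plain Julia and not ParallelStencil's actual internal code: the helper name `launch_params` and the default thread-block size `(32, 8, 1)` are assumptions made for illustration only.

```julia
# Hypothetical helper (not part of ParallelStencil's API): derive a GPU-style
# launch configuration from the problem size.
# The default threads per block (32, 8, 1) is an assumed value.
function launch_params(dims::NTuple{3,Int}; nthreads::NTuple{3,Int}=(32, 8, 1))
    # Ceiling division per dimension, so that nblocks .* nthreads covers dims.
    nblocks = cld.(dims, nthreads)
    return nblocks, nthreads
end

# A small problem: a 64 x 64 x 1 array.
nblocks, nthreads = launch_params((64, 64, 1))
```

For a small array like this one, the kernel itself may run for only a few microseconds, so any work done on the host before each launch is a comparatively large share of the total time per call.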