omlins / ParallelStencil.jl

Package for writing high-level code for parallel high-performance stencil computations that can be deployed on both GPUs and CPUs
BSD 3-Clause "New" or "Revised" License

Include `@tturbo` as loop vectorisation possibility for the CPU backend #33

Closed (luraess closed this issue 2 months ago)

luraess commented 3 years ago

Something to consider as an alternative or supplement to the current `Threads.@threads` option. The `@tturbo` macro, exposed by the LoopVectorization package, provides multi-threaded loops vectorised with AVX instructions. See https://github.com/luraess/parallel-gpu-workshop-JuliaCon21#parallel-cpu-implementation for an example. There may be some restrictions on handling `if` conditions inside the loop.
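As a rough illustration of the two options, here is a minimal sketch of the same 1D diffusion update written once with `Threads.@threads` and once with `@tturbo`. The stencil and the function names `diffusion_step_threads!` and `diffusion_step_tturbo!` are hypothetical and chosen only for this comparison; they are not taken from ParallelStencil or from the linked workshop.

```julia
using LoopVectorization  # provides @turbo and its threaded variant @tturbo

# Baseline: thread-parallel loop from Julia's standard library.
# Assumes T2 and T have the same length; boundary points are left untouched.
function diffusion_step_threads!(T2, T, dt, dx)
    inv_dx2 = 1 / dx^2
    Threads.@threads for i in 2:length(T)-1
        @inbounds T2[i] = T[i] + dt * (T[i-1] - 2 * T[i] + T[i+1]) * inv_dx2
    end
    return T2
end

# Alternative: threaded and SIMD-vectorised loop via LoopVectorization.
# @tturbo performs no bounds checking, so the loop range must stay in bounds.
function diffusion_step_tturbo!(T2, T, dt, dx)
    inv_dx2 = 1 / dx^2
    @tturbo for i in 2:length(T)-1
        T2[i] = T[i] + dt * (T[i-1] - 2 * T[i] + T[i+1]) * inv_dx2
    end
    return T2
end
```

Both functions should produce the same result up to floating-point rounding. The loop body here is branch-free, which sidesteps the possible restrictions on `if` conditions mentioned above.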

omlins commented 3 years ago

Reopened, as the foreseen GPU optimizations should also make the use of LoopVectorization feasible with little or no divergence in approach between CPU and GPU code generation.

omlins commented 2 months ago

LoopVectorization's future is uncertain; instead, code generation for Polyester has been enabled.