mratsim / laser

The HPC toolbox: fused matrix multiplication, convolution, data-parallel strided tensor primitives, OpenMP facilities, SIMD, JIT Assembler, CPU detection, state-of-the-art vectorized BLAS for floats and integers
Apache License 2.0

Update for devel OpenMP #3

Closed mratsim closed 5 years ago

mratsim commented 6 years ago

Following the merge of nim-lang/Nim#9493, the OpenMP annotation string will need to be patched depending on whether we compile with Nim 0.19.1/0.20 (devel) or with 0.19.

Additionally, we can handle both forEach and reduceEach in a single macro, since nb_chunks: var int would no longer need to be passed.

forEach would then generate only #pragma omp for instead of #pragma omp parallel for, and rely on an enclosing #pragma omp parallel region that was opened earlier.
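
For illustration, here is a minimal C sketch of the two shapes of generated code being compared. It is not Laser's actual output: the function names scale_combined and scale_then_sum are hypothetical, and the loop bodies are placeholders. The first function uses the combined #pragma omp parallel for; the second opens a single #pragma omp parallel region and places two #pragma omp for worksharing loops inside it, the second one carrying a reduction.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: combined construct. Every call spawns (or wakes)
   a thread team just for this one loop. */
static void scale_combined(double *x, int n, double a) {
    assert(x != NULL && n >= 0);
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        x[i] *= a;
    }
}

/* Hypothetical helper: split form. One parallel region is opened once,
   and each loop inside only needs "#pragma omp for". */
static double scale_then_sum(double *x, int n, double a) {
    assert(x != NULL && n >= 0);
    double total = 0.0;  /* shared in the region, reduced by the second loop */

    #pragma omp parallel
    {
        /* Worksharing loop: iterations are split across the existing team. */
        #pragma omp for
        for (int i = 0; i < n; ++i) {
            x[i] *= a;
        }
        /* Implicit barrier here: all of x is scaled before the sum starts. */

        #pragma omp for reduction(+:total)
        for (int i = 0; i < n; ++i) {
            total += x[i];
        }
    }
    return total;
}
```

Built with an OpenMP flag (for example -fopenmp with GCC or Clang), both functions run in parallel; without it the pragmas are ignored and the code runs serially with the same result. The split form is what lets an elementwise loop and a reduction share one thread team.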