Closed brentp closed 5 years ago
Laser is still in research mode so plenty of things are implemented but not properly exposed in a high-level API.
To do a reduction you can do it as it's done for the sum reduction:
I will create min and max tomorrow, so that they are ready to use.
Alternatively, if you use a Tensor there are 4 ways to do parallel reduction in this example: https://github.com/numforge/laser/blob/af191c086b4a98c49049ecf18f5519dc6856cc77/examples/ex05_tensor_parallel_reduction.nim#L9-L95
Note that the underlying forEachStaged macro doesn't require a Tensor exactly, just a type that exposes rank, size, shape, strides and unsafe_raw_data as described here: https://github.com/numforge/laser/tree/master/laser/strided_iteration#strided-parallel-iteration-for-tensors. So it works with seq if those are defined.
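To make that interface a bit more concrete, here is a minimal standalone sketch of a seq-backed type exposing the five routines. SeqView is a hypothetical name, and the return types (int, array[1, int], ptr UncheckedArray[T]) are assumptions about what the macro accepts, not something confirmed in this thread. The sketch does not call forEachStaged itself.

```nim
# Hypothetical 1-D view over a seq. Only the five accessor names come from
# the thread; the type name and all return types are assumptions.
type SeqView[T] = object
  data: seq[T]

func rank[T](v: SeqView[T]): int = 1                   # always one dimension
func size[T](v: SeqView[T]): int = v.data.len          # total element count
func shape[T](v: SeqView[T]): array[1, int] = [v.data.len]
func strides[T](v: SeqView[T]): array[1, int] = [1]    # contiguous storage
func unsafe_raw_data[T](v: SeqView[T]): ptr UncheckedArray[T] =
  # Raw pointer to the first element; the seq must be non-empty and must
  # outlive every use of the pointer.
  cast[ptr UncheckedArray[T]](v.data[0].unsafeAddr)

let view = SeqView[float32](data: @[1'f32, 2'f32, 3'f32])
let raw = view.unsafe_raw_data()
echo view.rank, " ", view.size, " ", view.shape, " ", view.strides
echo raw[1]
```

Whether such a type can be passed straight to forEachStaged depends on the requirements listed in the linked README, so treat this as a starting point only.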
I've added reduce_min and reduce_max (and renamed sum_kernel to reduce_sum) in #39.
They only work for float32 at the moment, but if needed it's easy to extend them to other types.
thanks very much for your links and the new reduce_min stuff. I can get this to work from the laser src directory but if I move it elsewhere I get a long traceback ending with:
In file included from /usr/lib/gcc/x86_64-linux-gnu/5/include/x86intrin.h:35:0,
from /home/brentp/.cache/nim/pmin_r/.nimble7pkgs7Laser-0.0.17laser7primitives7simd__math7reductions__sse3.nim.c:10:
/usr/lib/gcc/x86_64-linux-gnu/5/include/pmmintrin.h:68:1: error: inlining failed in call to always_inline ‘_mm_movehdup_ps’: target specific option mismatch
_mm_movehdup_ps (__m128 __X)
^
/home/brentp/.cache/nim/pmin_r/.nimble7pkgs7Laser-0.0.17laser7primitives7simd__math7reductions__sse3.nim.c:56:7: error: called from here
shuf = _mm_movehdup_ps(vec);
I can move the same file containing:
import
  random, sequtils,
  laser/primitives/reductions

proc main() =
  let interval = -1f .. 1f
  let size = 10_000_000
  let buf = newSeqWith(size, rand(interval))
  echo reduce_min(buf[0].unsafeAddr, buf.len)

main()
in and out of ~/src/laser: it compiles and runs inside that directory and fails outside it.
I am compiling with nim c -d:openmp -d:danger -d:fastmath -a -r pmin.nim
btw, this gives a nearly 5X speed improvement on my laptop on my example use-case so this will be a nice improvement!
That's unfortunately one of Nim's limitations.
If you look into the reductions_sse3 file, it calls min_ps_sse3 (https://github.com/numforge/laser/blob/2f619fdbb2496aa7a5e5538035a8d42d88db8c10/laser/primitives/simd_math/reductions_sse3.nim#L59), which uses SSE3 intrinsics from https://github.com/numforge/laser/blob/2f619fdbb2496aa7a5e5538035a8d42d88db8c10/laser/primitives/private/sse3_utils.nim#L8-L18
On x86_64 the compiler can only assume SSE2 support and more advanced SIMD instructions require an explicit compiler flag.
As I want the library to have a fallback when no SSE3 is available I can't just {.passC:"-msse3".} (though you can).
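To illustrate the flag requirement, here is a rough standalone sketch that binds the relevant intrinsics directly and enables SSE3 globally with passC. The bindings and the hmin helper are illustrative names written for this example, not laser's API, and the sketch assumes GCC or Clang on x86_64 with every target CPU supporting SSE3. Without the passC line, GCC should report a "target specific option mismatch" error like the one in the log above.

```nim
# Sketch only: passes -msse3 to every C file of this program.
# Assumption: all target CPUs support SSE3, so no scalar fallback is needed.
{.passC: "-msse3".}

# Local, illustrative bindings to the C intrinsics (not laser's API).
type m128 {.importc: "__m128", header: "<x86intrin.h>", bycopy.} = object

func mm_loadu_ps(p: ptr float32): m128 {.importc: "_mm_loadu_ps", header: "<x86intrin.h>".}
func mm_movehdup_ps(a: m128): m128 {.importc: "_mm_movehdup_ps", header: "<x86intrin.h>".}
func mm_min_ps(a, b: m128): m128 {.importc: "_mm_min_ps", header: "<x86intrin.h>".}
func mm_movehl_ps(a, b: m128): m128 {.importc: "_mm_movehl_ps", header: "<x86intrin.h>".}
func mm_min_ss(a, b: m128): m128 {.importc: "_mm_min_ss", header: "<x86intrin.h>".}
func mm_cvtss_f32(a: m128): float32 {.importc: "_mm_cvtss_f32", header: "<x86intrin.h>".}

func hmin(vec: m128): float32 =
  ## Horizontal minimum of the four float32 lanes [v0, v1, v2, v3].
  let shuf = mm_movehdup_ps(vec)     # [v1, v1, v3, v3] (the SSE3 instruction)
  let mins = mm_min_ps(vec, shuf)    # lane 0 = min(v0, v1), lane 2 = min(v2, v3)
  let hi = mm_movehl_ps(shuf, mins)  # lane 0 = lane 2 of mins
  mm_cvtss_f32(mm_min_ss(mins, hi))  # min of lane 0 of mins and hi

var data = [3'f32, -1'f32, 7'f32, 2'f32]
let smallest = hmin(mm_loadu_ps(addr data[0]))
echo smallest
```

The trade-off is the one described above: a global flag is the simplest fix for an application, but a library that wants a non-SSE3 fallback cannot use it.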
So the SSE3 flag is passed per-file (instead of globally) via an undocumented feature of nim.cfg: https://github.com/numforge/laser/blob/2f619fdbb2496aa7a5e5538035a8d42d88db8c10/nim.cfg#L32.
So you need to add yourfilename.always = "-msse3" if you use the primitive outside of laser.
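For the pmin.nim file above, that would look roughly like the fragment below. The location of the nim.cfg (next to pmin.nim) is an assumption, and since the per-file .always key is undocumented, its behaviour may differ between Nim versions.

```
# nim.cfg placed next to pmin.nim (hypothetical location)
# Pass -msse3 only when compiling the C file generated from pmin.nim.
pmin.always = "-msse3"
```

The error in the log above comes from the C file generated for reductions_sse3.nim, so an entry keyed on that module name may be what is actually required. That is a guess from the log, not something confirmed in this thread.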
Note that I don't define sse3_utils.always because min_ps_sse3 is inline and so is not present in the sse3_utils C file.
Ultimately, @Araq said that he wants to provide a way to set per-file compilation flags from within a .nim file, which would be very helpful.
got it. thanks for the explanation.
hi, I wanted to try out laser. I have this code working:
do I need an omp_critical section for the final result? and/or any other problems? And here is my calling code from your examples/