This is an experiment / proof of concept that uses Nim's metaprogramming to provide an easy-to-use SIMD abstraction layer. The goal is for users of the library to write blocks of code containing SIMD intrinsics once and end up with optimal or near-optimal SIMD instructions being used at runtime, according to the user's hardware.

If you are interested in this, you may also be interested in my Rust library SIMDeez, which is a more complete version of the same idea.
How this will work: SIMD code is written inside a simd: macro block.

See simd.nim for the proof of concept so far. Already you can write code like this successfully:
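    var
      a = newSeq[float32](12)
      b = newSeq[float32](12)
      r = newSeq[float32](12)

    # Fill the inputs: a = 0, 1, 2, ... and b = 2 everywhere.
    for i, v in a:
      a[i] = float32(i)
      b[i] = 2.0'f32

    SIMD:
      # Step by the number of float32 values per vector (4 bytes each).
      # `a.len - 1` replaces the old unary `<a.len`, which newer Nim versions no longer accept.
      for i in countup(0, a.len - 1, simd.width div 4):
        let av = simd.loadu_ps(addr a[i])
        let bv = simd.loadu_ps(addr b[i])
        let rv = simd.add_ps(av, bv)
        simd.storeu_ps(addr r[i], rv)

    echo a
    echo b
    echo r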
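
One thing to watch in the example above: the loop only stays in bounds when the sequence length is a multiple of the number of float32 lanes per vector. Assuming simd.width is the vector width in bytes (an assumption based on the `simd.width div 4` stride), 12 elements fit a 4-lane vector exactly but not an 8-lane one. A minimal plain-Nim sketch of the usual fix, a chunked main loop followed by a scalar tail, might look like this. The `lanes` constant and the `addChunked` proc are hypothetical names, not part of simd.nim, and the inner loop is only a stand-in for the load/add/store intrinsics.

```nim
# Hypothetical stand-in for `simd.width div 4`: float32 lanes per vector.
# 8 matches a 32-byte vector; 4 would match a 16-byte one.
const lanes = 8

proc addChunked(a, b: seq[float32]): seq[float32] =
  ## Adds `a` and `b` element-wise, `lanes` elements per step,
  ## then finishes any leftover elements one at a time.
  assert a.len == b.len
  result = newSeq[float32](a.len)
  # Largest multiple of `lanes` that fits in the input.
  let vecEnd = a.len - (a.len mod lanes)
  var i = 0
  while i < vecEnd:
    # In real SIMD code this inner loop would be one load/add/store triple.
    for j in 0 ..< lanes:
      result[i + j] = a[i + j] + b[i + j]
    i += lanes
  # Scalar tail: elements that do not fill a whole vector.
  for k in vecEnd ..< a.len:
    result[k] = a[k] + b[k]

var
  a = newSeq[float32](12)
  b = newSeq[float32](12)
for i in 0 ..< a.len:
  a[i] = float32(i)
  b[i] = 2.0'f32

echo addChunked(a, b)
```

With 12 elements and 8 lanes, the main loop handles the first 8 elements and the tail loop handles the remaining 4, so nothing is read or written past the end of the sequences.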
A possible next step is to let you write for i,v in myArray and have it converted to iterate over the array at the appropriate stride length, perhaps doing the loads automatically. But I would like to give the user as much control as possible for performance.

I would love help with this. I am slowly learning Nim metaprogramming, so anyone who can help with that aspect, or with SIMD issues, would be appreciated. Feel free to jump in.
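
For anyone who wants to experiment with the stride-conversion idea, here is a rough sketch of how a Nim template could turn a loop body into a strided loop. It is not the approach used in simd.nim. The `forEachChunk` name and its parameters are made up for illustration, and it only produces chunk start indexes without doing any loads.

```nim
# Hypothetical helper, not part of simd.nim: runs `body` once per full chunk
# of `lanes` elements in `data`, binding the chunk's start index to `idx`.
template forEachChunk(data: untyped, lanes: int, idx, body: untyped) =
  block:
    var idx = 0
    # Stop before any chunk that would run past the end of `data`.
    while idx + lanes <= data.len:
      body
      idx += lanes

let values = newSeq[float32](10)
var starts: seq[int] = @[]

forEachChunk(values, 4, i):
  # `i` is the first index of the current 4-element chunk.
  starts.add(i)

echo starts
```

With 10 elements and 4 lanes, the body runs for the chunks starting at index 0 and 4. The last 2 elements are skipped, so a real version would still need a scalar tail or a masked load to cover them.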