Open arteymix opened 8 years ago
If the CPU doesn't support vector instructions, we need to provide a fallback implementation in Numeric-GLib: the library should serve as an insulation layer.
The way I'm designing Numeric-GLib is to provide all the types and vectorized types unconditionally, with a platform-dependent fallback. This will need some work, but it will be easy to plug some conditional preprocessor directives in there.
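A minimal sketch of what such a conditional fallback could look like, assuming hypothetical names (`example_int32x4` and its helpers are illustrations, not Numeric-GLib's actual API). When the GCC vector extensions are available the type maps onto them. Otherwise a plain struct with scalar loops exposes the same interface. Note that GCC itself already synthesizes vector operations from narrower instructions when the target lacks SIMD, so the preprocessor branch mostly matters for compilers without the extension.

```c
#include <stdint.h>

/* Hypothetical API for illustration only; not Numeric-GLib's real names. */
#if defined(__GNUC__) || defined(__clang__)

/* Four 32-bit integers packed in 16 bytes via the vector extension. */
typedef int32_t example_int32x4 __attribute__ ((vector_size (16)));

static inline example_int32x4
example_int32x4_make (int32_t a, int32_t b, int32_t c, int32_t d)
{
    return (example_int32x4) { a, b, c, d };
}

static inline example_int32x4
example_int32x4_add (example_int32x4 a, example_int32x4 b)
{
    return a + b; /* element-wise addition */
}

static inline int32_t
example_int32x4_get (example_int32x4 v, int i)
{
    return v[i];
}

#else

/* Scalar fallback with the same interface. */
typedef struct { int32_t v[4]; } example_int32x4;

static inline example_int32x4
example_int32x4_make (int32_t a, int32_t b, int32_t c, int32_t d)
{
    example_int32x4 r = { { a, b, c, d } };
    return r;
}

static inline example_int32x4
example_int32x4_add (example_int32x4 a, example_int32x4 b)
{
    example_int32x4 r;
    for (int i = 0; i < 4; i++)
        r.v[i] = a.v[i] + b.v[i];
    return r;
}

static inline int32_t
example_int32x4_get (example_int32x4 v, int i)
{
    return v.v[i];
}

#endif
```

Callers only ever touch the `example_int32x4_*` functions, so the same code compiles on either path.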
I don't see why we would need anything beyond 16 bytes for now because that's what SSE instructions work on. It will provide all the sizes that are powers of 2 and do not result in a vector of one element.
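One way to read the sizing rule above, as a sketch: for each element type, keep every power-of-2 vector size up to 16 bytes that holds at least two elements. The helper `ex_vector_sizes` and the `ex_` type names are hypothetical, and the interpretation itself is an assumption about the intended design.

```c
#include <stddef.h>
#include <stdint.h>

#define EX_MAX_VECTOR_BYTES 16 /* width of one SSE register */

/* Writes the qualifying vector sizes (in bytes) for an element of
 * elem_size bytes into out and returns how many were written.
 * Assumes elem_size >= 1; at most 4 sizes (2, 4, 8, 16) can qualify. */
static size_t
ex_vector_sizes (size_t elem_size, size_t out[4])
{
    size_t n = 0;
    for (size_t bytes = 2; bytes <= EX_MAX_VECTOR_BYTES; bytes *= 2) {
        if (bytes / elem_size >= 2) /* skip one-element vectors */
            out[n++] = bytes;
    }
    return n;
}

/* For a 4-byte element this leaves two vector types: */
typedef int32_t ex_int32x2 __attribute__ ((vector_size (8)));
typedef int32_t ex_int32x4 __attribute__ ((vector_size (16)));

_Static_assert (sizeof (ex_int32x2) == 8, "two 32-bit lanes");
_Static_assert (sizeof (ex_int32x4) == 16, "four 32-bit lanes");
```

Under this reading an 8-byte element such as a double only gets the 16-byte vector, while a 1-byte element gets all four sizes.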
GCC extensions are less intrusive than OpenCL. We can use them as-is, whereas we would need to write specific code to work with GPUs. The latter can wait until the array implementation gets somewhat stable.
I haven't checked any benchmarks so far, but it should be significantly faster.
I've read a bit and it seems that OpenCL is not really applicable in our case, at least not now. It appears that in most cases, it's not worth copying the memory to the device unless you have a significant computation to perform.
Once we have the computation graph working, it will be interesting to generate GPU code to evaluate it and to provide primitives for mapping the dense array into GPU memory.
I have a work-in-progress implementation of vectorized operations on basic GLib types and possibly extended types as part of Numeric-GLib.
Here are some test cases for adding 4 gint simultaneously: https://github.com/arteymix/numeric-glib/blob/c03fc0ad99868c43371d17929cd615a8193bb827/tests/numeric-test.vala#L61
Maybe it wouldn't be a bad idea to use Numeric-GLib as a subproject to perform vectorized operations on basic types and move as many operations as possible there to have a generous general-purpose numerical library.
We might eventually try to work with OpenCL for that, but I think that using GCC Vector Extensions would be a good start.
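For reference, a small sketch of how the GCC vector extensions could be applied to plain arrays, four integers at a time. The names `ex_v4si` and `ex_add_ints` are hypothetical, and the code assumes a 4-byte `int` (which is what gint is a typedef for in GLib). It uses `memcpy` for loads and stores so the input arrays do not need 16-byte alignment.

```c
#include <stddef.h>
#include <string.h>

/* Four ints in 16 bytes; assumes sizeof (int) == 4. */
typedef int ex_v4si __attribute__ ((vector_size (16)));

/* Adds a[i] + b[i] into out[i] for i in [0, n). */
static void
ex_add_ints (const int *a, const int *b, int *out, size_t n)
{
    size_t i = 0;

    /* Main loop: one vector addition handles four elements. */
    for (; i + 4 <= n; i += 4) {
        ex_v4si va, vb, vc;
        memcpy (&va, a + i, sizeof va);
        memcpy (&vb, b + i, sizeof vb);
        vc = va + vb;
        memcpy (out + i, &vc, sizeof vc);
    }

    /* Scalar tail for the remaining 0 to 3 elements. */
    for (; i < n; i++)
        out[i] = a[i] + b[i];
}
```

No intrinsics or device setup are involved: the `+` operator on the vector type is all the compiler needs, which is what makes this approach less intrusive than a GPU path.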