Vectorized operations - Githubissues

arteymix commented 8 years ago

I have a work-in-progress implementation of vectorized operations on basic GLib types and possibly extended types as part of Numeric-GLib.

Here are some test cases for adding four gint values simultaneously: https://github.com/arteymix/numeric-glib/blob/c03fc0ad99868c43371d17929cd615a8193bb827/tests/numeric-test.vala#L61

Maybe it wouldn't be a bad idea to use Numeric-GLib as a subproject to perform vectorized operations on basic types, and to move as many operations as possible there so that we end up with a generous general-purpose numerical library.

We might eventually try to work with OpenCL for that, but I think that using GCC Vector Extensions would be a good start.
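
For readers unfamiliar with GCC Vector Extensions, here is a minimal sketch of what "adding 4 gint simultaneously" can look like in plain C. The names `int32x4` and `int32x4_add` are made up for illustration and are not taken from Numeric-GLib; `int32_t` stands in for `gint` on the assumption that `gint` is a 32-bit `int` on the target platform.

```c
#include <assert.h>
#include <stdint.h>

/* Four 32-bit integers packed into one 16-byte vector.
 * vector_size is a GCC extension that Clang also accepts.
 * int32x4 is a hypothetical name, not a Numeric-GLib type. */
typedef int32_t int32x4 __attribute__ ((vector_size (16)));

/* The vector must be exactly as wide as one SSE register. */
static_assert (sizeof (int32x4) == 16, "int32x4 must be 16 bytes");

/* Element-wise addition of all four lanes with one expression.
 * The compiler chooses the instructions; on SSE2 targets this is
 * typically a single packed add. */
static inline int32x4
int32x4_add (int32x4 a, int32x4 b)
{
    return a + b;
}
```

Lanes can be read back with ordinary subscripting (`c[0]`, `c[1]`, ...), which recent GCC and Clang releases support for vector types.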

rainwoodman commented 8 years ago

We should give these vectorized operations a distinct name, because in the context of arrays we also have 'vectorize', which means something different. I propose we call them vectorized-instructions, as opposed to vectorized-functions in the context of arrays.
Has any benchmark been done comparing pack + vector-instruction against non-vector-instruction operation for various instructions?
Depending on the instructions of a particular CPU has a dangerous smell; putting that aside, I'd prefer GCC extensions to OpenCL, because I have the impression that the GCC API is more stable than OpenCL's. (I may be wrong.)

If the CPU doesn't support the vector-instructions, we need to provide a fallback implementation in numeric-glib: the library shall serve as an insulation layer.

The benchmark result will give us a hint on how to properly use the vector operators with array iterators. Array iteration is already somewhat expensive (it has to keep track of multi-dimensional indices), so in the end the cost of packing may be negligible, in which case we can always use the vectorized-instruction functions in numeric-glib.
This raises the question of what vector lengths numeric-glib is guaranteed to support, e.g. always support length 4, regardless of the size of the element?
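
To make the "pack + vector-instr" path concrete, here is a rough sketch of what a benchmark would be measuring: gather four strided elements into a vector, add, and scatter the result back, with a scalar loop for the tail. All names here (`pack4`, `unpack4`, `strided_add`) are hypothetical, and strides are counted in elements rather than bytes, which is an assumption about how the array iterator would expose them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int32_t int32x4 __attribute__ ((vector_size (16)));

/* Pack four elements spaced `stride` elements apart into one vector. */
static inline int32x4
pack4 (const int32_t *base, ptrdiff_t stride)
{
    int32x4 v = { base[0], base[stride], base[2 * stride], base[3 * stride] };
    return v;
}

/* Write the four lanes back to memory, `stride` elements apart. */
static inline void
unpack4 (int32x4 v, int32_t *base, ptrdiff_t stride)
{
    for (int i = 0; i < 4; i++)
        base[i * stride] = v[i];
}

/* out[i * so] = a[i * sa] + b[i * sb] for i in [0, n). */
static void
strided_add (int32_t *out, ptrdiff_t so,
             const int32_t *a, ptrdiff_t sa,
             const int32_t *b, ptrdiff_t sb,
             ptrdiff_t n)
{
    assert (n >= 0);
    ptrdiff_t i = 0;

    /* Vector path: four elements per iteration. */
    for (; i + 4 <= n; i += 4)
        unpack4 (pack4 (a + i * sa, sa) + pack4 (b + i * sb, sb),
                 out + i * so, so);

    /* Scalar tail for the remaining 0 to 3 elements. */
    for (; i < n; i++)
        out[i * so] = a[i * sa] + b[i * sb];
}
```

Timing `strided_add` against a plain scalar loop over the same strides would answer whether packing pays for itself; for unit strides the compiler may well auto-vectorize the scalar loop, so the comparison is most informative for non-contiguous data.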

arteymix commented 8 years ago

The way I'm designing Numeric-GLib is to provide all the types and vectorized types unconditionally, with platform-dependent fallbacks. This will need some work, but it will be easy to plug some conditional preprocessor directives in there.
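
As an illustration of the kind of conditional preprocessor directives meant here, the following sketch exposes the same type and functions on every platform and only switches the implementation. The `num_` prefix and all function names are hypothetical, not the actual Numeric-GLib API. Note that GCC already synthesizes vector operations on CPUs without matching SIMD instructions, so the `#else` branch mostly matters for compilers that lack the extension.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__GNUC__)
/* GCC and Clang both define __GNUC__: use the vector extension. */
typedef int32_t num_int32x4 __attribute__ ((vector_size (16)));

static inline num_int32x4
num_int32x4_make (int32_t a, int32_t b, int32_t c, int32_t d)
{
    num_int32x4 v = { a, b, c, d };
    return v;
}

static inline int32_t
num_int32x4_get (num_int32x4 v, int i)
{
    return v[i];
}

static inline num_int32x4
num_int32x4_add (num_int32x4 a, num_int32x4 b)
{
    return a + b;
}
#else
/* Portable fallback: same size and API, plain scalar loop. */
typedef struct { int32_t lane[4]; } num_int32x4;

static inline num_int32x4
num_int32x4_make (int32_t a, int32_t b, int32_t c, int32_t d)
{
    num_int32x4 v = { { a, b, c, d } };
    return v;
}

static inline int32_t
num_int32x4_get (num_int32x4 v, int i)
{
    return v.lane[i];
}

static inline num_int32x4
num_int32x4_add (num_int32x4 a, num_int32x4 b)
{
    num_int32x4 r;
    for (int i = 0; i < 4; i++)
        r.lane[i] = a.lane[i] + b.lane[i];
    return r;
}
#endif

/* Either way, callers can rely on a 16-byte value. */
static_assert (sizeof (num_int32x4) == 16, "num_int32x4 must be 16 bytes");
```

Hiding lane access behind `num_int32x4_get` is what keeps calling code identical across both branches, since the struct fallback cannot be subscripted directly.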

I don't see why we would need anything beyond 16 bytes for now, because that's what SSE instructions work on. It will provide all the sizes that are powers of 2 and do not result in a vector of one element.
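
One possible reading of that rule, sketched with GCC Vector Extensions: for each element type, every power-of-2 lane count from 2 up to whatever fits in 16 bytes. The type names below are invented for the example, and only the full-width variants plus one narrower one are shown; the actual set Numeric-GLib ends up providing may differ.

```c
#include <assert.h>
#include <stdint.h>

#define VEC_BYTES 16 /* width of one SSE register */

/* Full-width (16-byte) vectors for each element size. */
typedef int8_t  int8x16  __attribute__ ((vector_size (VEC_BYTES)));
typedef int16_t int16x8  __attribute__ ((vector_size (VEC_BYTES)));
typedef int32_t int32x4  __attribute__ ((vector_size (VEC_BYTES)));
typedef int64_t int64x2  __attribute__ ((vector_size (VEC_BYTES)));
typedef float   floatx4  __attribute__ ((vector_size (VEC_BYTES)));
typedef double  doublex2 __attribute__ ((vector_size (VEC_BYTES)));

/* A narrower variant: two 32-bit lanes in 8 bytes. A one-lane
 * vector would just be a scalar, so the family stops at 2 lanes. */
typedef int32_t int32x2 __attribute__ ((vector_size (8)));

/* Lane counts follow from vector size / element size. */
static_assert (sizeof (int8x16) / sizeof (int8_t) == 16, "16 lanes");
static_assert (sizeof (int16x8) / sizeof (int16_t) == 8, "8 lanes");
static_assert (sizeof (int32x4) / sizeof (int32_t) == 4, "4 lanes");
static_assert (sizeof (int64x2) / sizeof (int64_t) == 2, "2 lanes");
static_assert (sizeof (floatx4) / sizeof (float) == 4, "4 lanes");
static_assert (sizeof (doublex2) / sizeof (double) == 2, "2 lanes");
static_assert (sizeof (int32x2) / sizeof (int32_t) == 2, "2 lanes");
```

Under this reading, the question of always supporting length 4 has a type-dependent answer: 8-byte elements such as `int64_t` and `double` only reach 2 lanes within 16 bytes.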

GCC extensions are less intrusive than OpenCL. We can use them as-is, whereas we would need to write specific code to work with GPUs. The latter can wait until the array implementation gets somewhat stable.

I have not checked any benchmarks so far, but I expect it to be significantly faster.

arteymix commented 8 years ago

I've read a bit and it seems that OpenCL is not really applicable in our case, at least not now. It appears that in most cases, it's not worth copying the memory to the device unless you have a significant computation to perform.

Once we have the computation graph working, it will be interesting to generate GPU code to evaluate it and to provide primitives for mapping the dense array into GPU memory.

rainwoodman / vast

Vectorized operations #6