The current way of accessing matrix elements one at a time can seriously hurt performance, especially on GPU: it launches a gazillion tiny, useless kernels.
[ ] Rewrite element-wise access to be vectorized (slicing & fancy indexing).
[ ] Think about how to write general GPU/CPU kernels for index resolution.
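A minimal sketch of the difference between the two tasks above, using NumPy as a CPU stand-in for whatever array backend this project uses (an assumption; the issue does not name the library). `gather_loop`, `gather_vectorized` and `resolve_linear_indices` are hypothetical helper names. The loop version issues one indexing call per element, which is the pattern that would map to one tiny kernel per access on a GPU backend. The vectorized version resolves all `(row, col)` pairs in a single fancy-indexing call, and a rectangular block is taken with a single slice. Index resolution is shown as a batched row-major mapping `row * n_cols + col`, the kind of computation a general GPU/CPU index kernel would perform.

```python
import numpy as np

def gather_loop(mat, rows, cols):
    # Element-wise access: one indexing call per element. On a GPU backend
    # (assumed), each of these calls could dispatch its own tiny kernel.
    return np.array([mat[r, c] for r, c in zip(rows, cols)])

def gather_vectorized(mat, rows, cols):
    # Fancy indexing: a single call resolves every (row, col) pair at once.
    return mat[rows, cols]

def resolve_linear_indices(shape, rows, cols):
    # Batched index resolution: map (row, col) pairs to row-major flat
    # offsets, i.e. row * n_cols + col, for all pairs in one operation.
    return np.ravel_multi_index((rows, cols), shape)

mat = np.arange(12).reshape(3, 4)
rows = np.array([0, 1, 2, 2])
cols = np.array([3, 0, 1, 3])

picked = gather_vectorized(mat, rows, cols)      # [3, 4, 9, 11]
offsets = resolve_linear_indices(mat.shape, rows, cols)
block = mat[0:2, 1:3]                            # one slice instead of 4 reads
```

Both gathers return the same values; the vectorized one simply does the work in one call, so a GPU backend could serve it with one kernel instead of one per element.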