Multi-device OpenCL kernel load balancer and pipeliner API for C#. Uses a shared-distributed memory model to keep GPUs updated quickly while running the same kernel on all devices (for simplicity).
GNU General Public License v3.0
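The description mentions balancing one kernel across several devices. Below is a minimal sketch of one common approach. It assumes the work is a count of interchangeable items and that each device's speed is re-estimated from how long its last share took. Each round, the items are split in proportion to those estimates. The function names and the simulated timing model are hypothetical, and this is not necessarily how this library balances load.

```python
def proportional_split(total_items, speeds):
    """Split total_items among devices in proportion to their estimated speeds.
    Largest-remainder rounding keeps the shares summing to exactly total_items."""
    total_speed = sum(speeds)
    exact = [total_items * s / total_speed for s in speeds]
    shares = [int(x) for x in exact]  # floor (all values are non-negative)
    leftover = total_items - sum(shares)
    # Give the remaining items to the devices with the largest fractional parts.
    order = sorted(range(len(speeds)), key=lambda i: exact[i] - shares[i], reverse=True)
    for i in order[:leftover]:
        shares[i] += 1
    return shares


def update_speeds(prev_speeds, shares, elapsed):
    """Re-estimate each device's speed (items per time unit) from its last run.
    A device that received no work keeps its previous estimate."""
    return [n / t if n > 0 and t > 0 else p
            for p, n, t in zip(prev_speeds, shares, elapsed)]


def simulate(true_speeds, total_items, rounds):
    """Hypothetical devices whose real speeds are unknown to the balancer."""
    speeds = [1.0] * len(true_speeds)  # start with an even split
    shares = []
    for _ in range(rounds):
        shares = proportional_split(total_items, speeds)
        elapsed = [n / s for n, s in zip(shares, true_speeds)]  # simulated timing
        speeds = update_speeds(speeds, shares, elapsed)
    return shares
```

With one device three times faster than the other, the split moves from an even 512/512 to 768/256 after the first measured round. Re-measuring every round also lets the split follow devices whose speed changes over time.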
Add built-in matrix multiplication for sizes from 2x2 to 8192x8192 #27
Batched multiplication of small matrices (2x2, 4x4, 16x16, 32x32), and a single 8k x 8k multiplication with sub-matrix partitioning to improve load balancing across devices.
Either N levels of partitioning (4, 16, 64, 256 sub-matrices) or M levels of batching.
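The partition counts listed above (4, 16, 64, 256) appear to correspond to splitting each side of the matrix into 2, 4, 8 or 16 parts, giving 4^N sub-matrices at level N. Below is a minimal pure-Python sketch of that idea. It assumes square matrices whose size is divisible by 2^N and partitions only the output matrix. The function names are hypothetical, and the tiles run sequentially here, whereas a multi-device implementation would dispatch each tile (or each batched product) to a device.

```python
def partition(n, levels):
    """(row, col, size) of each output tile when an n x n product is split
    into 4**levels square sub-matrices: level 1 -> 4, level 2 -> 16, ..."""
    per_side = 2 ** levels
    if n % per_side:
        raise ValueError("n must be divisible by 2**levels")
    size = n // per_side
    return [(r * size, c * size, size)
            for r in range(per_side) for c in range(per_side)]


def multiply_tile(a, b, c, row, col, size):
    """Fill C[row:row+size][col:col+size]. The tile reads a row band of A and a
    column band of B and writes only its own region of C, so tiles are
    independent of each other."""
    n = len(a)
    for i in range(row, row + size):
        for j in range(col, col + size):
            c[i][j] = sum(a[i][k] * b[k][j] for k in range(n))


def partitioned_matmul(a, b, levels):
    n = len(a)
    c = [[0] * n for _ in range(n)]
    for row, col, size in partition(n, levels):
        # Sequential here; a multi-device version would hand each tile to a device.
        multiply_tile(a, b, c, row, col, size)
    return c


def batched_matmul(batch_a, batch_b):
    """Batching: many small independent products (e.g. 2x2 to 32x32) as one job,
    one product per work item instead of splitting a single product."""
    return [partitioned_matmul(a, b, 0) for a, b in zip(batch_a, batch_b)]
```

Partitioning only the output means every tile writes a disjoint region of C, so no cross-device reduction step is needed. The cost is that each tile reads a full band of A and B. More levels give the balancer finer-grained units to distribute. Batching takes the opposite approach: many small products are grouped so that each dispatch carries enough work.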