lebedov / scikit-cuda

Python interface to GPU-powered libraries
http://scikit-cuda.readthedocs.org/

Batch Matrix Multiplication using CuBLAS #313

Open Darshcg opened 3 years ago

Darshcg commented 3 years ago

Hi @lebedov,

Thanks for your great work.

I am working on registering a plugin for an operator (Einsum) that is not currently supported in TensorRT. Instead of implementing a custom CUDA kernel, I want to use the cuBLAS library for batched matrix multiplication.

The equations I want to implement (from the Einsum operator) are "ntg,ncg->nct" and "nct,ncp->ntp" (both batched matrix multiplications).

Info about the Einsum op: https://github.com/onnx/onnx/blob/master/docs/Operators.md#Einsum

I need guidance on using the cuBLAS library for batched matrix multiplication for the above two operations.

I have been referring to the available documentation (https://docs.nvidia.com/cuda/cublas/index.html#cublas-lt-t-gt-gemmbatched), but I do not understand how to apply it to the above equations.
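For what it's worth, both contractions reduce to a single batched GEMM with one operand transposed, which is exactly what `cublas<t>gemmBatched` computes. A minimal NumPy sketch of the mapping (the shapes `n, t, g, c, p` are illustrative assumptions, not from the original question), useful for verifying the math on the CPU before writing the cuBLAS calls:

```python
import numpy as np

# Illustrative shapes: n = batch dimension; t, g, c, p are the
# remaining einsum axes (values chosen arbitrarily for the sketch).
n, t, g, c, p = 2, 3, 4, 5, 6
rng = np.random.default_rng(0)

A = rng.standard_normal((n, t, g)).astype(np.float32)
B = rng.standard_normal((n, c, g)).astype(np.float32)

# "ntg,ncg->nct": out[n,c,t] = sum_g B[n,c,g] * A[n,t,g]
# i.e. a batched product of B with A transposed in its last two axes.
out1 = np.matmul(B, A.transpose(0, 2, 1))
assert np.allclose(out1, np.einsum('ntg,ncg->nct', A, B), atol=1e-4)

C = rng.standard_normal((n, c, t)).astype(np.float32)
D = rng.standard_normal((n, c, p)).astype(np.float32)

# "nct,ncp->ntp": out[n,t,p] = sum_c C[n,c,t] * D[n,c,p]
# i.e. a batched product of C transposed with D.
out2 = np.matmul(C.transpose(0, 2, 1), D)
assert np.allclose(out2, np.einsum('nct,ncp->ntp', C, D), atol=1e-4)

print(out1.shape, out2.shape)
```

Each `np.matmul` line then corresponds to one `cublasSgemmBatched` call; since these arrays are row-major while cuBLAS assumes column-major storage, the transpose flags (`transa`/`transb`) and the operand order have to be chosen accordingly (a row-major `X @ Y` can be computed as a column-major `Y @ X` on the same buffers).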

Can you please assist me with this?

Thanks in advance,
Darshan C G