Closed jasperzhong closed 1 year ago
Work on fusing the kernels for the multiple relations in RGCN.
Motivation: type-specific kernels are launched separately, resulting in many small kernels. The goal is to improve GPU utilization by fusing the type-specific kernels in RGCN/HGT models.
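To make the motivation concrete, here is a minimal NumPy sketch (CPU only, not the paper's kernel) contrasting a per-relation loop, which corresponds to one small matmul launch per edge type, with a single batched call over all edges. The function names, tensor shapes, and the gather-then-einsum formulation are assumptions chosen for illustration.

```python
import numpy as np


def rgcn_messages_per_relation(h_src, etypes, weights):
    """One matmul per relation type; on a GPU this maps to many small kernel launches."""
    num_edges = h_src.shape[0]
    num_rels, _, d_out = weights.shape
    out = np.empty((num_edges, d_out), dtype=h_src.dtype)
    for r in range(num_rels):
        mask = etypes == r                     # edges that belong to relation r
        out[mask] = h_src[mask] @ weights[r]   # small type-specific matmul
    return out


def rgcn_messages_fused(h_src, etypes, weights):
    """Single batched call: gather each edge's weight matrix, then contract once."""
    per_edge_w = weights[etypes]               # shape (num_edges, d_in, d_out)
    return np.einsum("ei,eio->eo", h_src, per_edge_w)


# Hypothetical sizes, chosen only so the example runs quickly.
rng = np.random.default_rng(0)
num_edges, num_rels, d_in, d_out = 1000, 8, 16, 32
h_src = rng.standard_normal((num_edges, d_in))          # source-node embedding per edge
etypes = rng.integers(0, num_rels, size=num_edges)      # relation id per edge
weights = rng.standard_normal((num_rels, d_in, d_out))  # one weight matrix per relation

looped = rgcn_messages_per_relation(h_src, etypes, weights)
fused = rgcn_messages_fused(h_src, etypes, weights)
assert np.allclose(looped, fused)
```

Both functions produce the same per-edge messages. The batched version in this sketch materializes a copy of the weight matrix for every edge, so it trades memory for fewer calls; a hand-written fused GPU kernel would aim to avoid that copy.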
Two baselines
The better choice between the two operators varies across datasets. There is no one-size-fits-all operator.
Kernel-level optimizations: shared memory for node embeddings, L2 cache for the weight matrix, warp-level vector-matrix multiplication, and accumulation in GPU registers.
Experiments on small datasets (up to 5M edges) show up to 3x speedup for full-graph training and up to 2x for mini-batch.
https://assets.amazon.science/62/5c/ba110eab4fd88d34b2b3fb3b3bf9/optimizing-irregular-dense-operators-of-heterogeneous-gnn-models-on-gpu.pdf