lhnguyen102 / cuTAGI

CUDA implementation of Tractable Approximate Gaussian Inference
MIT License

Kernel Optimization for Conv2d and Batchnorm2d #70

Closed: lhnguyen102 closed this pull request 3 weeks ago

lhnguyen102 commented 3 weeks ago

Description 🚀

The goal is to optimize the forward and backward GPU kernels for Conv2d and BatchNorm2d by leveraging on-chip memory and parallel sum reduction.
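The PR description does not include the kernels themselves. As a rough illustration of the "on-chip memory and sum reduction" idea for BatchNorm2d, here is a minimal sketch that computes per-channel mean and variance of an NCHW tensor. Each thread block handles one channel. Threads accumulate partial sums in registers, and a tree reduction in shared memory combines them. The names `block_sum`, `channel_mean_var_kernel` and `channel_mean_var` are hypothetical and are not cuTAGI's API. The NCHW layout, the block size of 256 and the two-pass variance are also assumptions.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical sketch (not cuTAGI code): per-channel mean and biased variance
// of an NCHW tensor, one thread block per channel.
constexpr int kBlockSize = 256;  // must be a power of two for the tree reduction

// Sums one float per thread across the block using shared (on-chip) memory.
__device__ float block_sum(float v, float* smem) {
    int tid = threadIdx.x;
    smem[tid] = v;
    __syncthreads();
    // Halve the number of active threads each step, adding pairs of partials.
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (tid < stride) smem[tid] += smem[tid + stride];
        __syncthreads();
    }
    float total = smem[0];
    __syncthreads();  // make sure every thread has read smem[0] before reuse
    return total;
}

__global__ void channel_mean_var_kernel(const float* x, int N, int C, int HW,
                                        float* mean, float* var) {
    __shared__ float smem[kBlockSize];
    int c = blockIdx.x;  // this block's channel
    int count = N * HW;  // elements per channel

    // Pass 1: per-thread partial sums in registers, then a block-wide sum.
    float partial = 0.0f;
    for (int i = threadIdx.x; i < count; i += blockDim.x) {
        int n = i / HW, s = i % HW;
        partial += x[(n * C + c) * HW + s];
    }
    float mu = block_sum(partial, smem) / count;

    // Pass 2: sum of squared deviations (more stable than E[x^2] - mu^2).
    partial = 0.0f;
    for (int i = threadIdx.x; i < count; i += blockDim.x) {
        int n = i / HW, s = i % HW;
        float d = x[(n * C + c) * HW + s] - mu;
        partial += d * d;
    }
    float sq = block_sum(partial, smem);

    // Only the final per-channel results go back to global memory.
    if (threadIdx.x == 0) {
        mean[c] = mu;
        var[c] = sq / count;
    }
}

// Host helper: copies x to the device, launches one block per channel and
// copies the statistics back. Error checking is omitted for brevity.
void channel_mean_var(const std::vector<float>& x, int N, int C, int HW,
                      std::vector<float>& mean, std::vector<float>& var) {
    float *d_x, *d_mean, *d_var;
    cudaMalloc(&d_x, x.size() * sizeof(float));
    cudaMalloc(&d_mean, C * sizeof(float));
    cudaMalloc(&d_var, C * sizeof(float));
    cudaMemcpy(d_x, x.data(), x.size() * sizeof(float), cudaMemcpyHostToDevice);
    channel_mean_var_kernel<<<C, kBlockSize>>>(d_x, N, C, HW, d_mean, d_var);
    mean.resize(C);
    var.resize(C);
    cudaMemcpy(mean.data(), d_mean, C * sizeof(float), cudaMemcpyDeviceToHost);
    cudaMemcpy(var.data(), d_var, C * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_x);
    cudaFree(d_mean);
    cudaFree(d_var);
}
```

With one block per channel, every partial sum stays on chip. Each input element is read from global memory once per pass, and only one mean and one variance per channel are written back. When N×H×W per channel is very large, it would likely scale better to spread a channel across several blocks and add a second reduction stage, or to use warp shuffles (`__shfl_down_sync`) for the final steps.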

Changes Made

Note for Reviewer(s)

lhnguyen102 commented 3 weeks ago

@jamesgoulet Thanks for testing. It seems quite slow on the Quadro RTX then :(

jamesgoulet commented 3 weeks ago

PyTorch will also be slower on it 🤙