Closed WarBean closed 5 years ago
@WarBean I also noticed this phenomenon before, but I have not tested it extensively to figure out why the manual bp implementation is faster. Actually, we provide the manual bp code to ensure the backward pass works correctly, because earlier versions of PyTorch (<0.4) produced wrong results. So my cautious guess is that the autograd engine may not perform optimally, and that it needs many "extra operations" to maintain better generality.
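The manual-bp style discussed here can be sketched with `torch.autograd.Function`, where the backward formula is written by hand instead of being derived by the autograd engine. Below is a simplified covariance-pooling (`covpool`) example for illustration only; it is not the repository's exact code:

```python
import torch

class CovpoolManual(torch.autograd.Function):
    """Covariance pooling with a hand-written backward pass.
    Simplified sketch of the manual-bp style; not fast-MPN-COV's exact code."""

    @staticmethod
    def forward(ctx, x):
        # x: (batch, d, n) -- d feature channels over n spatial positions
        b, d, n = x.shape
        # I_hat = (1/n) * (I - ones/n): centering + averaging matrix
        I_hat = (torch.eye(n, dtype=x.dtype, device=x.device)
                 - torch.ones(n, n, dtype=x.dtype, device=x.device) / n) / n
        I_hat = I_hat.expand(b, n, n)
        ctx.save_for_backward(x, I_hat)
        return x.bmm(I_hat).bmm(x.transpose(1, 2))  # (batch, d, d)

    @staticmethod
    def backward(ctx, grad_y):
        x, I_hat = ctx.saved_tensors
        # For y = X A X^T with symmetric A: dL/dX = (G + G^T) X A
        return (grad_y + grad_y.transpose(1, 2)).bmm(x).bmm(I_hat)
```

`torch.autograd.gradcheck` can then confirm that the hand-written gradient matches numerical differentiation.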
@jiangtaoxie So it is better to use the manual backward implementation for now. Thanks!
In order to keep my code clean and easy to read, I tried to reimplement covpool, sqrtm and triuvec with native PyTorch operators as simple, plain Python functions, as shown in https://github.com/jiangtaoxie/fast-MPN-COV/pull/7.
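The native-operator approach described above can be sketched as a plain function whose backward pass is derived automatically by autograd. This is a simplified illustration, not the exact code from the linked PR:

```python
import torch

def covpool(x):
    """Covariance pooling as a plain function; autograd derives the backward.
    Simplified sketch, not the exact code from the linked PR."""
    # x: (batch, d, n); center each channel over the n positions
    x_centered = x - x.mean(dim=2, keepdim=True)
    # (1/n) * X_c @ X_c^T equals X @ I_hat @ X^T, since (I - J/n) is idempotent
    return x_centered.bmm(x_centered.transpose(1, 2)) / x.shape[2]
```

Because every step is a differentiable native operator, no custom `backward` is needed; the trade-off raised in this issue is that the generated backward graph may be slower than a hand-written one.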
After verifying that the forward and backward results are equivalent between my auto-backward version (using the autograd engine) and your manual-backward version (using autograd.Function), I benchmarked their speed and was surprised to find my auto-backward version slower.
Have you compared these two approaches before? Do you have any idea why the manual backward implementation is faster than PyTorch's autograd engine?
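A minimal way to reproduce such a comparison is to time only the backward pass of each variant. The harness below is a hypothetical micro-benchmark (shapes and iteration counts are arbitrary), shown here with the autograd-engine version; the manual-backward variant would be timed the same way:

```python
import time
import torch

def time_backward(fn, x, iters=20):
    """Average wall-clock time of the backward pass for a pooling function."""
    total = 0.0
    for _ in range(iters):
        xi = x.clone().requires_grad_(True)
        loss = fn(xi).sum()
        t0 = time.perf_counter()
        loss.backward()
        # NOTE: on CUDA, torch.cuda.synchronize() is needed before reading
        # the timer, since kernel launches are asynchronous.
        total += time.perf_counter() - t0
    return total / iters

def covpool_auto(x):
    # native-operator version: backward generated by the autograd engine
    xc = x - x.mean(dim=2, keepdim=True)
    return xc.bmm(xc.transpose(1, 2)) / x.shape[2]

if __name__ == "__main__":
    x = torch.randn(8, 64, 196)  # arbitrary batch/channel/position sizes
    print(f"autograd backward: {time_backward(covpool_auto, x) * 1e3:.3f} ms")
```

Timing only `loss.backward()` (rather than the whole forward+backward) isolates the part where the two implementations differ.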