pytorch / opacus

Training PyTorch models with differential privacy
https://opacus.ai
Apache License 2.0

Question: why use backward_hooks on Modules and not on Tensors? #66

Closed · LaRiffle closed this issue 4 years ago

LaRiffle commented 4 years ago

Hi! I have a quick question: I've seen that you have built backward hooks for many nn.Module classes, where you basically compute the per-sample gradient. Do you think it would be possible to do this at the tensor level, or would it be impossible to capture the "per-sample" notion there?
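For context, here is a minimal sketch of what I understand the module-hook approach to be (an illustration only, not Opacus code, assuming a single nn.Linear layer): a forward hook stores the layer input, a backward hook receives grad_output, and the per-sample weight gradient is a per-sample outer product of the two.

```python
import torch
import torch.nn as nn

def capture_activations(module, inputs, output):
    # Forward hook: remember the input to the layer for this batch.
    module._activations = inputs[0].detach()

def capture_per_sample_grads(module, grad_input, grad_output):
    # Backward hook: grad_output[0] is (batch, out_features), the stored
    # activations are (batch, in_features); their per-sample outer product
    # is the per-sample gradient of the weight.
    module.weight.grad_sample = torch.einsum(
        "bo,bi->boi", grad_output[0], module._activations
    )

layer = nn.Linear(4, 3)
layer.register_forward_hook(capture_activations)
# Old-style module backward hook (what was available at the time);
# newer PyTorch also offers register_full_backward_hook.
layer.register_backward_hook(capture_per_sample_grads)

x = torch.randn(8, 4)
layer(x).sum().backward()
print(layer.weight.grad_sample.shape)  # torch.Size([8, 3, 4])
```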

karthikprasad commented 4 years ago

Hi @LaRiffle, thanks for the question. Are you looking to capture the per-sample gradient of any specific tensor other than the parameters?

In the current version of PyTorch, computing per-sample gradients with a hook on the module seemed like the easiest way to go about it. It should become a lot easier once vmap support is added to PyTorch (https://github.com/pytorch/pytorch/issues/42368).
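For reference, the vmap route looks roughly like this with today's torch.func API, which did not exist when this issue was filed; this is an illustrative sketch, not Opacus code:

```python
import torch
import torch.nn as nn
from torch.func import functional_call, grad, vmap

model = nn.Linear(4, 3)
# Detached copies of the parameters, passed explicitly to a functional call.
params = {name: p.detach() for name, p in model.named_parameters()}

def compute_loss(params, sample, target):
    # One sample in, one scalar loss out; unsqueeze to fake a batch of size 1.
    output = functional_call(model, params, (sample.unsqueeze(0),))
    return nn.functional.mse_loss(output.squeeze(0), target)

# grad(...) differentiates w.r.t. the first argument (the params dict);
# vmap(...) vectorizes that over the batch dimension of the data.
per_sample_grads = vmap(grad(compute_loss), in_dims=(None, 0, 0))

x, y = torch.randn(8, 4), torch.randn(8, 3)
grads = per_sample_grads(params, x, y)
print(grads["weight"].shape)  # torch.Size([8, 3, 4])
```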

LaRiffle commented 4 years ago

Oops @karthikprasad, sorry for the delay :/

"Are you looking to capture the per-sample gradient of any specific tensor other than the parameters?" No, just for the parameters!

OK, no worries. Yes, now that I understand your codebase better, it seems to be the best way to go!

What I was trying to do was handle "remote" registration of hooks in PySyft, but apparently backward_hooks on a Module just call register_hook on tensors under the hood, so it's almost equivalent for me :) Btw, it worked, and we now support Opacus as a DP layer for PySyft 🙌 https://blog.openmined.org/pysyft-opacus-federated-learning-with-differential-privacy/
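For anyone curious, the tensor-level hook I mean is Tensor.register_hook, which fires with the gradient of that tensor; for a layer's output that is the same grad_output a module backward hook would see. A rough illustration (not PySyft or Opacus code):

```python
import torch
import torch.nn as nn

layer = nn.Linear(4, 3)
x = torch.randn(8, 4)
out = layer(x)

captured = {}

def save_grad(g):
    # Tensor-level hook: g is the gradient of `out`, i.e. the same
    # grad_output a module backward hook on `layer` would receive.
    captured["grad_output"] = g.detach()

out.register_hook(save_grad)
out.sum().backward()
print(captured["grad_output"].shape)  # torch.Size([8, 3])
```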

karthikprasad commented 4 years ago

That’s awesome! Thank you for sharing it :)