Closed · Steffen-Wolf closed 5 years ago
Small update. I managed to attach the hook directly to the model instead of the prediction tensor. This makes the comment above irrelevant for this task. I still think we can get cleaner code by moving the backward() call out of the apply_model_and_loss function, but I'll leave that for another PR.
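For context, here is a minimal sketch of what "attaching the hook directly to the model instead of the prediction tensor" can look like in PyTorch. The model, shapes, and the `apply_model_and_loss` reference are illustrative assumptions, not the actual project code:

```python
import torch
import torch.nn as nn

# Illustrative sketch: capture the gradient the loss induces on the model's
# output by registering a backward hook on the module itself, rather than
# calling register_hook on the prediction tensor.
captured = {}

def grad_hook(module, grad_input, grad_output):
    # grad_output[0] holds d(loss)/d(prediction) for this module's output
    captured["output_grad"] = grad_output[0].detach().clone()

model = nn.Linear(4, 2)
handle = model.register_full_backward_hook(grad_hook)

x = torch.randn(3, 4)
prediction = model(x)
loss = prediction.pow(2).mean()
loss.backward()  # triggered by the training loop, not inside the hook

assert captured["output_grad"].shape == prediction.shape
handle.remove()  # detach the hook when it is no longer needed
```

Because the hook lives on the module, it keeps working even if the prediction tensor itself is created and discarded inside a helper function.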
Looks good to me. Merge when ready pls. 👍
I would like to be able to visualize the gradients that the loss function induces on the network output (see this PR). However, I needed to move the .backward() call out of the apply_model_and_loss function. I would argue that it should not be in there in the first place. We only call it twice:
I would argue it is cleaner to remove .backward() and call it outside the function. If you agree, we could merge this PR into master.
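The proposed refactor can be sketched roughly as follows. This is a hedged illustration under assumed names and signatures (the real `apply_model_and_loss` in the project may take different arguments):

```python
import torch
import torch.nn as nn

# Sketch of the proposal: apply_model_and_loss only computes and returns the
# loss; the caller decides if and when to run backward(). Names illustrative.
def apply_model_and_loss(model, loss_fn, inputs, target):
    prediction = model(inputs)
    loss = loss_fn(prediction, target)
    return prediction, loss  # note: no loss.backward() in here

model = nn.Linear(4, 1)
inputs, target = torch.randn(8, 4), torch.randn(8, 1)
prediction, loss = apply_model_and_loss(model, nn.MSELoss(), inputs, target)

loss.backward()  # backward() now lives in the training loop / caller
assert model.weight.grad is not None
```

Keeping backward() at the call site also makes it easy to skip it entirely, e.g. during validation, without changing apply_model_and_loss.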