The official PyTorch implementation of the paper "Learning by Analogy: Reliable Supervision from Transformations for Unsupervised Optical Flow Estimation".
MIT License
Scaling up the loss before calculating the gradient #11
Hi,
first of all, great paper and great code, thank you for sharing it :)
I was wondering: why do you scale up the loss (multiplying by 1024.) before the backward() call, and then divide it again before the weight update?
It's a leftover from mixed-precision training. I don't remember whether it had any effect on the results in this project.
You can refer to this link for an explanation of its purpose.
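The technique the answer refers to is static loss scaling: in float16, very small gradients underflow to zero, so the loss is multiplied by a large constant before backpropagation and the gradients are divided by the same constant before the optimizer step. A minimal, framework-free sketch of the idea (the `to_fp16` helper and the constants here are illustrative, not the repository's code):

```python
FP16_TINY = 2.0 ** -24  # smallest positive subnormal in IEEE float16

def to_fp16(x):
    # Crude stand-in for a float16 cast: flush values below the smallest
    # subnormal to zero (ignores rounding, which is enough for this demo).
    return 0.0 if abs(x) < FP16_TINY else x

LOSS_SCALE = 1024.0
true_grad = 1e-8  # a gradient small enough to underflow in float16

unscaled = to_fp16(true_grad)             # underflows to 0.0: signal lost
scaled = to_fp16(true_grad * LOSS_SCALE)  # survives the float16 cast
recovered = scaled / LOSS_SCALE           # divide back before the update

print(unscaled, recovered)
```

Since scaling the loss by a constant scales every gradient by that same constant (by linearity of differentiation), dividing the gradients back out before the update leaves the optimization mathematically unchanged; the scale only matters while the gradients pass through low-precision storage.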