The official PyTorch implementation of the paper "Learning by Analogy: Reliable Supervision from Transformations for Unsupervised Optical Flow Estimation".
MIT License
Scaling up the loss before calculating the gradient #11
Hi,
first of all, great paper and great code, thank you for sharing it :)
I was wondering: why do you scale up the loss (multiplying by 1024.) before the backward() call, and then divide it again before the weight update?
It's a leftover from mixed-precision training. I don't remember whether it had any effect on the results in this project.
You can refer to this link for an explanation of its purpose.
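The technique the answer refers to is static loss scaling: in float16, very small gradients underflow to zero, so the loss is multiplied by a large constant before backpropagation and the gradients are divided by the same constant before the optimizer step. A minimal, framework-free sketch of the idea (the `to_fp16` helper and the constants here are illustrative, not the repository's code):

```python
FP16_TINY = 2.0 ** -24  # smallest positive subnormal in IEEE float16

def to_fp16(x):
    # Crude stand-in for a float16 cast: flush values below the smallest
    # subnormal to zero (ignores rounding, which is enough for this demo).
    return 0.0 if abs(x) < FP16_TINY else x

LOSS_SCALE = 1024.0
true_grad = 1e-8  # a gradient small enough to underflow in float16

unscaled = to_fp16(true_grad)             # underflows to 0.0: signal lost
scaled = to_fp16(true_grad * LOSS_SCALE)  # survives the float16 cast
recovered = scaled / LOSS_SCALE           # divide back before the update

print(unscaled, recovered)
```

Since scaling the loss by a constant scales every gradient by that same constant (by linearity of differentiation), dividing the gradients back out before the update leaves the optimization mathematically unchanged; the scale only matters while the gradients pass through low-precision storage.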