Closed jatentaki closed 2 years ago
Yes, this is an autograd hack to make the computational graph sequential. We needed it because at the time (PyTorch 1.4, I believe) there was a concurrency bug in PyTorch: after training for long enough, you could hit a race condition on a C++ variable that stores the enable/disable state of gradient flow.
I'm trying to figure out the role of this line. It looks like some autograd hack similar to straight-through estimators of the form
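(For context, by "straight-through estimator" I mean the usual pattern sketched below; this is my own illustration, not code from this repo, and `ste_round` is a hypothetical name.)

```python
import torch

def ste_round(x):
    # Straight-through estimator around a non-differentiable op (round).
    # Forward: the detached term makes the result equal x.round().
    # Backward: the detached term is a constant, so the gradient w.r.t. x
    # is just the identity, as if round() were skipped.
    return (x.round() - x).detach() + x

x = torch.tensor([0.2, 0.7, 1.4], requires_grad=True)
y = ste_round(x)       # forward value: x.round()
y.sum().backward()     # x.grad is all ones: gradient flows straight through
```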
but in this case it really seems to be a no-op. Could I ask for some explanation?
Thank you