The computed gradient may differ from the equation

Hi, very inspiring work! The behavior of the implementation of equation (6) at

https://github.com/justincui03/tesla/blob/b9180b334bb726b5fb978abc8f1e0eedd82260b9/distill.py#L418-L420

is likely to be unexpected: the tuple returned by torch.autograd.grad is repeated twice rather than scaled, so gradients[0] is not multiplied by 2. This is not a major issue, but it may bother anyone who wants accurate gradients (e.g. someone using the default MTT parameters).
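
To illustrate the Python behavior I mean, here is a minimal sketch. It is not the code from distill.py: the names `grads`, `repeated`, and `scaled` are hypothetical, and a tuple of plain floats stands in for the tuple that torch.autograd.grad returns, so the snippet runs without PyTorch. Tuple semantics are the same whatever the elements are.

```python
# Stand-in for the tuple returned by torch.autograd.grad (one entry per input).
# Assumption: plain floats replace tensors so this runs without PyTorch.
grads = (0.5,)

# Multiplying a tuple by an integer repeats it; the elements are not scaled.
repeated = 2 * grads

# Scaling has to be applied to the elements themselves.
scaled = tuple(2 * g for g in grads)

print(repeated)     # (0.5, 0.5)
print(repeated[0])  # 0.5, unchanged
print(scaled[0])    # 1.0, doubled
```

If that is what happens at the linked lines, indexing the tuple first and then multiplying that element by 2 would give the factor from the equation.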
