justincui03 / tesla

MIT License

The computed gradient may differ from the equation #5

Open silicx opened 3 months ago


Hi, very inspiring work! The implementation of Equation (6) here:

https://github.com/justincui03/tesla/blob/b9180b334bb726b5fb978abc8f1e0eedd82260b9/distill.py#L418-L420

probably does not behave as intended. The tuple returned by `torch.autograd.grad` is repeated (Python sequence repetition) rather than scaled, so `gradients[0]` is never multiplied by 2. This is not a major issue, but it may bother anyone who relies on exact gradient values (e.g., someone using the default MTT parameters).
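A minimal sketch of the pitfall, assuming the linked lines multiply the tuple returned by `torch.autograd.grad` by 2 before indexing it (the exact code is not reproduced here). `fake_autograd_grad` is a hypothetical stand-in that returns a one-element tuple, as `torch.autograd.grad(loss, [x])` does, with a float playing the role of the gradient tensor so the example runs without PyTorch.

```python
# Hypothetical stand-in for torch.autograd.grad: it returns a tuple with one
# gradient per input. A float plays the role of the gradient tensor here.
def fake_autograd_grad(value: float) -> tuple:
    """Mimic torch.autograd.grad(loss, [x]), which returns (grad_x,)."""
    return (value,)


def buggy_scaled_grad(value: float) -> float:
    # `2 * tuple` repeats the sequence: (g,) becomes (g, g).
    gradients = 2 * fake_autograd_grad(value)
    # Indexing the repeated tuple gives back the *unscaled* gradient g.
    return gradients[0]


def fixed_scaled_grad(value: float) -> float:
    # Indexing binds tighter than `*`, so this is 2 * (tuple[0]):
    # the gradient itself is scaled. With PyTorch the analogous form
    # would be `2 * torch.autograd.grad(loss, [x])[0]`.
    return 2 * fake_autograd_grad(value)[0]


if __name__ == "__main__":
    print(2 * fake_autograd_grad(0.5))  # (0.5, 0.5)
    print(buggy_scaled_grad(0.5))       # 0.5
    print(fixed_scaled_grad(0.5))       # 1.0
```

Indexing the tuple before multiplying (or scaling each element explicitly) restores the intended factor of 2; with real tensors the repeated tuple also silently carries a duplicate reference to the same gradient, which is why no error is raised.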