Closed — gyglim closed this issue 5 years ago
Hi Michael,
What you see in my code is the gradient of the L2 penalty, not the penalty itself — differentiating the squared term leaves a term that is linear in the weight difference.
Best, Rahaf
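To make the point concrete, here is a minimal sketch (not the repo's actual code; names like `omega`, `theta`, `theta_star`, and `lam` are illustrative) of the Eq. (3) penalty and its analytic gradient, showing why the gradient the code adds is linear in the difference while the loss itself stays non-negative:

```python
def l2_penalty(omega, theta, theta_star, lam):
    """Surrogate penalty of Eq. (3): lam * sum_i omega_i * (theta_i - theta_star_i)^2.

    omega: per-parameter importance weights, theta: current parameters,
    theta_star: parameters after the previous task, lam: regularization strength.
    """
    return lam * sum(w * (t - ts) ** 2
                     for w, t, ts in zip(omega, theta, theta_star))


def l2_penalty_grad(omega, theta, theta_star, lam):
    """Analytic gradient of the penalty w.r.t. theta: 2 * lam * omega_i * (theta_i - theta_star_i).

    Note it is linear in the (possibly negative) difference -- this is the
    quantity a training loop would add directly to the parameter gradients,
    without ever forming the squared loss.
    """
    return [2.0 * lam * w * (t - ts)
            for w, t, ts in zip(omega, theta, theta_star)]
```

A negative difference therefore produces a negative gradient component, which correctly pushes the weight back up toward its old value; the penalty itself remains non-negative because the square only appears in the loss, not in its derivative.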
Hi Rahaf
Ah, I see — you directly compute the gradient of the loss, not the loss itself. Got it :). Thanks for the clarification.
Cheers, Michael
According to the paper, the weight difference is squared when computing the loss, cf. Eq. (3).
In the code however it looks like it's just the difference: https://github.com/rahafaljundi/MAS-Memory-Aware-Synapses/blob/c3e6a855cdde588fb74aeb876f84340eb6090ad5/MAS_to_be_published/MAS_utils/MAS_based_Training.py#L80
That would mean that a negative difference would lead to a negative penalty! Is that a bug or am I missing something?