This EWC implementation has a theoretical error that prevents EWC from working properly

aakutalev commented 3 years ago

At line 31 of elastic_weight_consolidation.py, the code takes the mean of the log-likelihoods, so grad_log_liklihood holds the mean of the gradients of the log-likelihoods. Then, at line 35, it squares this mean gradient. This is WRONG: each diagonal element of the Fisher matrix is the sum (or mean) of the squared gradients of the log-likelihoods, not the square of their sum. So the gradient of the log-likelihood must be computed separately for each input, each of these gradients must be squared, and then the mean of these squares must be taken.
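Written out for a single parameter \theta_i over N training samples (x_n, y_n), with g_{n,i} denoting the per-sample gradient (notation introduced here for illustration, not taken from the repository), the correct Fisher diagonal estimate and the quantity the code computes are:

```latex
g_{n,i} = \frac{\partial}{\partial \theta_i} \log p(y_n \mid x_n, \theta)

F_{ii} \approx \frac{1}{N} \sum_{n=1}^{N} g_{n,i}^{2}
\qquad \text{vs.} \qquad
\Big( \frac{1}{N} \sum_{n=1}^{N} g_{n,i} \Big)^{2}
```

The difference between the two is exactly the variance of the per-sample gradients, so the right-hand quantity is never larger than the left and equals it only when all g_{n,i} are identical. Near a minimum of the training loss the averaged gradient is close to zero, so squaring it can push the estimated F_{ii}, and with it the EWC penalty weight, toward zero.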

AkaTsukijm commented 1 year ago

Totally agree. But I think the difference between these two is minor.

ThomasAtlantis commented 12 months ago

I implemented a version that computes the sum of squared gradients. See here.
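The following is not the implementation ThomasAtlantis linked. It is a minimal sketch, assuming a hypothetical binary logistic-regression model whose log-likelihood gradient is known in closed form, so it runs with the standard library alone. It computes the gradient separately for each (x, y) pair, squares each one and averages (an empirical Fisher diagonal using the dataset labels), and contrasts that with squaring the averaged gradient. All function names are made up for illustration.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical toy model for illustration; not the repository's code.
Sample = Tuple[Sequence[float], int]  # (features x, label y in {0, 1})


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def per_sample_grads(w: Sequence[float], data: Sequence[Sample]) -> List[List[float]]:
    """Gradient of log p(y | x, w) computed separately for each sample.

    For logistic regression, d/dw log p(y | x, w) = (y - sigmoid(w . x)) * x.
    """
    grads = []
    for x, y in data:
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
        grads.append([(y - p) * xi for xi in x])
    return grads


def fisher_diagonal(w: Sequence[float], data: Sequence[Sample]) -> List[float]:
    """Correct estimate: square each per-sample gradient, then average."""
    grads = per_sample_grads(w, data)
    n = len(grads)
    return [sum(g[i] ** 2 for g in grads) / n for i in range(len(w))]


def squared_mean_gradient(w: Sequence[float], data: Sequence[Sample]) -> List[float]:
    """What the issue describes: average the gradients first, then square."""
    grads = per_sample_grads(w, data)
    n = len(grads)
    return [(sum(g[i] for g in grads) / n) ** 2 for i in range(len(w))]


if __name__ == "__main__":
    w = [0.0, 0.0]
    data = [([1.0, 2.0], 1), ([1.0, -2.0], 0)]
    print(fisher_diagonal(w, data))        # [0.25, 1.0]
    print(squared_mean_gradient(w, data))  # [0.0, 1.0]
```

In this example the per-sample gradients for the first weight are +0.5 and -0.5, so they cancel when averaged and the squared-mean version reports zero importance for that weight even though each sample's gradient is nonzero. In a PyTorch model, one straightforward (if slow) way to get per-sample gradients is to call torch.autograd.grad once per example instead of once on the averaged log-likelihood.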

shivamsaboo17 / Overcoming-Catastrophic-forgetting-in-Neural-Networks, issue #6