Open aakutalev opened 3 years ago
At line 31 of elastic_weight_consolidation.py, the code takes the mean of the log-likelihoods, so grad_log_liklihood holds the mean of the per-sample gradients of the log-likelihood. Line 35 then squares that mean. This is wrong: a diagonal element of the Fisher matrix is the sum (or mean) of the squared per-sample gradients, not the square of their sum (or mean). The gradient of the log-likelihood must be computed separately for each input, each gradient must be squared, and then the mean of these squares must be taken.
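To make the distinction concrete, here is a minimal pure-Python sketch. It does not use the repository's code: the tiny logistic model, the data, and all function names are assumptions chosen for illustration, and it computes the empirical version of the diagonal (using the dataset labels). It compares the square of the mean gradient with the mean of the squared per-sample gradients.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def per_sample_grads(w, xs, ys):
    """Gradient of log p(y | x, w) w.r.t. w for a logistic model, one vector per sample."""
    grads = []
    for x, y in zip(xs, ys):
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
        grads.append([(y - p) * xi for xi in x])
    return grads

def squared_mean_grad(grads):
    """Square of the mean gradient (what the issue describes as the bug)."""
    n = len(grads)
    return [(sum(g[j] for g in grads) / n) ** 2 for j in range(len(grads[0]))]

def mean_squared_grad(grads):
    """Mean of the squared per-sample gradients (empirical Fisher diagonal)."""
    n = len(grads)
    return [sum(g[j] ** 2 for g in grads) / n for j in range(len(grads[0]))]

# Hypothetical toy data: 4 samples, 2 parameters (bias feature + one input).
w = [0.0, 0.0]
xs = [[1.0, 2.0], [1.0, -2.0], [1.0, 0.5], [1.0, -0.5]]
ys = [1, 0, 0, 1]

grads = per_sample_grads(w, xs, ys)
wrong = squared_mean_grad(grads)
fisher = mean_squared_grad(grads)

print(wrong)   # [0.0, 0.140625]
print(fisher)  # [0.25, 0.53125]
```

For the first parameter the per-sample gradients cancel, so the squared mean is exactly 0 while the mean of squares is 0.25. The same cancellation happens for every parameter at a converged maximum-likelihood solution, where the mean gradient is close to zero, so the two quantities can differ substantially in that regime.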
Totally agree, but I think the difference between the two is minor.
I implemented a version that computes the sum of squared gradients. See here.