stokesj / EWC

TensorFlow implementation of Elastic Weight Consolidation

compute fisher matrix after train? #1

Open · jeong-tae opened this issue 7 years ago

jeong-tae commented 7 years ago

I am not sure I understand correctly.

It looks like you compute the Fisher matrix for the current task, using its samples, after training on that task. Am I right?

Then, when moving to a third task, is the Fisher diagonal computed over all previous tasks, or over the previous task only? If it is the former, I think EWC is useless, because we do not want to revisit previous tasks' data. Right?

I also implemented EWC using TensorFlow, but I am not sure I got it right, so I am referring to your code :D. Thank you for your work!

P.S. Why do we need to compute the Fisher diagonal per example? Why not per batch?
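For reference, the quadratic penalty from the EWC paper (Kirkpatrick et al., 2017), which is what the multi-task question above comes down to, is:

```latex
% EWC objective when training task B after task A:
% F_i is the diagonal Fisher information for parameter i, estimated on
% task A's data at the old optimum \theta^*_A; \lambda sets how strongly
% the old task is protected.
\mathcal{L}(\theta) = \mathcal{L}_B(\theta)
    + \sum_i \frac{\lambda}{2}\, F_i \left(\theta_i - \theta^*_{A,i}\right)^2
```

With a third task, the paper adds one such penalty per previous task, each anchored at that task's stored optimum, so only the saved Fisher diagonals and parameter snapshots are needed, never the previous tasks' data.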

stokesj commented 6 years ago

Yes, I am just computing the Fisher information for the trained neural network using the previous task's data only.

Unfortunately, TensorFlow does not expose the unaggregated (per-example) gradients, which are required to compute the Fisher information. The workaround I chose was to hardcode the unaggregated gradients directly into the computation graph, using mini-batches of fixed size 100 to allow parallelization (computing the full batch at once runs out of memory). It is then a simple matter to obtain the full-batch Fisher information by accumulating over 550 mini-batches (see the update_fisher_full_batch method). The simpler solution (which prevents parallelization) is to loop over the training data (55,000 examples) using mini-batches of size 1.
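In eager TF2 the same per-example accumulation can be written directly without baking it into a graph; a minimal sketch of the idea, not the repository's TF1 code (the function name and arguments here are illustrative):

```python
import tensorflow as tf

def fisher_diagonal(model, dataset, num_examples):
    """Diagonal Fisher estimate: running mean of the squared per-example
    gradients of the model's log-likelihood.

    `model` is assumed to output logits; `dataset` yields (x, y) batches,
    although the labels are ignored because the Fisher is an expectation
    under the model's own predictive distribution.
    """
    fisher = [tf.zeros_like(v) for v in model.trainable_variables]
    seen = 0
    for x, _ in dataset:
        for i in range(int(x.shape[0])):  # batch size 1: unaggregated gradients
            with tf.GradientTape() as tape:
                logits = model(x[i : i + 1], training=False)
                # Draw the label from the model itself (the "true" Fisher);
                # using the dataset label instead gives the empirical Fisher.
                y = tf.random.categorical(logits, num_samples=1)[0]
                log_lik = -tf.nn.sparse_softmax_cross_entropy_with_logits(
                    labels=y, logits=logits
                )
            grads = tape.gradient(log_lik, model.trainable_variables)
            fisher = [f + tf.square(g) for f, g in zip(fisher, grads)]
            seen += 1
            if seen == num_examples:
                return [f / seen for f in fisher]
    return [f / seen for f in fisher]
```

This is the batch-size-1 loop mentioned at the end of the comment; the repo instead hardcodes 100 parallel per-example gradient computations into the graph for speed.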

jeong-tae commented 6 years ago

What is the difference compared with calculating the mean of the Fisher information from the aggregated gradients? The code sums up and divides by the number of inputs, which seems to be just a mean.

Unaggregating the gradients makes the code more complicated, doesn't it? Why not just use the mean of the aggregated gradients?
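For context on why the two differ: the diagonal Fisher is the mean of the squared per-example gradients, E[g²], while squaring the aggregated (mean) gradient gives (E[g])², which is nearly zero for a trained network because the mean gradient vanishes at an optimum. A quick NumPy illustration with synthetic per-example gradients:

```python
import numpy as np

rng = np.random.default_rng(0)
g = rng.normal(size=(55000, 4))  # synthetic per-example gradients

mean_of_squares = (g ** 2).mean(axis=0)  # Fisher diagonal: square, then average
square_of_mean = g.mean(axis=0) ** 2     # aggregated gradients: average, then square

print(mean_of_squares)  # close to 1: the per-example gradient spread survives
print(square_of_mean)   # close to 0: averaging first cancels the signal
```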