As I understand it, the Fisher weight of a parameter w is the squared gradient (d/dw log x)^2, where x is some per-sample quantity. (Let me know if this is incorrect.)
In my own implementations I plug the loss term in for x, but in other implementations I see the model's prediction used instead. I can't remember why I chose the loss, though I suspect it was because I needed a value > 0, which the raw predictions of a Q-network would not guarantee.
Should the Fisher matrix be based on the raw output rather than the error? In practice it seems to work when based on the error, which makes sense, though I suspect basing it on the output would work better.
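To make the distinction concrete, here is a minimal NumPy sketch (a toy one-parameter logistic model; all names and values are made up for illustration). The standard definition averages squared gradients of the per-sample log-likelihood, d/dw log p(y|x, w). The "true" Fisher samples labels from the model's own predictive distribution, while the "empirical" Fisher (gradient of the loss at the observed labels) uses the data's labels; these only coincide when the model matches the data-generating distribution:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy 1-parameter model: p(y=1 | x, w) = sigmoid(w * x).
w = 0.7
x = rng.normal(size=1000)
p = sigmoid(w * x)

# For a Bernoulli likelihood, d/dw log p(y | x, w) = (y - p) * x,
# so each sample's Fisher contribution is ((y - p) * x) ** 2.

# True Fisher diagonal: labels drawn from the model's own distribution.
y_model = (rng.random(1000) < p).astype(float)
fisher_true = np.mean(((y_model - p) * x) ** 2)

# Empirical Fisher diagonal: observed labels (here generated by a
# different, hypothetical "true" weight of 1.5, so the model is wrong).
y_obs = (rng.random(1000) < sigmoid(1.5 * x)).astype(float)
fisher_emp = np.mean(((y_obs - p) * x) ** 2)

print(fisher_true, fisher_emp)
```

Note that both quantities are sums of squares, so they are automatically non-negative; the positivity concern only arises if you try to take the log of a raw output (such as an unbounded Q-value) rather than of a proper likelihood.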