vrenkens / nabu

Code for end-to-end ASR with neural networks, built with TensorFlow
MIT License

fisher information #14

Closed jannaescur closed 6 years ago

jannaescur commented 6 years ago

just one theoretical curiosity: why did you decide to have a fisher trainer? In what aspects does retrieving the fisher information change the training? I don't really know how to interpret the fisher loss. Could you summarize it?

vrenkens commented 6 years ago

A student of mine was working on transfer learning between languages using a LAS architecture. A problem with transfer learning is catastrophic forgetting. You can use elastic weight consolidation to try and solve this issue. It basically puts a spring on each parameter of the network. If the spring is stiff, the parameter is not allowed to deviate far from its initial value. The fisher information is used to determine the stiffness of the springs, and the fisher loss represents the deviation from the initial model, weighted with the stiffness.
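For anyone finding this later, the spring analogy above can be sketched in a few lines. This is a minimal illustrative version of the elastic weight consolidation penalty, not the actual Nabu implementation; the function and variable names here are made up for the example:

```python
import numpy as np

def fisher_loss(params, initial_params, fisher, strength=1.0):
    """Quadratic deviation from the initial model, where each
    parameter's (diagonal) Fisher information acts as the spring
    stiffness pulling it back toward its pre-transfer value."""
    penalty = 0.0
    for name in params:
        diff = params[name] - initial_params[name]
        penalty += np.sum(fisher[name] * diff ** 2)
    return 0.5 * strength * penalty

# Toy usage: the same deviation costs much more under a stiff
# spring (high Fisher value) than under a loose one.
initial = {"w": np.array([1.0, 1.0])}
current = {"w": np.array([1.5, 1.5])}
stiff = fisher_loss(current, initial, {"w": np.array([10.0, 10.0])})
loose = fisher_loss(current, initial, {"w": np.array([0.1, 0.1])})
```

This penalty would be added to the ordinary task loss during fine-tuning on the new language, so parameters that mattered a lot for the source language (high Fisher information) stay close to where they were.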

jannaescur commented 6 years ago

thanks! and I guess that even when using this trainer, the metric reported at test time is still the PER?

vrenkens commented 6 years ago

yes :)