sahagobinda / GPM

Official [ICLR] Code Repository for "Gradient Projection Memory for Continual Learning"
MIT License

How to understand the difference between GPM #1

leondada closed this issue 3 years ago

leondada commented 3 years ago

From my understanding, if X = U\Sigma V^\top (thin SVD, with X of full column rank), then P = X(X^\top X)^{-1}X^\top = UU^\top. So is the main difference that GPM uses (U)_k instead of U, and that GPM does not calculate P but directly stores (U)_k in memory?
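A quick numerical check of the identity in the question, written as a sketch: it assumes X has full column rank (so X^\top X is invertible) and uses the thin SVD. The matrix sizes and k = 2 are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# X: 4 samples of 6-dimensional inputs stored as columns (full column rank almost surely)
X = rng.standard_normal((6, 4))

# Classical orthogonal projector onto the column space of X
P = X @ np.linalg.inv(X.T @ X) @ X.T

# Thin SVD: X = U diag(s) V^T, with U of shape (6, 4)
U, s, Vt = np.linalg.svd(X, full_matrices=False)
P_svd = U @ U.T
print(np.allclose(P, P_svd))  # True

# GPM-style truncation: keep only the first k left singular vectors
k = 2
U_k = U[:, :k]
P_k = U_k @ U_k.T  # still a projector, but onto a k-dimensional subspace
```

U_k U_k^\top is still idempotent, so it remains a valid orthogonal projector; it simply projects onto the top-k directions of X and therefore differs from P whenever k is smaller than the rank of X.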

sahagobinda commented 3 years ago

In GPM, the projection matrix for a typical layer of the NN is defined as MM^\top, where the matrix M contains (U)_k. For each task, M is computed from the representation matrix R, which holds the collection of inputs (or input representations) to that layer. SVD is performed on R (= U\Sigma V^\top) to obtain U, and (U)_k is obtained from U through a rank-k approximation. The memory M is then updated with this (U)_k. The memory update is done once per task, at the end of that task. For more details, please see Section 5 of the paper.
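A minimal NumPy sketch of the memory update and gradient projection described above. The helper names `update_memory` and `project_gradient`, the threshold value 0.97, and the column-per-sample layout of R are our own assumptions, not code from this repository. As we read Section 5 of the paper, k is chosen with an energy threshold on R, and for later tasks the directions already stored in M are removed from R before the SVD so that only new bases are appended; treat those details as an interpretation rather than the reference implementation.

```python
import numpy as np


def update_memory(M, R, threshold=0.97):
    """Hypothetical helper: grow the memory M with new bases taken from R.

    M: (d, m) matrix with orthonormal columns, or None before the first task.
    R: (d, n) representation matrix; each column is one input (representation) of the layer.
    threshold: assumed fraction of the squared Frobenius norm of R to capture.
    """
    d = R.shape[0]
    if M is None:
        M = np.zeros((d, 0))
    total = np.sum(R ** 2)
    # Strip the part of R already spanned by the memory (a no-op for the first task)
    R_hat = R - M @ (M.T @ R)
    covered = total - np.sum(R_hat ** 2)  # energy of R already captured by M
    U, s, _ = np.linalg.svd(R_hat, full_matrices=False)
    target = threshold * total
    if covered >= target:
        k = 0
    else:
        # Smallest k whose top-k residual energy, added to `covered`, reaches the target
        energy = covered + np.cumsum(s ** 2)
        k = min(int(np.searchsorted(energy, target)) + 1, len(s))
    return np.hstack([M, U[:, :k]])


def project_gradient(grad, M):
    """Remove the component of a weight gradient that acts on span(M).

    grad: (out, d) gradient of a layer computing W @ x, with x of dimension d.
    """
    return grad - grad @ M @ M.T


rng = np.random.default_rng(0)
d = 8
# Task 1: inputs confined to a random 3-dimensional subspace
basis, _ = np.linalg.qr(rng.standard_normal((d, 3)))
R1 = basis @ rng.standard_normal((3, 50))
M = update_memory(None, R1)

# Task 2: unconstrained inputs; only directions not yet in memory are added
R2 = rng.standard_normal((d, 50))
M2 = update_memory(M, R2)

grad = rng.standard_normal((4, d))
grad_p = project_gradient(grad, M2)
print(M.shape[1], M2.shape[1])
print(np.abs(grad_p @ M2).max())  # numerically close to zero
```

Because the new bases come from the residual of R after projecting out M, the columns of M stay orthonormal, which is what makes MM^\top a valid projection matrix; the projected gradient then leaves the layer's outputs on span(M) unchanged.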