Closed leondada closed 3 years ago
In GPM, the projection matrix for a typical layer of the network is defined as MM^\top, where the matrix M contains the bases (U)_k. For each task, M is computed from the representation matrix R, which collects the inputs/input representations. SVD is performed on R (= U\Sigma V^\top) to obtain U, and (U)_k is obtained from U through a rank-k approximation. The memory M is then updated with this (U)_k. The memory update is done once per task, at the end of each task. For more details please see Section 5 of the paper.
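The per-task memory update described above can be sketched as follows. This is only an illustrative NumPy sketch, not the authors' code: the `threshold` energy criterion for choosing k and the matrix shapes are assumptions for the example.

```python
import numpy as np

def update_memory(R, threshold=0.97):
    """Compute (U)_k from a representation matrix R (d x n, columns
    are input representations). `threshold` is an assumed energy
    criterion for picking the rank k."""
    U, S, _ = np.linalg.svd(R, full_matrices=False)
    # smallest k whose leading singular values capture `threshold` of the energy
    energy = np.cumsum(S**2) / np.sum(S**2)
    k = int(np.searchsorted(energy, threshold)) + 1
    return U[:, :k]  # (U)_k: orthonormal basis stored in memory M

# toy example: 20-dim representations of 50 inputs for one task
M = update_memory(np.random.randn(20, 50))
P = M @ M.T  # projection matrix M M^T used for the layer
```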
From my understanding, if X = U\Sigma V^\top, then P = X(X^\top X)^{-1}X^\top = UU^\top. So the main differences are that GPM uses (U)_k rather than the full U, and that GPM does not compute P explicitly but instead stores the memory (U)_k directly?
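The identity in the question can be checked numerically. A minimal sketch, assuming X has full column rank so that (X^\top X)^{-1} exists and U is the thin-SVD factor:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((8, 3))  # tall matrix, full column rank almost surely
U, S, Vt = np.linalg.svd(X, full_matrices=False)

P_explicit = X @ np.linalg.inv(X.T @ X) @ X.T  # X (X^T X)^{-1} X^T
P_svd = U @ U.T                                # U U^T from the thin SVD

print(np.allclose(P_explicit, P_svd))  # the two projectors coincide
```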