Thank you for presenting such inspiring research!
While you are inspired by neuroscientific studies of the brain, I am wondering how the performance of the model can be analyzed in the context of online learning theory. If we treat the training and testing process for each permutation as one "period" of online learning, and take the goal of learning all the different tasks to be minimizing the average loss over the whole learning process, then a traditional ANN looks like a simple Follow the Leader rule, while the context signal acts as a regularization term, which makes your architecture behave like a Follow the Regularized Leader rule. The stability of Follow the Regularized Leader is probably why your architecture works better.
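For concreteness, the standard formulations I have in mind (with $\ell_s$ denoting the loss observed in period $s$ and $R$ a regularizer) are:

$$w_{t+1} = \arg\min_{w} \sum_{s=1}^{t} \ell_s(w) \quad \text{(FTL)}, \qquad w_{t+1} = \arg\min_{w} \left[ \sum_{s=1}^{t} \ell_s(w) + R(w) \right] \quad \text{(FTRL)}.$$

The analogy is loose, of course: my suggestion is only that the context signal plays a role similar to $R(w)$, keeping consecutive solutions from drifting too far apart, which is exactly the stability property that separates FTRL from plain FTL.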