Dear authors,

Congratulations on your paper's acceptance at ICLR 2024, and thank you for releasing the code to the community. After reading the paper, I think it is a good start toward investigating ensembles of experts for continual learning (CL). However, I have a question about how you select the expert for fine-tuning on a new task.
As depicted in Fig. 3, the distribution of task 3's new classes is compared with those of the old tasks t1 and t2 via KL divergence, which suggests we must store the distributions of the old tasks. However, as stated in the text below the figure, the distribution set Q_k contains only the distributions of the current task's classes 1 to C_t and ignores all classes from previous tasks. Moreover, in Eq. (2) the KL divergence is computed within the set Q_k, since both q_ik and q_jk belong to Q_k, so the class distributions from previous tasks would not need to be taken into account. Yet when I look at the code from line 188, it seems you still consider the class distributions of prior tasks.
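To make my reading of Eq. (2) concrete: I understand it as computing KL divergences only between pairs of distributions inside the same set Q_k, roughly like the sketch below (the function names and the toy distributions are my own, not taken from your code):

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for discrete distributions, with clipping for numerical stability."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p /= p.sum()
    q /= q.sum()
    return float(np.sum(p * np.log(p / q)))

def min_pairwise_kl(Q_k):
    """Smallest KL divergence between any two distinct distributions in Q_k.

    A small value means two class distributions assigned to expert k overlap heavily.
    Only members of Q_k are compared; no distribution from a previous task appears.
    """
    n = len(Q_k)
    return min(
        kl_divergence(Q_k[i], Q_k[j])
        for i in range(n)
        for j in range(n)
        if i != j
    )

# Hypothetical example: three class distributions over 3 bins for one expert k.
Q_k = [
    [0.70, 0.20, 0.10],
    [0.10, 0.80, 0.10],
    [0.68, 0.22, 0.10],  # overlaps strongly with the first distribution
]
print(min_pairwise_kl(Q_k))
```

If this is the intended computation, then as argued above no stored distribution from a previous task is ever touched, which is exactly the point I am unsure about.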
I am unsure how to interpret this correctly and look forward to your clarification. Please feel free to correct me if I am wrong.
Sorry, I misunderstood: the selection chooses the expert whose class distributions overlap least with the new task's classes, so there is no need to compare with the previous tasks' distributions.
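For my own understanding, the selection I now have in mind looks like the sketch below. This is purely my reconstruction under that "least overlap" reading; `select_expert`, the scoring, and the toy data are hypothetical and not your implementation:

```python
import numpy as np

def kl(p, q, eps=1e-12):
    """KL(p || q) for discrete distributions, clipped for numerical stability."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

def select_expert(new_task_dists, experts):
    """Pick the index of the expert whose stored class distributions
    overlap least with the new task's class distributions.

    Overlap is measured as the smallest KL divergence between any new-class
    distribution and any distribution already assigned to the expert:
    a small score means strong overlap, so we pick the largest score.
    """
    def overlap_score(expert_dists):
        return min(kl(p, q) for p in new_task_dists for q in expert_dists)

    return max(range(len(experts)), key=lambda k: overlap_score(experts[k]))

# Hypothetical example: expert 0 already covers a distribution very close to
# the new task's, while expert 1 covers something quite different.
new_task = [[0.90, 0.05, 0.05]]
experts = [
    [[0.90, 0.05, 0.05]],  # heavy overlap with the new task
    [[0.05, 0.05, 0.90]],  # little overlap with the new task
]
print(select_expert(new_task, experts))
```

Under this reading, only the new task's distributions and each expert's currently stored distributions are needed, which matches why no comparison against previous tasks' distributions is required.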
Best, Cuong