Distribution overlap calculation

Dear authors,

Congratulations your accepted paper at ICLR 2024 and your effort to release the code for community. After reading your paper, I think it's a good start to investigate the application of ensembling experts for CL. However, I have a question related to how did you select the experts for finetuning a new task.

As depicted in the Fig.3, the new distribution of task 3's classes will be compared with the old tasks t1 and t2 by the KL divergence. As a result, we have to save the distribution of old tasks. However, as indicated in the text below, the distribution set Q_k only contains the distributions of the current task's classes from 1 to C_t and ignore all tasks from previous classes. Then, in Eq.(2), the KL divergence is computed with in the set Q_k since both q_ik and q_ik are in Q_k. Therefore, we don't have to take into account the class distributions from previous tasks. And I take a look at the code from line 188, it's seem like you are still consider the class distribution of prior tasks.

I was confused about how to interpret it correctly and I am looking forward to hearing from you for clarification. Feel free to correct me if I was wrong.

Best, Cuong

grypesc / SEED

Distribution overlap calculation #2