Open RobertCsordas opened 1 year ago
Hi,
I noticed an inconsistency between Eq. 9 in the paper (https://arxiv.org/pdf/2210.05144.pdf) and the implementation at https://github.com/yikangshen/MoA/blob/master/moa_layer/parallel_linear/moe.py#L124C8-L125C69. Could you please clarify which version was used for the numbers in the paper?
Thank you, Robert
Hi Robert,
Normalization can be treated as an optional feature. In my experience, enabling it can give slightly better performance when k=2.
Regards, Yikang
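For readers following the thread, here is a minimal sketch of what an optional normalization step in top-k gating could look like. It assumes that "normalization" means rescaling the selected top-k gate probabilities so they sum to 1; that reading is an assumption, not something confirmed by the paper or the repository. The function names `softmax` and `top_k_gates` and the `normalize` flag are hypothetical and do not come from the MoA code.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_gates(logits, k, normalize):
    """Pick the k experts with the highest routing probability.

    Hypothetical helper for illustration only. With normalize=True the
    selected gate values are rescaled to sum to 1 (assumed meaning of
    "normalization" in this thread); with normalize=False the raw
    softmax probabilities are kept.
    """
    probs = softmax(logits)
    # Indices of the k largest probabilities, highest first.
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    gates = [probs[i] for i in top]
    if normalize:
        s = sum(gates)
        gates = [g / s for g in gates]
    return top, gates


# Example router logits for 4 experts (made-up values).
logits = [2.0, 1.0, 0.1, -1.0]
idx_raw, gates_raw = top_k_gates(logits, k=2, normalize=False)
idx_norm, gates_norm = top_k_gates(logits, k=2, normalize=True)
```

In this sketch both variants select the same experts and keep the same ratio between gate values; only the overall scale of the combined expert output differs, which is consistent with the step being optional.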