Closed LorrinWWW closed 2 years ago
These two shouldn't be the same thing, right? My understanding is that the softmax in the first `get_new_layer_weight` corresponds to Eq. 23 in the original paper, i.e., it updates each layer weight omega, while the second one is simply normalization for convenience of computation.
The experiment did use two softmaxes, in order to make the weights smoother. Later, we found that summing the weights and normalizing can also achieve ideal results.
Thank you for your clarification!
Thanks for sharing the code! In emd_task_distill.py, you seem to perform softmax on the layer weights twice by default. Is this intended, or am I misunderstanding something?
First softmax: https://github.com/lxk00/BERT-EMD/blob/2e1062bf9c912e6d335bcc994d372e962fe262df/bert-emd/emd_task_distill.py#L384-L385
Second softmax: https://github.com/lxk00/BERT-EMD/blob/2e1062bf9c912e6d335bcc994d372e962fe262df/bert-emd/emd_task_distill.py#L424-L429
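For anyone following along, here is a minimal sketch (not the repo's actual code; the weight values are made up) of why a second softmax makes the layer-weight distribution "smoother", and what the sum-normalization alternative mentioned above looks like:

```python
import numpy as np

def softmax(x):
    # numerically stable softmax
    e = np.exp(x - np.max(x))
    return e / e.sum()

# hypothetical layer-weight logits, for illustration only
w = np.array([2.0, 1.0, 0.1])

once = softmax(w)        # single softmax over the layer weights
twice = softmax(once)    # second softmax, as in the default code path
sum_norm = w / w.sum()   # the addition-and-normalize alternative

# the second softmax compresses the gap between the largest
# and smallest weight, i.e. it smooths the distribution
assert twice.max() - twice.min() < once.max() - once.min()
```

Since `once` already sums to 1, its entries lie in a narrow range, so applying softmax again pushes the weights toward uniform; that matches the author's explanation that the double softmax was used to smooth the weights.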