huawei-noah / Pretrained-Language-Model

Pretrained language model and its related optimization techniques developed by Huawei Noah's Ark Lab.

TinyBERT linear transformation sharing #205

Open sh0416 opened 2 years ago

sh0416 commented 2 years ago

Hi,

In the paper, a linear transformation is applied so that the student's hidden representations (and embeddings) can be matched against the teacher's hidden representations, whose dimension differs.
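
For reference, this is the hidden-state distillation objective as I understand it from the paper (my transcription, using the paper's notation):

```latex
% Hidden-state distillation loss as I read it from the TinyBERT paper:
% W_h projects the student hidden states H^S (of width d') into the teacher's
% d-dimensional space before they are compared with the teacher states H^T.
\mathcal{L}_{\text{hidn}} = \text{MSE}\!\left(\boldsymbol{H}^{S}\boldsymbol{W}_{h},\; \boldsymbol{H}^{T}\right)
```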

In the code, this is implemented with the `fit_dense` layer, but that layer is instantiated only once.

Does this mean that the linear transformation weight is shared across all of the layers? Am I understanding this correctly?
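
To make sure I'm reading the training code correctly: the setup seems roughly equivalent to the sketch below, where a single projection is reused for every matched layer. The shapes and helper names here are only illustrative; `fit_dense` is the only identifier taken from the repo.

```python
import torch
import torch.nn as nn

# Illustrative sizes: a 4-layer student with 312-dim hidden states distilled
# from a 768-dim teacher (the standard TinyBERT / BERT-base setting).
student_hidden, teacher_hidden, num_student_layers = 312, 768, 4

# A single projection analogous to `fit_dense`: instantiated once, so the same
# weight is applied to every student layer that is matched to a teacher layer.
fit_dense = nn.Linear(student_hidden, teacher_hidden)
loss_mse = nn.MSELoss()

def hidden_distill_loss(student_reps, teacher_reps):
    """MSE between projected student hidden states and teacher hidden states.

    student_reps: list of [batch, seq_len, student_hidden] tensors
    teacher_reps: list of [batch, seq_len, teacher_hidden] tensors, already
                  reduced to the teacher layers mapped to the student layers.
    """
    loss = 0.0
    for s_rep, t_rep in zip(student_reps, teacher_reps):
        # The same fit_dense weight is reused at every layer.
        loss = loss + loss_mse(fit_dense(s_rep), t_rep)
    return loss

# Toy usage with random tensors (embedding output + 4 layer outputs).
batch, seq_len = 2, 8
student_reps = [torch.randn(batch, seq_len, student_hidden) for _ in range(num_student_layers + 1)]
teacher_reps = [torch.randn(batch, seq_len, teacher_hidden) for _ in range(num_student_layers + 1)]
print(hidden_distill_loss(student_reps, teacher_reps))
```

If that reading is right, there is no per-layer projection; a single learnable matrix serves all layers (and the embedding output). I'd like to confirm whether that is the intended behavior or whether a separate transformation per layer was meant.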