chantera opened this issue 1 year ago
In our pretraining and fine-tuning experiments, the mLUKE models did not use entity-aware self-attention.
The attention weights related to entity-aware self-attention (EASA) (e.g., `w2e...`, `e2w...`, `e2e...`) are included in the published model weights for the cases where users want to try entity-aware self-attention at fine-tuning time.
The values of the EASA weights are identical to the corresponding `w2w...` weights, which is how these weights are initialized in the LUKE paper.
By setting `use_entity_aware_attention: true`, these weights are loaded into the model. By default, `use_entity_aware_attention` is set to `false` and the EASA weights are ignored, which is the setting described in the mLUKE paper.
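The warning mechanism can be sketched in plain Python (sets of key names standing in for the real state dict; the shortened key names are hypothetical):

```python
def simulated_load(checkpoint_keys, use_entity_aware_attention):
    """Mimic how a loader flags checkpoint keys the model does not define."""
    expected = {"attn.w2w_query"}  # weights the model always defines
    if use_entity_aware_attention:
        # With EASA enabled, the model also defines the extra projections.
        expected |= {"attn.w2e_query", "attn.e2w_query", "attn.e2e_query"}
    # Keys present in the checkpoint but absent from the model trigger
    # the "unexpected keys" warning.
    return sorted(set(checkpoint_keys) - expected)

ckpt = ["attn.w2w_query", "attn.w2e_query", "attn.e2w_query", "attn.e2e_query"]
# With the flag off, the EASA weights show up as "unexpected" keys and
# produce the warning; with it on, nothing is unexpected.
```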
The warning is somewhat disturbing... but it is expected behavior.
> The values of the EASA weights are identical to the corresponding `w2w...` weights, which is how these weights are initialized in the LUKE paper.
I appreciate the clarification. Now I understand. So `use_entity_aware_attention: false` is not a misconfiguration; it just makes the model ignore the unused weights.
> The warning is somewhat disturbing
To fix this confusing behavior, how about equipping the model with the EASA weights regardless of the `use_entity_aware_attention` setting? In that case, the model would simply skip the computation for the EASA weights when `use_entity_aware_attention=False`.
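A minimal sketch of this idea (plain Python standing in for the real attention modules; the class, attribute, and placeholder values are all hypothetical):

```python
class SelfAttentionSketch:
    """Toy stand-in for a LUKE-style self-attention layer."""

    def __init__(self, use_entity_aware_attention: bool):
        self.use_entity_aware_attention = use_entity_aware_attention
        # Always allocate all four query projections, so loading a
        # checkpoint that contains the EASA weights never reports
        # unexpected keys (strings stand in for weight tensors here)...
        self.w2w_query = "w2w_query_weights"
        self.w2e_query = "w2e_query_weights"
        self.e2w_query = "e2w_query_weights"
        self.e2e_query = "e2e_query_weights"

    def query_projection(self, from_entity: bool, to_entity: bool):
        # ...but only consult the EASA projections when the flag is on.
        if not self.use_entity_aware_attention:
            return self.w2w_query  # single shared projection
        table = {
            (False, False): self.w2w_query,
            (False, True): self.w2e_query,
            (True, False): self.e2w_query,
            (True, True): self.e2e_query,
        }
        return table[(from_entity, to_entity)]
```

The trade-off discussed below is visible here: the unused projections still occupy memory even when the flag is off.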
> how about equipping a model with EASA weights regardless of the setting of `use_entity_aware_attention`?
Thank you for your suggestion. It could be an option, but I'm concerned that it adds an unnecessary memory footprint from the unused weights.
Other solutions I can come up with are... adding a note to the documentation of `LukeModel` to clarify what happens when users set `use_entity_aware_attention: false`. As this is not the first time this kind of confusion has occurred, I will definitely do the second one soon. Then I will probably send a PR to Hugging Face for the first option.
I think we can use `PreTrainedModel._keys_to_ignore_on_load_unexpected` ([transformers/modeling_utils.py]).
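A rough sketch of how such an ignore list suppresses the warning (plain Python mimicking the loader's filtering step; the function name and the specific patterns are illustrative, not the actual LUKE ones — in transformers the attribute holds regex patterns matched against unexpected checkpoint keys):

```python
import re

# Regex patterns a model class could declare to silence known-harmless keys.
keys_to_ignore_on_load_unexpected = [r"w2e_", r"e2w_", r"e2e_"]

def filter_warned_keys(unexpected_keys):
    """Drop checkpoint keys matching any ignore pattern before warning."""
    return [
        k for k in unexpected_keys
        if not any(re.search(p, k) for p in keys_to_ignore_on_load_unexpected)
    ]
```

With patterns like these, the EASA weights would be silently skipped when `use_entity_aware_attention` is `false`, while genuinely unexpected keys would still be reported.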
Thank you for the pointer! It is a clean solution. I will send a PR to the transformers library to add options to the LUKE model.
According to the mLUKE paper, mLUKE does not use entity-aware self-attention.
However, the following code gives a warning message (the message can be suppressed by passing `use_entity_aware_attention=True`):
In fact, the public model contains weights for entity-aware self-attention.
Could you make it clear whether mLUKE uses entity-aware self-attention?
`config.json` should specify `use_entity_aware_attention: true`, unless `pytorch_model.bin` is updated to remove the weights for entity-aware self-attention.
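Under that suggestion, the relevant `config.json` entry would presumably look like this (fragment only; all other fields omitted):

```json
{
  "use_entity_aware_attention": true
}
```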