studio-ousia / luke

LUKE -- Language Understanding with Knowledge-based Embeddings
Apache License 2.0

entity-aware self-attention used in mLUKE #174

Open chantera opened 1 year ago

chantera commented 1 year ago

According to the mLUKE paper, mLUKE does not use entity-aware self-attention.

The word and entity tokens equally undergo self-attention computation (i.e., no entity-aware self-attention in Yamada et al. (2020)) after embedding layers.

However, the following code produces a warning message (the warning can be suppressed by passing use_entity_aware_attention=True):

>>> import transformers
>>> model = transformers.AutoModel.from_pretrained("studio-ousia/mluke-base-lite")
Some weights of the model checkpoint at studio-ousia/mluke-base-lite were not used when initializing LukeModel: [
'luke.encoder.layer.0.attention.self.w2e_query.weight', 'luke.encoder.layer.0.attention.self.w2e_query.bias', 
'luke.encoder.layer.0.attention.self.e2w_query.weight', 'luke.encoder.layer.0.attention.self.e2w_query.bias', 
'luke.encoder.layer.0.attention.self.e2e_query.weight', 'luke.encoder.layer.0.attention.self.e2e_query.bias', 
...]

>>> model.config.use_entity_aware_attention
False
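
For reference, the override can also be passed directly to from_pretrained, which forwards it to LukeConfig so the extra weights are created and loaded and the warning disappears (a minimal sketch, assuming access to the Hugging Face Hub):

>>> model = transformers.AutoModel.from_pretrained(
...     "studio-ousia/mluke-base-lite",
...     use_entity_aware_attention=True,  # forwarded to LukeConfig
... )
>>> model.config.use_entity_aware_attention
True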

In fact, the public model contains weights for entity-aware self-attention.

>>> import torch
>>> state_dict = torch.load("pytorch_model.bin")
>>> list(state_dict.keys())
[..., 
'luke.encoder.layer.0.attention.self.w2e_query.weight', 'luke.encoder.layer.0.attention.self.w2e_query.bias', 
'luke.encoder.layer.0.attention.self.e2w_query.weight', 'luke.encoder.layer.0.attention.self.e2w_query.bias', 
'luke.encoder.layer.0.attention.self.e2e_query.weight', 'luke.encoder.layer.0.attention.self.e2e_query.bias', 
...]

Could you make it clear whether mLUKE uses entity-aware self-attention?

config.json should specify use_entity_aware_attention: true, unless pytorch_model.bin is updated to remove the weights for entity-aware self-attention.

ryokan0123 commented 1 year ago

In our pretraining and fine-tuning experiments, the mLUKE models did not use entity-aware self-attention.

The attention weights related to entity-aware self-attention (EASA) (e.g., w2e..., e2w..., e2e...) are included in the published model weights in case users want to try entity-aware self-attention at fine-tuning time. The values of the EASA weights are identical to the corresponding w2w... weights, which is how these weights are initialized in the LUKE paper. By setting use_entity_aware_attention: true, these weights are loaded into the model.
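
This can be checked locally against the downloaded checkpoint (a minimal sketch, assuming pytorch_model.bin is in the working directory and that the word-to-word query is stored under the plain query key, as in transformers' LukeSelfAttention; only layer 0 is checked here):

import torch

sd = torch.load("pytorch_model.bin", map_location="cpu")
prefix = "luke.encoder.layer.0.attention.self."
for name in ("w2e_query", "e2w_query", "e2e_query"):
    for part in ("weight", "bias"):
        # Expected to hold if the EASA weights were initialized as copies of the w2w query.
        assert torch.equal(sd[prefix + f"{name}.{part}"], sd[prefix + f"query.{part}"])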

By default, use_entity_aware_attention is set to false and the EASA weights are ignored, because that is the setting described in the mLUKE paper.

The warning is somewhat confusing... but it is expected behavior.

chantera commented 1 year ago

The values of the EASA weights are identical to the corresponding w2w... weights, which is how these weights are initialized in the LUKE paper.

I appreciate the clarification; now I understand. So use_entity_aware_attention: false is not a misconfiguration, and it simply causes the unused weights to be ignored.

The warning is somewhat confusing

To fix this confusing behavior, how about equipping the model with the EASA weights regardless of the use_entity_aware_attention setting? The model would then simply skip the computation involving the EASA weights when use_entity_aware_attention=False.
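
For concreteness, a rough sketch of that idea (hypothetical code, not the actual transformers LukeSelfAttention; it only illustrates the proposed control flow of always creating the EASA projections but bypassing them when the flag is off):

import torch.nn as nn

class SelfAttentionSketch(nn.Module):
    def __init__(self, hidden_size: int, use_entity_aware_attention: bool):
        super().__init__()
        self.use_entity_aware_attention = use_entity_aware_attention
        self.query = nn.Linear(hidden_size, hidden_size)
        # The EASA projections are always created, so the checkpoint loads without warnings.
        self.w2e_query = nn.Linear(hidden_size, hidden_size)
        self.e2w_query = nn.Linear(hidden_size, hidden_size)
        self.e2e_query = nn.Linear(hidden_size, hidden_size)

    def project_queries(self, word_hidden, entity_hidden):
        if self.use_entity_aware_attention:
            # Token-type-dependent query projections, as in LUKE's entity-aware self-attention.
            return (self.query(word_hidden), self.w2e_query(word_hidden),
                    self.e2w_query(entity_hidden), self.e2e_query(entity_hidden))
        # When the flag is off, the extra projections are simply never touched.
        return self.query(word_hidden), self.query(entity_hidden)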

ryokan0123 commented 1 year ago

how about equipping the model with the EASA weights regardless of the use_entity_aware_attention setting?

Thank you for the suggestion. It could be an option, but I'm concerned that it would add an unnecessary memory footprint from the unused weights.

Other solutions I can come up with are...

As this is not the first time this kind of confusion has come up, I will definitely do the second one soon. Then I will probably send a PR to Hugging Face for the first option.

chantera commented 1 year ago

I think we can use PreTrainedModel._keys_to_ignore_on_load_unexpected (defined in transformers/modeling_utils.py).
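
For example (a sketch under the assumption that the EASA keys match the names shown in the warning above; _keys_to_ignore_on_load_unexpected is a list of regex patterns whose matches are dropped from the unexpected-keys warning during from_pretrained, and the subclass name here is only illustrative):

from transformers import LukeModel

class MLukeModelSketch(LukeModel):
    # Regex matching the unused EASA projections, so from_pretrained no longer
    # reports them as unexpected weights when the checkpoint contains them.
    _keys_to_ignore_on_load_unexpected = [
        r"attention\.self\.(w2e|e2w|e2e)_query\.(weight|bias)",
    ]

model = MLukeModelSketch.from_pretrained("studio-ousia/mluke-base-lite")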

ryokan0123 commented 1 year ago

Thank you for the pointer! It is a clean solution. I will send a PR to the transformers library to add options to the LUKE model.