Closed hedonihilist closed 2 years ago
In the NER example, the entity token embedding is reconstructed here. Therefore, index 0 and index 1 correspond to the padding and [MASK] tokens, respectively. Entity embeddings should be obtained by inputting [MASK] entity token(s) to the model.
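To make the index convention concrete, here is a minimal sketch (the id values match the reply above; `max_entity_length` and the mention count are assumptions for illustration) of how an `entity_ids` list ends up containing only 0s and 1s in the NER setting:

```python
# In the NER example, the entity vocabulary has only two entries:
PAD_ENTITY_ID = 0   # padding entity
MASK_ENTITY_ID = 1  # [MASK] entity

# Suppose two candidate entity mentions, padded to an assumed length of 4:
num_mentions = 2
max_entity_length = 4
entity_ids = [MASK_ENTITY_ID] * num_mentions + \
             [PAD_ENTITY_ID] * (max_entity_length - num_mentions)
# entity_ids -> [1, 1, 0, 0]
```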
Thanks for your reply.
If I understand it correctly, the meaning of `entity_ids` is task-related. In the case of NER, they represent `[MASK]` or padding. In the case of relation extraction, they represent head and tail entities. Right?
If I want to get the embeddings of a variable number of entities in the sentence, how can I achieve this? If I input multiple `[MASK]` entity tokens, how can I tell which is which?
In the NER task, we input multiple [MASK] entities to the model to compute entity representations in an input text. If you input multiple [MASK] entities, the model can treat these entities differently as long as their entity_position_ids are different. If you need to input entity type information (e.g., HEAD or TAIL in relation classification), you can create new entity tokens representing the entity types, and initialize these token embeddings using the token embedding of the [MASK] entity.
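A small sketch of the idea above: several `[MASK]` entities share the same entity id and are told apart only by their position ids. The span values, `max_mention_length`, and `max_entity_length` here are hypothetical; `-1` as the position/padding filler follows the convention used in the linked NER example code:

```python
MASK_ENTITY_ID = 1  # index of the [MASK] entity in the entity vocabulary
PAD_ENTITY_ID = 0   # index of the padding entity

# Hypothetical entity spans (token-level start/end) from an external tool:
entity_spans = [(3, 5), (8, 9)]
max_mention_length = 4  # assumed maximum mention length
max_entity_length = 3   # assumed maximum number of entities per input

# Every real mention gets the same [MASK] entity id...
entity_ids = [MASK_ENTITY_ID] * len(entity_spans)

# ...and is distinguished by its own position ids, padded with -1:
entity_position_ids = []
for start, end in entity_spans:
    positions = list(range(start, end))
    positions += [-1] * (max_mention_length - len(positions))
    entity_position_ids.append(positions)

# Pad both lists up to max_entity_length with padding entities:
while len(entity_ids) < max_entity_length:
    entity_ids.append(PAD_ENTITY_ID)
    entity_position_ids.append([-1] * max_mention_length)

# entity_ids          -> [1, 1, 0]
# entity_position_ids -> [[3, 4, -1, -1], [8, -1, -1, -1], [-1, -1, -1, -1]]
```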
Thanks!
Hi all,
I have trouble understanding the meaning of `entity_ids` in the code: https://github.com/studio-ousia/luke/blob/5023a8a4d534c6ae8cecc7c308f65bd3b078aa32/examples/ner/utils.py#L192
In the NER example code, the entity_id is either `1` or `0`. What does `0` or `1` mean? I am trying to obtain the embeddings of the entities in the text (entity positions can be resolved by external tools) — how can I construct the entity_ids?