evangeliazve opened this issue 1 year ago
Answers to your question:
Hello,
Thanks for your quick reply; that's clear. I have one more question: do you think it is possible to save the Relation Extraction model you proposed to the Hugging Face Hub?
Best, Evangelia Zve
I haven't tried it myself, but it should be possible with push_to_hub, as detailed on this page: https://huggingface.co/docs/transformers/v4.15.0/model_sharing
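A minimal sketch of what that could look like (I have not run this end-to-end; the repo name is hypothetical, and it assumes you are logged in via `huggingface-cli login` and that `BertForRelationExtraction` subclasses `PreTrainedModel`, so it inherits `push_to_hub`):

```python
def push_checkpoint_to_hub(model, tokenizer, repo_id):
    """Upload a fine-tuned model and its tokenizer to the Hugging Face Hub.

    Assumes `model` is a transformers PreTrainedModel subclass (such as the
    notebook's BertForRelationExtraction, which extends BertPreTrainedModel)
    and that you are authenticated with the Hub.
    `repo_id` is a hypothetical name such as "your-username/nyt-re-bert".
    """
    model.push_to_hub(repo_id)      # uploads config.json and the weights
    tokenizer.push_to_hub(repo_id)  # uploads the vocab and tokenizer config
```

Note that a custom architecture like this will need its class definition available again at load time (or `trust_remote_code`-style packaging) before `from_pretrained` can reconstruct it from the Hub.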
Thank you very much for your help
Hello,
"Not directly, but you can adapt the Evaluation code (for example cell 29 in 05a-nyt-re-bert.ipynb) and the Preprocessing code to take a single sentence with entities, embed the PHRASE spans and encode them using the tokenizer into a batch size of 1."
Regarding this, I cannot handle the preprocessing part, as I seem to need to define a relationship in order to create the span_idxs. Should I create span ids for every possible relationship and then predict whether each one is actually a relationship or not?
Thanks again
Hello! I am also very interested in performing relation extraction with 🤗 Transformers. I managed to adapt the code of 05-nyt-re-bert for inference (see below). You can then retrieve the name of the relation class by looking up the predicted output in the model's id2label mapping. @evangeliazve the span_idxs array does not contain relationships, but simply the positions of the two spans containing the named entities.
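Before the full adapted code, the marker lookup itself can be shown with a tiny standalone sketch (the helper name and toy token list here are my own, for illustration only):

```python
# Illustrative helper (not from the notebook): given a wordpiece token list,
# return the positions of the subject/object entity marker tokens, in the
# order [<S:, </S:, <O:, </O:] that span_idxs expects.
def find_span_idxs(tokens):
    prefixes = ["<S:", "</S:", "<O:", "</O:"]
    return [next(i for i, t in enumerate(tokens) if t.startswith(p))
            for p in prefixes]

tokens = ["[CLS]", "<S:PER>", "Bobby", "Fischer", "</S:PER>", "visited",
          "<O:LOC>", "Iceland", "</O:LOC>", ".", "[SEP]"]
print(find_span_idxs(tokens))  # [1, 4, 6, 8]
```

The positions are taken over the tokenized (wordpiece) sequence, not the original words, which is why the lookup happens after `convert_ids_to_tokens` in the code below.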
```python
import os

import numpy as np
import torch

# MODEL_DIR, epoch, valid_relations, tokenizer and BertForRelationExtraction
# are defined in the 05a-nyt-re-bert.ipynb notebook
model = BertForRelationExtraction.from_pretrained(
    os.path.join(MODEL_DIR, "ckpt-{:d}".format(epoch)), len(valid_relations))

input_object = {
    "tokens": ["But", "that", "spasm", "of", "irritation", "by", "a", "master", "intimidator", "was", "minor", "compared", "with", "what", "<S:PER>", "Bobby", "Fischer", "</S:PER>", ",", "the", "erratic", "former", "world", "chess", "champion", ",", "dished", "out", "in", "March", "at", "a", "news", "conference", "in", "Reykjavik", ",", "<O:LOC>", "Iceland", "</O:LOC>", "."]
}

def encode_data_inference(examples):
    tokenized_inputs = tokenizer(
        examples["tokens"],
        is_split_into_words=True,
        truncation=True,
        return_tensors="pt")  # needed because for training, conversion to tensors is performed by the DataLoader
    span_idxs = []
    for input_id in tokenized_inputs.input_ids:
        tokens = tokenizer.convert_ids_to_tokens(input_id)
        print(tokens)
        # positions of the subject/object entity marker tokens
        span_idxs.append([
            [idx for idx, token in enumerate(tokens) if token.startswith("<S:")][0],
            [idx for idx, token in enumerate(tokens) if token.startswith("</S:")][0],
            [idx for idx, token in enumerate(tokens) if token.startswith("<O:")][0],
            [idx for idx, token in enumerate(tokens) if token.startswith("</O:")][0]
        ])
    # manually create a tensor containing the span indices
    tokenized_inputs["span_idxs"] = torch.from_numpy(np.array(span_idxs))
    return tokenized_inputs

inputs = encode_data_inference(input_object)
with torch.no_grad():
    logits = model(**inputs).logits
print(logits)
predictions = torch.argmax(logits, dim=-1).cpu().numpy()
print(predictions)
```
Output (truncated in the original post):
['[CLS]', 'But', 'that', 'spa', '##sm', 'of', 'irritation', 'by', 'a', 'master', 'in', '##ti', '##mi', '##da', '##tor', 'was', 'minor', 'compared', 'with', 'what', …
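To turn the predicted index back into a human-readable relation name, you can look it up in the id2label mapping stored on the model config. A minimal sketch (the dict below is a stand-in with a single illustrative entry, the pair confirmed later in this thread; the real mapping comes from `model.config.id2label`):

```python
# Sketch: mapping a predicted class index back to a relation name.
# The real mapping lives in model.config.id2label; this stand-in dict holds
# only the one pair mentioned in this thread (id 6).
id2label = {6: "location/neighborhood/neighborhood_of"}

predictions = [6]  # e.g. the argmax over the logits for a batch of size 1
relation_names = [id2label[p] for p in predictions]
print(relation_names)  # ['location/neighborhood/neighborhood_of']
```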
Sorry for the delay in responding; it looks like I missed this comment. And thanks for the nice example @darebfh! It looks like the model predicted an incorrect relationship, id 6, which is location/neighborhood/neighborhood_of; but given that the dataset does not seem to define anything specifically for (person, ?, location), maybe this is the best it could do.
Hi again,
Thank you again for this solution. I would be happy to share your work in whatever way you wish; please let me know.
I have the following questions:
Best Regards, Evangelia Zve