samuelbroscheit / open_knowledge_graph_embeddings

Code to train open knowledge graph embeddings and to create a benchmark for open link prediction.
MIT License

About looking for the original sentences of the triples #3

Open Frankie123421 opened 2 years ago

Frankie123421 commented 2 years ago

Hey, I have recently been studying your work. Is there a convenient way to obtain the original sentences for the triples in the dataset, or do I have to look them up in OPIEC? Thank you.

samuelbroscheit commented 2 years ago

That is correct, you have to look at OPIEC. You can reuse the OLPBENCH pipeline (in particular this part: https://github.com/samuelbroscheit/open_knowledge_graph_embeddings/blob/main/preprocessing/process_avro.py ) and extend it to also create an index that associates each triple with its source sentence.
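
A rough sketch of what such an index could look like, not the actual code in process_avro.py. It assumes each OPIEC record has already been decoded into a Python dict, and that the field names `subject`, `relation`, `object`, `article_id` and `sentence_number` match the Avro schema; treat those names, the lowercasing, and the helper names `phrase` and `build_source_index` as assumptions and check them against the real schema and the OLPBENCH normalization.

```python
from collections import defaultdict


def phrase(tokens):
    """Join the "word" fields of a token list into one lowercase string.

    Lowercasing is an assumption; the benchmark may normalize differently.
    """
    return " ".join(tok["word"] for tok in tokens).lower()


def build_source_index(records):
    """Map (subject, relation, object) to a list of (article_id, sentence_number).

    `records` is any iterable of dicts, e.g. decoded Avro records.
    The same triple can occur in several sentences, so values are lists.
    """
    index = defaultdict(list)
    for rec in records:
        key = (phrase(rec["subject"]), phrase(rec["relation"]), phrase(rec["object"]))
        index[key].append((rec["article_id"], rec["sentence_number"]))
    return index


# Tiny hand-written record standing in for one OPIEC entry (made-up data).
records = [
    {
        "article_id": "a1",
        "sentence_number": 3,
        "subject": [{"word": "Barack"}, {"word": "Obama"}],
        "relation": [{"word": "was"}, {"word": "born"}, {"word": "in"}],
        "object": [{"word": "Hawaii"}],
    }
]
index = build_source_index(records)
print(index[("barack obama", "was born in", "hawaii")])
```

Storing only the article id and sentence number keeps the index small; the sentence text can then be fetched on demand.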

Frankie123421 commented 2 years ago

Thanks for your kind reply, I got it. So I could also just concatenate the "word" fields in "tokens" to obtain the sentence directly, is that correct? I am not sure whether that is more convenient than using the "sentence number" to look up the sentence in the article, given that I only want the original sentence for each triple. By the way, I suspect it will be hard to match the triples in this dataset against OPIEC with simple loops, since OPIEC is quite large.
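
To make both ideas concrete, here is a small sketch under stated assumptions. It assumes each record carries a flat `tokens` list whose items have a `word` field, and, for brevity, that `subject`, `relation` and `object` are already plain strings; the real OPIEC layout may nest these differently. The helper names `sentence_text` and `find_sentences` are made up for illustration. Note that joining the words with spaces gives the tokenized sentence (for example with a space before the final period), not necessarily the exact original string. For the size concern, putting the benchmark triples into a set means OPIEC only has to be streamed once, with a constant-time membership check per record instead of a nested loop.

```python
def sentence_text(record):
    """Rebuild a tokenized sentence by joining the "word" of each token."""
    return " ".join(tok["word"] for tok in record["tokens"])


def find_sentences(records, wanted):
    """Collect sentences for the triples in `wanted` in a single pass.

    `records` is any iterable of dicts (it can be a lazy stream);
    `wanted` is a set of (subject, relation, object) string tuples.
    """
    found = {}
    for rec in records:
        key = (rec["subject"], rec["relation"], rec["object"])
        if key in wanted:  # set lookup, no inner loop over the benchmark
            found.setdefault(key, []).append(sentence_text(rec))
    return found


# Made-up records standing in for a stream of OPIEC entries.
records = [
    {
        "subject": "Obama", "relation": "was born in", "object": "Hawaii",
        "tokens": [{"word": w} for w in ["Obama", "was", "born", "in", "Hawaii", "."]],
    },
    {
        "subject": "Paris", "relation": "is in", "object": "France",
        "tokens": [{"word": w} for w in ["Paris", "is", "in", "France", "."]],
    },
]
wanted = {("Obama", "was born in", "Hawaii")}
found = find_sentences(records, wanted)
print(found)
```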