Open Frankie123421 opened 2 years ago
That is correct, you have to look at OPIEC. You can reuse the OLPBENCH pipeline (especially this part: https://github.com/samuelbroscheit/open_knowledge_graph_embeddings/blob/main/preprocessing/process_avro.py) and extend it to also create an index that associates each triple with its source sentence.
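A minimal sketch of what such an index could look like, not the actual code in process_avro.py. It assumes each OPIEC record has already been read into a Python dict, and the field names (`subject`, `relation`, `object`, `article_id`, `sentence_number`, and `word` on each token) are assumptions that should be checked against the real OPIEC schema. `phrase` and `build_sentence_index` are hypothetical helper names.

```python
from collections import defaultdict


def phrase(tokens):
    """Join the surface forms of a token list into one string."""
    return " ".join(tok["word"] for tok in tokens)


def build_sentence_index(records):
    """Map each (subject, relation, object) string triple to its source locations.

    `records` is any iterable of dicts with the assumed field names.
    The same triple can be extracted from several sentences, so every
    key maps to a list of (article_id, sentence_number) pairs.
    """
    index = defaultdict(list)
    for rec in records:
        key = (
            phrase(rec["subject"]),
            phrase(rec["relation"]),
            phrase(rec["object"]),
        )
        index[key].append((rec["article_id"], rec["sentence_number"]))
    return index


# Tiny in-memory example standing in for records read from the Avro files.
example_records = [
    {
        "subject": [{"word": "Marie"}, {"word": "Curie"}],
        "relation": [{"word": "was"}, {"word": "born"}, {"word": "in"}],
        "object": [{"word": "Warsaw"}],
        "article_id": "A1",
        "sentence_number": 3,
    },
]
example_index = build_sentence_index(example_records)
```

Keying on the joined strings only works if the strings are normalized the same way as in the benchmark files (casing, tokenization), so that step would need to match whatever the pipeline already does.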
Thanks for your kind reply, I got it. So could I also just concatenate the "word" fields in "tokens" to obtain the sentence directly? I am not sure whether this would be more convenient than using the "sentence number" to look up the sentence in the article, given that I only want to find the original sentences corresponding to the triples. By the way, I suspect it will also be difficult to match the triples in this dataset to OPIEC by simply looping over it, since OPIEC is quite large?
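For illustration, both points can be sketched together: joining the `word` fields gives a whitespace-tokenized version of the sentence (not the exact original string, since punctuation ends up separated by spaces), and putting the wanted triples into a set means OPIEC only has to be scanned once instead of once per triple. The field names (`tokens`, `word`, `subject`, `relation`, `object`) and the helper names are assumptions, not the confirmed OPIEC schema.

```python
def join_words(tokens):
    """Concatenate the 'word' field of each token with single spaces."""
    return " ".join(tok["word"] for tok in tokens)


def find_sentences(wanted_triples, opiec_records):
    """Collect source sentences for the wanted triples in one pass.

    `wanted_triples` is an iterable of (subject, relation, object) strings.
    `opiec_records` is any iterable of dicts, so it can be a generator
    that streams records from disk without loading all of OPIEC.
    """
    wanted = set(wanted_triples)  # average O(1) membership test per record
    found = {}
    for rec in opiec_records:
        key = (
            join_words(rec["subject"]),
            join_words(rec["relation"]),
            join_words(rec["object"]),
        )
        if key in wanted:
            found.setdefault(key, []).append(join_words(rec["tokens"]))
    return found


# Tiny in-memory example standing in for streamed OPIEC records.
sample_records = [
    {
        "subject": [{"word": "Marie"}, {"word": "Curie"}],
        "relation": [{"word": "was"}, {"word": "born"}, {"word": "in"}],
        "object": [{"word": "Warsaw"}],
        "tokens": [
            {"word": "Marie"}, {"word": "Curie"}, {"word": "was"},
            {"word": "born"}, {"word": "in"}, {"word": "Warsaw"}, {"word": "."},
        ],
    },
]
sample_hits = find_sentences(
    [("Marie Curie", "was born in", "Warsaw")], sample_records
)
```

With this layout the cost is one pass over OPIEC plus a set lookup per record, so the size of OPIEC mainly affects reading time rather than the matching itself.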
Hey, I have recently been studying your work. I would like to know whether there is a convenient way to obtain the original sentences of the triples in the dataset. Do I have to look them up in OPIEC? Thank you.