Toni-Chan closed this issue 5 years ago
We follow the settings in https://github.com/thunlp/NRE to prepare our corpus, which is used for text-entity alignment and for training word embeddings. The pre-trained word embeddings are learned from the New York Times Annotated Corpus (LDC2008T19), which must be obtained from LDC (https://catalog.ldc.upenn.edu/LDC2008T19). Our alignments are also based on this data. Because of the license, I cannot release this data directly; you can try to download it from LDC yourself.
For link prediction, I align the knowledge graph FB15K to the New York Times Annotated Corpus (2008), based on entity anchors and entity names in Wikidata. You can download FB15K from its website or from another project of mine, https://github.com/thunlp/openke. For NYT-2008, however, you need to download the corpus from LDC and then align it to FB15K yourself. I think it may even be easier to download the whole Wikipedia dump and align that to FB15K instead.
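For readers attempting the alignment themselves, here is a minimal sketch of the idea: link corpus sentences to FB15K entity IDs by matching entity names in the text. The function names, the name-to-MID mapping, and the exact (case-insensitive substring) matching strategy are illustrative assumptions only; the actual alignment described above uses entity anchors and entity names from Wikidata and is more involved.

```python
# Hypothetical alignment sketch: map entity names found in a sentence to
# FB15K machine IDs (MIDs). This is NOT the authors' pipeline, just an
# illustration of name-based text-entity alignment.

def build_name_index(entity_names):
    """Map lowercase entity names to their FB15K MIDs."""
    return {name.lower(): mid for mid, name in entity_names.items()}

def align_sentence(sentence, name_index):
    """Return (mid, name) pairs for every entity name found in the sentence."""
    text = sentence.lower()
    matches = []
    for name, mid in name_index.items():
        # Naive substring match; a real pipeline would tokenize and
        # disambiguate (e.g. via Wikipedia anchors or Wikidata aliases).
        if name in text:
            matches.append((mid, name))
    return matches

# Toy example with made-up MIDs standing in for FB15K identifiers.
entity_names = {
    "/m/02_286": "New York City",
    "/m/09c7w0": "United States",
}
index = build_name_index(entity_names)
print(align_sentence("He moved to New York City last year.", index))
```

A real alignment would also need to resolve ambiguous names and aliases, which is why the thread above suggests relying on entity anchors rather than raw string matching.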
Nah, I think I may not have expressed our concerns properly. I have sent you an email about the issues we encountered when using the results from your paper (our current approach requires comparison with other KG embedding/completion methods that use text supervision, which you mentioned in the paper but did not include in the code).
OK, I have received your email.
We noticed that you reported KG completion evaluation results. However, in your code, the testing is only for the relation extraction task. Could you explain how you handled the KG completion task?