Open gloryVine opened 4 years ago
Hello @gloryVine ,
You probably do not work on this anymore, but I think this is an issue with WN18RR itself: it identifies entities only by their offset, which yields duplicates.
I ran some statistics on the textual data and found 161 offsets that are shared by more than one entity: in 160 cases, 2 entities have the same offset, and in 1 case, 3 entities have the same offset id. That gives 160 + 2 = 162 surplus entities, which matches the difference between 41105 and 40943. This is of course a source of false positives and noise in the data.
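For anyone who wants to run a similar check, here is a minimal sketch of how duplicates per offset could be counted. It assumes the text entities can be loaded as (offset, label) pairs. That input format, the function name `duplicate_offset_stats`, and the toy values are all made up for illustration and are not taken from the repository.

```python
from collections import Counter, defaultdict


def duplicate_offset_stats(entities):
    """Count how many distinct labels share each offset.

    `entities` is an iterable of (offset, label) pairs (assumed format).
    Returns (group_sizes, surplus), where group_sizes maps
    "number of labels on one offset" -> "number of such offsets",
    and surplus is the number of entities beyond one per offset.
    """
    labels_by_offset = defaultdict(set)
    for offset, label in entities:
        labels_by_offset[offset].add(label)

    group_sizes = Counter(len(labels) for labels in labels_by_offset.values())
    # An offset with k labels contributes k - 1 surplus entities.
    surplus = sum((size - 1) * count for size, count in group_sizes.items())
    return group_sizes, surplus


# Toy data (hypothetical): one offset with 2 labels, one with 1, one with 3.
toy = [
    ("00001740", "a"), ("00001740", "b"),
    ("00002137", "c"),
    ("00002684", "d"), ("00002684", "e"), ("00002684", "f"),
]
group_sizes, surplus = duplicate_offset_stats(toy)
print(dict(group_sizes), surplus)
```

The surplus is the number that should line up with the gap between the entity count of the text version and that of the original version, if shared offsets are the only cause.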
Hey,
the dataset in the folder datasets_knowledge_embedding/WN18RR/original/ has 40943 entities, as it should according to Dettmers et al.'s paper. Yet datasets_knowledge_embedding/WN18RR/text/ has 41105 entities, which is 162 more than it should have. Any idea why this is the case?