Question on entity2id

RyanOngAI commented 1 year ago

Hi,

Great work and love the dataset! I can't seem to figure out the entity2id.txt files in all three dataset variations.

For example, in NELL-MBE, some entities seem to be mapped to the same ID: 'concept_company_adobe': '293', 'concept_sportsleague_nba': '293', 'concept_country___america': '292', 'concept_language_english': '292'

Regarding WN-MBE, I don't understand the entity-to-ID mapping. As far as I know, WN-MBE has around 40K entities in total, so how can an entity be "8524735"? And is there a way to get the text mapping? For example:

8524735 507
8860123 493
7846 382
8199025 358
11579418 305
8441203 301
1507175 282
126264 272
1864707 263
11585340 253
6845599 239
12205694 236
11567411 229
11556857 222
1432517 221
1342529 210

Thanks for the help!

yncui-nju commented 1 year ago

Hi there, thanks for your interest in our work!

It looks like you've raised two issues:

1. Duplicate IDs: We follow the data preprocessing of Multi-Hop, where the number following each entity name is its degree, not its ID, so repeated numbers are expected. When reading the data, we reassign numerical IDs to the entities.

2. Entity Names in WordNet: The entity names in WN18RR correspond to their IDs in the entire WordNet, hence the large numbers. You can refer to this link to obtain textual labels for these entities.
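To illustrate point 1, here is a minimal sketch of how IDs could be reassigned when reading such a file. It assumes each line of entity2id.txt holds an entity name and its degree separated by whitespace, and that IDs are handed out in order of first appearance. Both are assumptions for illustration, and the function name `reassign_entity_ids` is hypothetical rather than taken from the MBE code.

```python
from typing import Dict, Iterable, Tuple


def reassign_entity_ids(lines: Iterable[str]) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Return (entity -> new sequential ID, entity -> degree).

    Assumes each line looks like "<entity_name> <degree>" (hypothetical layout).
    """
    entity_to_id: Dict[str, int] = {}
    entity_to_degree: Dict[str, int] = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        name, degree = parts
        if name not in entity_to_id:
            # The new ID is the number of entities seen so far,
            # so it is unique even when degrees repeat.
            entity_to_id[name] = len(entity_to_id)
        entity_to_degree[name] = int(degree)
    return entity_to_id, entity_to_degree


# Sample rows taken from the NELL-MBE example quoted in the question.
sample = [
    "concept_company_adobe 293",
    "concept_sportsleague_nba 293",
    "concept_country___america 292",
    "concept_language_english 292",
]
ids, degrees = reassign_entity_ids(sample)
print(ids)
print(degrees)
```

With this reading, two entities that share the number 293 simply have the same degree, while their reassigned IDs remain distinct.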

Please let us know if you need further assistance or have more questions. If you'd like to engage in further discussion, please get in touch with me at yncui.nju@gmail.com.

RyanOngAI commented 1 year ago

Thank you @yncui-nju for the prompt response!

nju-websoft / MBE

Question on entity2id #4