mana-ysh / knowledge-graph-embeddings

Implementations of Embedding-based methods for Knowledge Base Completion tasks
Apache License 2.0

KeyError in Vocab #6

Closed. subhashree8 closed this issue 6 years ago.

subhashree8 commented 6 years ago

I am getting a KeyError at line 41 of utils/dataset.py regardless of which entity or relation is given as input. ent_vocab[entity_name] simply fails. Has the structure of the vocabulary changed recently?

mana-ysh commented 6 years ago

Hi, @subhashree8

Sorry for the late reply. I haven't modified the vocab-related code so far. Can you show me the command you ran? I suspect that the entities in the entity list file don't match the ones in the training triplets. If you are not using the wordnet-mlj12 or FB15k dataset, please check this.

subhashree8 commented 6 years ago

Hi @mana-ysh, this is the command I ran:

run train.py --mode single --ent train.entlist --rel train.rellist --train wordnet-mlj12-train.txt --valid wordnet-mlj12-valid.txt --log C:\Users\subha_000\Documents\knowledge-graph-embeddings-master\knowledge-graph-embeddings-master\src\logs

I had created the entlist and rellist in the same manner as you had mentioned in your pre-processing file.

mana-ysh commented 6 years ago

> I had created the entlist and rellist in the same manner as you had mentioned in your pre-processing file.

Does that mean you ran preprocessing.sh?

subhashree8 commented 6 years ago

Yes

mana-ysh commented 6 years ago

Thank you for replying. It works in my environment.

Can you check that you get the same files as I do?

▶  head train.entlist
00001740
00001930
00002137
00002325
00002452
00002573
00002684
00002724
00002942
00003316
▶  head train.rellist
_also_see
_derivationally_related_form
_has_part
_hypernym
_hyponym
_instance_hypernym
_instance_hyponym
_member_holonym
_member_meronym
_member_of_domain_region
▶  head wordnet-mlj12-train.txt
03964744    _hyponym    04371774
00260881    _hypernym   00260622
02199712    _member_holonym 02188065
01332730    _derivationally_related_form    03122748
06066555    _derivationally_related_form    00645415
09322930    _instance_hypernym  09360122
11575425    _hyponym    12255934
07193596    _derivationally_related_form    00784342
05726596    _hyponym    06162979
01768969    _derivationally_related_form    02636811
▶  head wordnet-mlj12-valid.txt
02174461    _hypernym   02176268
05074057    _derivationally_related_form    02310895
08390511    _synset_domain_topic_of 08199025
02045024    _member_meronym 02046321
04758181    _hypernym   04757864
09419536    _instance_hypernym  09411430
12165384    _hypernym   12163824
09384921    _part_of    08853741
04881998    _derivationally_related_form    01299888
00612652    _derivationally_related_form    01004072

Triplet files are tab-separated, and the entity/relation list files contain one ID or name per line.
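
The file format described here can be checked with a short script. The following is a minimal sketch, not code from this repository: `read_list`, `read_triples`, and `missing_symbols` are hypothetical helper names, and the files are assumed to be UTF-8 encoded.

```python
from pathlib import Path


def read_list(path):
    """Read an entity/relation list file: one ID or name per line."""
    text = Path(path).read_text(encoding="utf-8")
    return [line for line in text.splitlines() if line]


def read_triples(path):
    """Read a tab-separated triplet file as (head, relation, tail) tuples."""
    triples = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        if line:
            head, rel, tail = line.split("\t")
            triples.append((head, rel, tail))
    return triples


def missing_symbols(triples, entities, relations):
    """Return the entities and relations used in triples but absent from the lists."""
    ent_set, rel_set = set(entities), set(relations)
    missing_ents = {e for h, _, t in triples for e in (h, t) if e not in ent_set}
    missing_rels = {r for _, r, _ in triples if r not in rel_set}
    return missing_ents, missing_rels
```

If the list files really cover the training triplets, calling `missing_symbols(read_triples("wordnet-mlj12-train.txt"), read_list("train.entlist"), read_list("train.rellist"))` should return two empty sets. Anything else points to a mismatch between the list files and the triplet file.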

subhashree8 commented 6 years ago

Thanks a lot for replying. Yes, I have exactly the same files. When I print the keys of rel_vocab along with their type and the length of the key string, I get this:

_also_see <class 'str'> 20
<class 'str'> 1
_derivationally_related_form <class 'str'> 57
_has_part <class 'str'> 19
_hypernym <class 'str'> 19
_hyponym <class 'str'> 17
_instance_hypernym <class 'str'> 37
_instance_hyponym <class 'str'> 35
_member_holonym <class 'str'> 31
_member_meronym <class 'str'> 31
_member_of_domain_region <class 'str'> 49
_member_of_domain_topic <class 'str'> 47
_member_of_domain_usage <class 'str'> 47
_part_of <class 'str'> 17
_similar_to <class 'str'> 23
_synset_domain_region_of <class 'str'> 49
_synset_domain_topic_of <class 'str'> 47
_synset_domain_usage_of <class 'str'> 47
_verb_group <class 'str'> 23

I am not sure why the 2nd one is a blank key. Also, when I print the length of the key "_hyponym" which is given as input for rel_vocab in line 41, it shows: _hyponym <class 'str'> 8

Could the difference in length for the same string in the two places be the cause of the problem?
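
A key that prints as `_hyponym` but reports a length of 17 must contain characters that the terminal does not show, so it can never equal the 8-character `_hyponym` used for the lookup. One way to expose such characters is to print `repr()` of each key and to look at the first bytes of the list file. The sketch below is a diagnostic aid only, not part of this repository; `describe_key` and `sniff_encoding` are hypothetical names. The byte order mark check rests on an assumption: lengths of roughly twice the visible length would be consistent with a file saved in a 2-byte encoding such as UTF-16, but nothing in this thread confirms that this is what happened.

```python
import codecs


def describe_key(key):
    """Return the key's repr, its length, and any non-printable characters it holds."""
    hidden = [ch for ch in key if not ch.isprintable()]
    return repr(key), len(key), hidden


def sniff_encoding(raw):
    """Guess from the leading bytes whether a file starts with a byte order mark."""
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    # Check UTF-32 first: its little-endian BOM begins with the UTF-16 LE BOM.
    if raw.startswith((codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE)):
        return "utf-32"
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    return None  # no BOM found; the encoding is unknown, not necessarily UTF-8
```

For example, `describe_key(k)` can be called for every `k` in rel_vocab, and `sniff_encoding(open("train.rellist", "rb").read(4))` reports whether the list file begins with a byte order mark. A result of `None` only means that no mark was found.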

mana-ysh commented 6 years ago

This issue is caused by running preprocess.sh. The shell script generates different list files depending on the Unix environment. I will upload the preprocessed files in the near future to resolve this.

Thanks @subhashree8 for your cooperation!
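
Since the list files depend on the environment in which the shell script runs, one alternative is to build them in Python, where the encoding and line endings can be fixed explicitly. The sketch below assumes, based only on the `head` output earlier in this thread, that each list is the sorted set of unique entities or relations found in the training triplets; the actual script may do something different. `build_lists` and `write_list` are hypothetical names.

```python
def build_lists(triple_path):
    """Collect sorted unique entities and relations from a tab-separated triplet file."""
    entities, relations = set(), set()
    with open(triple_path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\r\n")
            if not line:
                continue
            head, rel, tail = line.split("\t")
            entities.update((head, tail))
            relations.add(rel)
    return sorted(entities), sorted(relations)


def write_list(path, items):
    """Write one item per line as UTF-8 with Unix line endings on every platform."""
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        for item in items:
            f.write(item + "\n")
```

Passing `encoding="utf-8"` and `newline="\n"` explicitly keeps the output byte-for-byte identical across operating systems, which is the property the shell-generated files lacked here.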