Closed ghpu closed 6 years ago
More changes are required: Human can be replaced by 'Q5', but Food cannot be replaced by 'Q2095'.
issue running 'extraction/classifiers/type_classifier.py', please fix. 'enwiki/Q2095'
Traceback (most recent call last):
  File "extraction/project_graph.py", line 123, in main
    classification = classifier.classify(collection)
  File "extraction/classifiers/type_classifier.py", line 32, in classify
    FOOD = wkp(c, "Q2095")
  File "extraction/classifiers/type_classifier.py", line 14, in wkp
    return c.article2id['enwiki/' + name][0][0]
  File "src/marisa_trie.pyx", line 578, in marisa_trie.BytesTrie.__getitem__ (src/marisa_trie.cpp:10859)
KeyError: 'enwiki/Q2095'
In the classifiers, two functions help construct indices from string names:
def wkp(c, name):
    return c.article2id['enwiki/' + name][0][0]

def wkd(c, name):
    return c.name2index[name]
The wkp function uses Wikipedia titles to get a numeric id (but assumes the name is from the English Wikipedia), while wkd uses Wikidata's id scheme to get a numeric id. So you would have wkp(c, "Human") == wkd(c, "Q5").
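To make the distinction concrete, here is a minimal sketch that swaps the real tries for plain dicts. The toy collection `c`, its contents, and the numeric ids 0 and 1 are assumptions made up for illustration; only `wkp`, `wkd`, `article2id` and `name2index` come from the code quoted above. It also shows why passing a Wikidata id to `wkp` would plausibly end in the `KeyError` reported here.

```python
from types import SimpleNamespace

# Toy stand-in for the real collection (assumption): plain dicts replace
# the marisa-trie structures, and the numeric ids are invented.
c = SimpleNamespace(
    article2id={"enwiki/Human": [(0,)], "enwiki/Food": [(1,)]},
    name2index={"Q5": 0, "Q2095": 1},
)


def wkp(c, name):
    # Look up by English Wikipedia title.
    return c.article2id["enwiki/" + name][0][0]


def wkd(c, name):
    # Look up by Wikidata id.
    return c.name2index[name]


# Title and Wikidata id resolve to the same numeric id.
same_id = wkp(c, "Human") == wkd(c, "Q5")

# A Wikidata id is not a Wikipedia title, so wkp cannot find it.
try:
    wkp(c, "Q2095")
    missing_key = None
except KeyError as err:
    missing_key = err.args[0]
```

In this toy setup, `wkd(c, "Q2095")` or `wkp(c, "Food")` would be the matching calls for Food.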
Is there any other information you could give about this problem (e.g. Python version, which Wikipedia dumps you extracted to run this)?
Also, a random theory: the issue might be bytes vs. str in the case you mentioned. For example, try:
def wkp(c, name):
    return c.article2id[('enwiki/' + name).encode("utf-8")][0][0]
if for some reason marisa-trie requires byte keys on your installed version.
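One way to test the bytes-vs-str theory without editing the classifier twice is a lookup that tries both key types. This is only a sketch: `wkp_any_key` is a hypothetical helper, not part of the repository, and the dicts below are stand-ins for the real trie, so whether the installed marisa-trie raises `KeyError` or `TypeError` on the wrong key type is an assumption (both are caught).

```python
def wkp_any_key(article2id, name):
    """Hypothetical helper: try the str key first, then its UTF-8 bytes."""
    key = "enwiki/" + name
    for candidate in (key, key.encode("utf-8")):
        try:
            return article2id[candidate][0][0]
        except (KeyError, TypeError):
            # Wrong key type or missing key: try the next candidate.
            continue
    # Neither form was found, so the title is really absent.
    raise KeyError(key)


# Dict stand-ins (assumption) for a str-keyed and a bytes-keyed index.
str_keyed = {"enwiki/Human": [(0,)]}
bytes_keyed = {b"enwiki/Human": [(0,)]}

from_str = wkp_any_key(str_keyed, "Human")
from_bytes = wkp_any_key(bytes_keyed, "Human")
```

If both key forms fail for a title, the entry is probably missing from the index altogether, which points at the extracted dump rather than at the key type.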
Unfortunately, I no longer have the Wikipedia dumps I used. They were downloaded on February 22nd. I have upgraded my system since and am trying again with the updated code; I will keep you informed if it is now solved.
Thanks! Please re-open if the problem persists :)
Problem solved with the latest version of the code (commit 7271648f), Python 3.6rc5 on Ubuntu Bionic Beaver (18.04), and the latest Wikipedia dumps (as of 23rd March 2018).
Thanks for your time!
issue running 'extraction/classifiers/type_classifier.py', please fix. 'enwiki/Human'
Traceback (most recent call last):
  File "extraction/project_graph.py", line 123, in main
    classification = classifier.classify(collection)
  File "extraction/classifiers/type_classifier.py", line 26, in classify
    HUMAN = wkp(c, "Human")
  File "extraction/classifiers/type_classifier.py", line 14, in wkp
    return c.article2id['enwiki/' + name][0][0]
  File "src/marisa_trie.pyx", line 578, in marisa_trie.BytesTrie.__getitem__ (src/marisa_trie.cpp:10859)
KeyError: 'enwiki/Human'
Should type_classifier.py be updated in the same way as fast_link_fixer.py?