colinallen closed this issue 6 years ago
This seemed easy to debug via the single-entry option until I got a NotImplementedError:
...
inphosite@ip-172-31-21-141:/var/inpho/inpho/data/fuzzy
$ python -m inpho.corpus.sep --entry lukasiewicz
processing lukasiewicz...
Traceback (most recent call last):
  File "/home/inphosite/miniconda2/lib/python2.7/runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/home/inphosite/miniconda2/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/var/inpho/inpho/inpho/corpus/sep.py", line 597, in <module>
    update_db=options.update_db)
  File "/var/inpho/inpho/inpho/corpus/sep.py", line 465, in mine_article
    update_partial_graph(entity_type, occurrences)
  File "/var/inpho/inpho/inpho/corpus/sep.py", line 472, in update_partial_graph
    raise NotImplementedError
NotImplementedError
I implemented a quick single-entry fuzzymatch and found that your suspicion of a unicode error was right:
Jan Łukasiewicz
That doesn't quite explain the matches it discovered, though:
$ cat lukasiewicz
713,philosophy of law,0.5
725,philosophy of war,0.5
1481,war,0.5
1562,sin,0.5
1786,pain,0.5
Going to do HTML entity translation and unidecode to see if that helps.
Some success via unidecode and unescape!
713,philosophy of law,0.5
725,philosophy of war,0.5
1481,war,0.5
1562,sin,0.5
1786,pain,0.5
3495,Jan Lukasiewicz,1.0
5554,Jan Lukasiewicz,1.0
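For anyone hitting the same thing, here is a minimal sketch of the unescape-then-unidecode step described above. It is not the code from the commit: the function name normalize_name and the fallback table are hypothetical, and it assumes Python 3 (html.unescape), whereas the traceback above is from Python 2.7. unidecode is a third-party package; if it is not installed, the sketch falls back to a small hand-written mapping, because Ł has no Unicode decomposition and NFKD normalization alone would leave it untouched.

```python
import html
import unicodedata

try:
    # Third-party package; optional in this sketch.
    from unidecode import unidecode
except ImportError:
    unidecode = None

# Hypothetical fallback table for letters that NFKD cannot decompose.
_NO_DECOMPOSITION = {"Ł": "L", "ł": "l"}


def normalize_name(raw):
    """Decode HTML entities, then reduce the result to plain ASCII."""
    # "&#321;" -> "Ł"
    text = html.unescape(raw)
    if unidecode is not None:
        return unidecode(text)
    # Fallback: map the stroke letters by hand, then strip combining marks.
    text = "".join(_NO_DECOMPOSITION.get(ch, ch) for ch in text)
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(normalize_name("Jan &#321;ukasiewicz"))
```

With the name normalized this way before matching, the string compared against the database is the plain-ASCII form rather than the raw entity text.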
Closed via inpho/inpho@2f1243e.
For future reference, a single entry can now be manually fuzzed with:
python -m inpho.corpus.sep --fuzzy lukasiewicz
https://www.inphoproject.org/thinker/5554
This is reported as a new entry with no data for the fingerprint, but the entry has been online for a few years: https://plato.stanford.edu/entries/lukasiewicz/
Could it be a Unicode problem in the name?