inpho / inphosite

The InPhO API
https://inphoproject.org

lukasiewicz entry reporting no data #166

Closed: colinallen closed this issue 6 years ago

colinallen commented 6 years ago

https://www.inphoproject.org/thinker/5554

This page reports a new entry with no data for the fingerprint, but the entry has been online for a few years: https://plato.stanford.edu/entries/lukasiewicz/

Could it be a Unicode problem in the name?

JaimieMurdock commented 6 years ago

This seemed easy to debug with the single-entry option until I hit a NotImplementedError:

inphosite@ip-172-31-21-141:/var/inpho/inpho/data/fuzzy 
$ python -m inpho.corpus.sep --entry lukasiewicz
processing lukasiewicz...
Traceback (most recent call last):
  File "/home/inphosite/miniconda2/lib/python2.7/runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/home/inphosite/miniconda2/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/var/inpho/inpho/inpho/corpus/sep.py", line 597, in <module>
    update_db=options.update_db)
  File "/var/inpho/inpho/inpho/corpus/sep.py", line 465, in mine_article
    update_partial_graph(entity_type, occurrences)
  File "/var/inpho/inpho/inpho/corpus/sep.py", line 472, in update_partial_graph
    raise NotImplementedError
NotImplementedError
JaimieMurdock commented 6 years ago

I implemented a quick single-entry fuzzymatch and found that your suspicion of a Unicode error was right:

Jan &#321;ukasiewicz

That doesn't quite explain the matches it did find, though:

$ cat lukasiewicz 
713,philosophy of law,0.5
725,philosophy of war,0.5
1481,war,0.5
1562,sin,0.5
1786,pain,0.5

I'm going to try HTML entity translation and unidecode to see if that helps.
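A minimal sketch of that normalization step, assuming the name arrives as an HTML-escaped string like `Jan &#321;ukasiewicz`. It uses Python 3's `html.unescape` (the site itself ran Python 2.7), which decodes `&#321;` to `Ł`. Unicode decomposition alone cannot turn `Ł` into `L`, because U+0141 has no decomposition mapping; that is why a transliteration library such as `unidecode` is needed. The function name `normalize_name` and the small fallback table are illustrative assumptions, not the project's code.

```python
import html
import unicodedata

try:
    from unidecode import unidecode  # third-party transliteration library
except ImportError:
    unidecode = None  # illustrative fallback below keeps the sketch runnable

# Hypothetical fallback table: letters that NFKD cannot split into base + accent.
_NO_DECOMPOSITION = str.maketrans(
    {"Ł": "L", "ł": "l", "Ø": "O", "ø": "o", "Đ": "D", "đ": "d"}
)


def normalize_name(raw):
    """Decode HTML entities, then reduce the result to plain ASCII."""
    text = html.unescape(raw)  # "&#321;" -> "Ł"
    if unidecode is not None:
        return unidecode(text)
    # Fallback: map letters without a decomposition, then drop combining marks.
    text = text.translate(_NO_DECOMPOSITION)
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


print(normalize_name("Jan &#321;ukasiewicz"))  # Jan Lukasiewicz
```

Unescaping comes first because transliterating the raw string would leave the literal `&#321;` in place, and no thinker name would ever match it.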

JaimieMurdock commented 6 years ago

Some success via unidecode and unescape!

713,philosophy of law,0.5
725,philosophy of war,0.5
1481,war,0.5
1562,sin,0.5
1786,pain,0.5
3495,Jan Lukasiewicz,1.0
5554,Jan Lukasiewicz,1.0
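The listing above appears to be one `id,label,score` row per candidate; the column meanings are inferred from the output, where 5554 matches the thinker URL in the original report. A minimal sketch for pulling out the exact matches, assuming that format; `exact_matches` is a hypothetical helper, not part of `inpho.corpus.sep`.

```python
import csv
import io


def exact_matches(lines, threshold=1.0):
    """Return (id, label) pairs whose score meets the threshold."""
    hits = []
    for row in csv.reader(lines):
        if len(row) != 3:
            continue  # skip blank or malformed lines
        entity_id, label, score = row
        if float(score) >= threshold:
            hits.append((int(entity_id), label))
    return hits


sample = io.StringIO(
    "713,philosophy of law,0.5\n"
    "1786,pain,0.5\n"
    "3495,Jan Lukasiewicz,1.0\n"
    "5554,Jan Lukasiewicz,1.0\n"
)
print(exact_matches(sample))  # [(3495, 'Jan Lukasiewicz'), (5554, 'Jan Lukasiewicz')]
```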
JaimieMurdock commented 6 years ago

Closed via inpho/inpho@2f1243e.

JaimieMurdock commented 6 years ago

For future reference, a single entry can now be manually fuzzed with:

python -m inpho.corpus.sep --fuzzy lukasiewicz