wikilinks / neleval

Entity disambiguation evaluation and error analysis tool
Apache License 2.0

Error in nel with non-matching mentions #5

Closed: slbayer closed this issue 10 years ago

slbayer commented 10 years ago

I have a minimal case that seems to break the scorer (and I confess, I'm using the scorer for an entity clustering and linking evaluation which isn't TAC KBP).

Let's say your corpus contains two documents. Document 1 contains a gold mention, and document 2 contains no gold mentions; for the evaluated system, document 1 contains no mentions and document 2 contains one mention. This minimal case causes the scorer to fail as follows:
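To make the shape of the data concrete, here is a purely illustrative sketch of that configuration (plain Python dicts, not neleval's actual annotation format):

```python
# Hypothetical illustration only: per-document mention sets for the minimal case.
# Keys are document ids; values are the mention spans annotated in that document.
gold = {
    "doc1": [("doc1", 0, 5)],   # gold has one mention in document 1
    "doc2": [],                 # ...and none in document 2
}
system = {
    "doc1": [],                 # the system finds nothing in document 1
    "doc2": [("doc2", 0, 5)],   # ...but produces one mention in document 2
}
```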

INFO Converting gold to evaluation format..
INFO Converting systems to evaluation format..
INFO Evaluating systems..
neleval/evaluate.py:173: StrictMetricWarning: Strict P/R defaulting to zero score for zero denominator
  StrictMetricWarning)
INFO Preparing summary report..
INFO Calculating confidence intervals..
neleval/evaluate.py:173: StrictMetricWarning: Strict P/R defaulting to zero score for zero denominator
  StrictMetricWarning)
INFO preparing strong_link_match report..
INFO preparing strong_nil_match report..
INFO preparing strong_all_match report..
INFO preparing strong_typed_link_match report..
INFO Preparing error report..
Traceback (most recent call last):
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/Volumes/Blinken/Projects/NEET-CO/TAC_2014_scoring/neleval/neleval/__main__.py", line 60, in <module>
    main()
  File "/Volumes/Blinken/Projects/NEET-CO/TAC_2014_scoring/neleval/neleval/__main__.py", line 57, in main
    print(obj())
  File "neleval/analyze.py", line 75, in __call__
    counts = Counter(error.label for error in _data())
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/collections.py", line 444, in __init__
    self.update(iterable, **kwds)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/collections.py", line 525, in update
    for elem in iterable:
  File "neleval/analyze.py", line 75, in <genexpr>
    counts = Counter(error.label for error in _data())
  File "neleval/analyze.py", line 86, in iter_errors
    assert g.id == s.id
AssertionError

I'd be happy to send you the minimal test I've set up, if you need it. I'd try to fix it myself, but I'm hoping that you'll be faster :-).

I'm working with the latest clone of the repository. MacOS 10.9.5, Python 2.7.5, numpy 1.9.0, scipy 0.14.0, joblib 0.8.3-r1, nose 1.3.4.

benhachey commented 10 years ago

Hi. Please do send your example data. We'll have a quick look.

slbayer commented 10 years ago

I've sent a zip file to your Gmail address.

There are two problems here.

One problem appears to be that iter_errors assumes Reader() is being used with group=by_document. The logic in iter_errors makes no sense with group=by_mention, because zip() then gives no guarantee that gold and system entries are paired correctly. More to the point, I can't see why you'd use by_mention here at all; it's the only place this grouping is used, and the logic doesn't seem to justify it.
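As a self-contained illustration (not neleval's actual code) of why zip() goes wrong: it pairs the two grouped streams positionally, so as soon as the gold and system sides don't cover the same documents, entries from different documents get lined up and any unmatched tail is silently dropped.

```python
# Illustration only: zip() pairs grouped annotation streams by position,
# so when the two sides cover different documents the pairs are nonsense.
gold_groups = [("doc1", ["gold mention"])]       # gold: only doc1 annotated
system_groups = [("doc2", ["system mention"])]   # system: only doc2 annotated

for (g_id, g_anns), (s_id, s_anns) in zip(gold_groups, system_groups):
    # With by_document grouping this pairs doc1 with doc2; an
    # `assert g_id == s_id` check (as in iter_errors) then fails.
    print(g_id, s_id)   # -> doc1 doc2
```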

The other problem is that even when you remove by_mention, and group by document, you get an error if documents don't contain any annotations.

No matter how you do this, zip isn't going to cut it for you; you're going to have to collect all the indexes, generate MISSING for the documents/indexes that don't match at all, and THEN process the remaining pairs.
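Something along these lines is what I have in mind; a rough sketch, not a patch against neleval (the helper name and the MISSING/EXTRA labels are mine, and the annotation objects are only assumed to carry a docid attribute):

```python
from collections import defaultdict

# Rough sketch: align gold and system annotations by document id instead of
# relying on zip(). MISSING/EXTRA are placeholder labels, not neleval's
# actual error categories.
def align_by_document(gold_annotations, system_annotations):
    gold_by_doc = defaultdict(list)
    sys_by_doc = defaultdict(list)
    for ann in gold_annotations:
        gold_by_doc[ann.docid].append(ann)
    for ann in system_annotations:
        sys_by_doc[ann.docid].append(ann)

    # Union of document ids, so documents seen on only one side still appear.
    for docid in sorted(set(gold_by_doc) | set(sys_by_doc)):
        gold = gold_by_doc.get(docid, [])
        system = sys_by_doc.get(docid, [])
        if not system:
            yield docid, "MISSING", gold, []        # gold-only document
        elif not gold:
            yield docid, "EXTRA", [], system        # system-only document
        else:
            yield docid, "MATCHED", gold, system    # compare mention sets here
```

The point is just that the pairing is keyed on document id rather than position, and the side with no annotations is handled explicitly instead of tripping an assertion.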

jnothman commented 10 years ago

This might have been something not properly fixed up when we moved it from handling AIDA-CoNLL-style data to TAC EDL-style data; analyze was ported in a rush. A setwise comparison should be straightforward. Thanks for the code review!

benhachey commented 10 years ago

Fixed in 46d9233.

Thanks again, @slbayer. Please let us know if you notice anything else.