Closed: slbayer closed this issue 10 years ago.
Hi. Please do send your example data. We'll have a quick look.

On Oct 8, 2014 8:19 AM, "slbayer" notifications@github.com wrote:
I have a minimal case that seems to break the scorer (and I confess, I'm using the scorer for an entity clustering and linking evaluation which isn't TAC KBP).
Let's say your corpus contains two documents. Document 1 contains a gold mention, and document 2 contains no gold mentions; for the evaluated system, document 1 contains no mentions and document 2 contains one mention. This minimal case causes the scorer to fail as follows:
INFO Converting gold to evaluation format..
INFO Converting systems to evaluation format..
INFO Evaluating systems..
neleval/evaluate.py:173: StrictMetricWarning: Strict P/R defaulting to zero score for zero denominator
  StrictMetricWarning)
INFO Preparing summary report..
INFO Calculating confidence intervals..
neleval/evaluate.py:173: StrictMetricWarning: Strict P/R defaulting to zero score for zero denominator
  StrictMetricWarning)
INFO preparing strong_link_match report..
INFO preparing strong_nil_match report..
INFO preparing strong_all_match report..
INFO preparing strong_typed_link_match report..
INFO Preparing error report..
Traceback (most recent call last):
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/Volumes/Blinken/Projects/NEET-CO/TAC_2014_scoring/neleval/neleval/__main__.py", line 60, in <module>
    main()
  File "/Volumes/Blinken/Projects/NEET-CO/TAC_2014_scoring/neleval/neleval/__main__.py", line 57, in main
    print(obj())
  File "neleval/analyze.py", line 75, in __call__
    counts = Counter(error.label for error in _data())
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/collections.py", line 444, in __init__
    self.update(iterable, **kwds)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/collections.py", line 525, in update
    for elem in iterable:
  File "neleval/analyze.py", line 75, in <genexpr>
    counts = Counter(error.label for error in _data())
  File "neleval/analyze.py", line 86, in iter_errors
    assert g.id == s.id
AssertionError
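For concreteness, a gold/system pair of this shape might look like the following (hypothetical document IDs, offsets, and KB IDs; this assumes neleval's tab-separated annotation columns of document ID, start offset, end offset, KB ID, score, and type):

gold.tsv (document 1 has the only gold mention; document 2 has none):

    doc1	0	5	E0001	1.0	PER

system.tsv (document 1 has no mentions; document 2 has the only one):

    doc2	10	15	E0002	1.0	PER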
I'd be happy to send you the minimal test I've set up, if you need it. I'd try to fix it myself, but I'm hoping that you'll be faster :-).
I'm working with the latest clone of the repository. MacOS 10.9.5, Python 2.7.5, numpy 1.9.0, scipy 0.14.0, joblib 0.8.3-r1, nose 1.3.4.
I've sent a zip file to your Gmail address.
There are two problems here.
One problem appears to be that iter_errors assumes Reader() is being used with group=by_document. The logic in iter_errors makes no sense with group=by_mention, because zip() isn't guaranteed to pair the gold and system mentions correctly at all. More to the point, I can't figure out why you'd even use by_mention here; it's the only place this grouping is used, and the logic doesn't seem to justify it.
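To see why, here's a toy illustration of the misalignment (hypothetical mention tuples, not neleval's actual Reader output):

    # With per-mention grouping, gold and system streams are paired positionally.
    gold = [("doc1", (0, 5))]       # the only gold mention lives in doc1
    system = [("doc2", (10, 15))]   # the only system mention lives in doc2

    # zip() pairs the first gold item with the first system item even though
    # they come from different documents, tripping the id assertion:
    for g, s in zip(gold, system):
        assert g[0] == s[0]  # AssertionError: 'doc1' != 'doc2'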
The other problem is that even when you remove by_mention and group by document, you get an error for documents that don't contain any annotations.
No matter how you do this, zip isn't going to cut it: you're going to have to collect all the keys, generate MISSING for the documents/indexes that have no counterpart on the other side, and only then process the remaining pairs.
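A minimal sketch of that setwise alignment (hypothetical names and types; not the actual neleval fix):

    from collections import namedtuple

    # Hypothetical mention type; neleval's real classes differ.
    Mention = namedtuple("Mention", ["doc_id", "span", "kb_id"])

    def iter_aligned(gold_mentions, system_mentions):
        """Align gold and system mentions by (doc_id, span), yielding
        (gold, system) pairs; a side with no counterpart comes out as
        None, which the caller can label MISSING/EXTRA instead of
        asserting that the ids match."""
        gold_by_key = {(m.doc_id, m.span): m for m in gold_mentions}
        sys_by_key = {(m.doc_id, m.span): m for m in system_mentions}
        for key in sorted(set(gold_by_key) | set(sys_by_key)):
            yield gold_by_key.get(key), sys_by_key.get(key)

    # The minimal case above now yields two one-sided pairs instead of crashing:
    gold = [Mention("doc1", (0, 5), "E0001")]
    system = [Mention("doc2", (10, 15), "E0002")]
    for g, s in iter_aligned(gold, system):
        print("MISSING" if s is None else "EXTRA" if g is None else "MATCH", g, s)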
This might be something that wasn't properly fixed up when we moved from handling AIDA-CoNLL-style data to TAC EDL-style data; analyze was ported in a rush. A setwise comparison should be straightforward. Thanks for the code review!
Fixed in 46d9233.
Thanks again, @slbayer. Please let us know if you notice anything else.