smartschat / cort

A toolkit for coreference resolution and error analysis.
MIT License

adjust_head_for_nam doesn't handle DURATION entities #6

Closed chardmeier closed 8 years ago

chardmeier commented 8 years ago

Hi, the adjust_head_for_nam function in cort.core.head_finders crashes whenever it encounters a named entity type of DURATION. This entity type is sometimes generated by the latest version of CoreNLP. I guess it shouldn't be too difficult to add a pattern for it, but I don't know what would make sense. /Christian

2016-04-09 05:04:42,285 INFO Preprocessing en/ep-00-06-15.xml.gz.
2016-04-09 05:20:08,227 INFO Extracting system mentions from en/ep-00-06-15.xml.gz.
2016-04-09 05:20:11,552 ERROR Discarding document en/ep-00-06-15.xml.gz
2016-04-09 05:20:11,619 ERROR Traceback (most recent call last):
  File "/home/staff/ch/PycharmProjects/cort/extra/annot-wmt.py", line 197, in <module>
    doc.system_mentions = mention_extractor.extract_system_mentions(doc)
  File "/home/staff/ch/PycharmProjects/cort/cort/core/mention_extractor.py", line 36, in extract_system_mentions
    for span in extract_system_mention_spans(document)]
  File "/home/staff/ch/PycharmProjects/cort/cort/core/mention_extractor.py", line 36, in <listcomp>
    for span in extract_system_mention_spans(document)]
  File "/home/staff/ch/PycharmProjects/cort/cort/core/mentions.py", line 153, in from_document
    mention_property_computer.compute_head_information(attributes)
  File "/home/staff/ch/PycharmProjects/cort/cort/core/mention_property_computer.py", line 248, in compute_head_information
    attributes["ner"][head_index])
  File "/home/staff/ch/PycharmProjects/cort/cort/core/head_finders.py", line 214, in adjust_head_for_nam
    raise Exception("Unknown named entity annotation: " + ner_type)
Exception: Unknown named entity annotation: DURATION

haripriya-b commented 8 years ago

Had the same issue. You will get a similar error for NUMBER as well. You will need to add an elif branch for DURATION and NUMBER to the if statement at line 196 of the cort/core/head_finders.py file.

chardmeier commented 8 years ago

> Had the same issue. You will get a similar error for NUMBER as well. You will need to add an elif branch for DURATION and NUMBER to the if statement at line 196 of the cort/core/head_finders.py file.

Thanks, I suspected so! What were the patterns you added?

smartschat commented 8 years ago

Thank you for reporting the issue!

Another option would be to log a warning instead of raising an exception, and then use the unadjusted head (otherwise, the same issue will recur whenever the NER software changes). I'll have a look at this ASAP.
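For readers following along, the warn-and-fall-back idea can be sketched roughly as below. This is only an illustration under assumptions, not the code in cort: the function name `adjust_head`, the `HEAD_POS_PATTERNS` table, its two example patterns, and the `(start, end)` tuple used for spans are all hypothetical.

```python
import logging
import re

logger = logging.getLogger(__name__)

# Hypothetical table (not cort's real patterns): each entry maps a regex over
# NER types to a regex over POS tags that may be part of the adjusted head.
HEAD_POS_PATTERNS = [
    (re.compile(r"PERSON|ORG.*"), re.compile(r"NNPS?|NNS?")),
    (re.compile(r"DATE|TIME"), re.compile(r"NNPS?|NNS?|CD")),
]


def adjust_head(tokens, pos, ner_type, old_span, old_head):
    """Return (span, head_tokens) for a named-entity mention.

    old_span is an inclusive (start, end) index pair. For an NER type with
    no known pattern, log a warning and return the unadjusted head instead
    of raising an exception.
    """
    for ner_regex, pos_regex in HEAD_POS_PATTERNS:
        if ner_regex.fullmatch(ner_type):
            break
    else:
        # No pattern matched: warn and keep the head we already had.
        logger.warning("Unknown named entity annotation: %s", ner_type)
        return old_span, old_head

    # Grow the head to the left and right while the POS tags keep matching.
    start, end = old_span
    while start > 0 and pos_regex.fullmatch(pos[start - 1]):
        start -= 1
    while end < len(pos) - 1 and pos_regex.fullmatch(pos[end + 1]):
        end += 1
    return (start, end), tokens[start:end + 1]
```

With this shape, a new label from the NER tool (such as DURATION or NUMBER) only produces a log message, and a dedicated pattern can be added to the table later without touching the control flow.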

smartschat commented 8 years ago

I've resolved the issue in the way I described above. However, it would be great if we could also add rules for DURATION and NUMBER. If you have patterns that work well, please open a pull request or share them so that I can add them to cort.