Closed: lcreteig closed this 2 years ago
Additions/clarifications to table above:
biLSTM / RoBERTa | rule-based | final category | example | notes |
---|---|---|---|---|
annotation error | annotation error | annotation error | False negative: "indication for recent [entity]" (GP2796_56_63); False positive: "there is no sign of [entity]" (SP1201_156_171) | |
negation of different term | negation of different term, no negation of noun | negation of different term | False positive: "[entity] is not uncommon" | difference between these two rule-based categories is unclear; should be merged |
uncertainty | uncertainty, missing pseudo trigger | ambiguity | "X without Y and [entity]" | pseudo triggers contain actual negations, but also something else, e.g. "niet bijzonder" (roughly "not particularly"), where "bijzonder" changes the meaning of "niet" ("not"). Usually these designate uncertainty (e.g. "geen duidelijke", roughly "no clear")
uncertainty | uncertainty | speculation | False positive: "[entity] cannot be excluded"; False negative: "not typical for [entity]" (RD1158_84_93) | pseudo triggers contain actual negations, but also something else, e.g. "niet bijzonder" (roughly "not particularly"), where "bijzonder" changes the meaning of "niet" ("not"). Usually these designate uncertainty (e.g. "geen duidelijke", roughly "no clear")
grammar | | - | | category possibly too vague; these should be given another category (in biLSTM, these were mostly "punctuation")
List, distance/scope/list, long_distance, long distance | scope exceeded, list, missing termination trigger | scope | False positive: "no X, (but) [entity] detected"; "[entity] is currently [...] not present" | "list" is a special case of "scope exceeded", where the negation propagates too far (e.g. "no X, ENT"). Scope is most easily exceeded in lists, but not exclusively. Scope can be restricted in the rule-based method by adding termination triggers such as "but"
uncommon, uncommon_negation, uncommon negation, Uncommon phrasing | missing trigger, missing variation | uncommon negation | False negative: "neither [entity] nor [entity]" | a hyphen is a special case of "uncommon negation": a negation that does not get recognized as such. For rule-based, this is every negation trigger for which a rule is missing
Hyphen, minus | missing trigger | minus | False negative: "[entity]-" | |
sentence structure, sentence_structure, temporality, experiencer | wrong modality | wrong modality | False positive: "hopefully no [entity] will form" (GP1729_162_169) | "sentence structure" category possibly too vague; these should be given another category (in biLSTM, these were almost exclusively "wrong modality", e.g. temporality as in "hoop dat er geen ENT gaat ontstaan", roughly "hope that no ENT will form")
punctuation | sentence splitting | punctuation | False negative: "No evidence for recent traum. [entity]" (SP1857_92_101) | In rule-based this can cause sentence to be split prematurely, causing a false negative |
other, context | missing direction, wrong direction, complex trigger, trigger overlaps with entity, termination trigger, spelling error | other | | |
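
To make the relabelling concrete, here is a minimal sketch of how the biLSTM / RoBERTa label variants in the table above could be collapsed into the final categories. The names `FINAL_CATEGORIES`, `NEEDS_REVIEW`, `normalize` and `final_category` are hypothetical and not taken from the repository. The mapping is hard-coded from the table rather than read from error_categories.csv, whose layout is not shown here. "uncertainty" and "grammar" are deliberately left unmapped: the first splits into "ambiguity" and "speculation", and the second has no final category yet, so both need a manual decision per example.

```python
from typing import Optional

# Final category -> label variants seen in the "biLSTM / RoBERTa" column.
# Variants are stored in normalized form (lowercase, underscores as spaces).
FINAL_CATEGORIES = {
    "annotation error": ["annotation error"],
    "negation of different term": ["negation of different term"],
    "scope": ["list", "distance/scope/list", "long distance"],
    "uncommon negation": ["uncommon", "uncommon negation", "uncommon phrasing"],
    "minus": ["hyphen", "minus"],
    "wrong modality": ["sentence structure", "temporality", "experiencer"],
    "punctuation": ["punctuation"],
    "other": ["other", "context"],
}

# Labels without a single target category in the table (assumption: these
# are re-assigned by hand, one example at a time).
NEEDS_REVIEW = {"uncertainty", "grammar"}

# Invert the table into a variant -> final category lookup.
LOOKUP = {
    variant: final
    for final, variants in FINAL_CATEGORIES.items()
    for variant in variants
}


def normalize(label: str) -> str:
    """Lowercase, turn underscores into spaces, collapse whitespace."""
    return " ".join(label.lower().replace("_", " ").split())


def final_category(label: str) -> Optional[str]:
    """Return the final category, or None if the label needs manual review.

    Raises KeyError for labels that do not appear in the table at all.
    """
    key = normalize(label)
    if key in NEEDS_REVIEW:
        return None
    return LOOKUP[key]
```

For example, `final_category("long_distance")` and `final_category("List")` both resolve to `"scope"`, while `final_category("uncertainty")` returns `None` to flag the example for manual review.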
There are two outstanding issues with the error analysis:
Here's the proposal for a final set of categories (with examples we can mention in the paper), and how these map to the existing categories (also available as error_categories.csv):
Table with categories
biLSTM / RoBERTa | rule-based | final category | example | notes |
---|---|---|---|---|
annotation error | annotation error | annotation error | False negative: "indication for recent [entity]" (GP2796_56_63); False positive: "there is no sign of [entity]" (SP1201_156_171) | |
negation of different term | negation of different term, no negation of noun | negation of different term | False positive: "[entity] is not uncommon" | difference between these two rule-based categories is unclear; should be merged
uncertainty | uncertainty, missing pseudo trigger | uncertainty | False positive: "[entity] cannot be excluded"; False negative: "not typical for [entity]" (RD1158_84_93) | pseudo triggers contain actual negations, but also something else, e.g. "niet bijzonder" (roughly "not particularly"), where "bijzonder" changes the meaning of "niet" ("not"). Usually these designate uncertainty (e.g. "geen duidelijke", roughly "no clear")
grammar | | - | | category possibly too vague; these should be given another category (in biLSTM, these were mostly "punctuation")
list | scope exceeded, list, missing termination trigger | scope exceeded | False positive: "no X, (but) [entity] detected" | "list" is a special case of "scope exceeded", where the negation propagates too far (e.g. "no X, ENT"). Scope is most easily exceeded in lists, but not exclusively. Scope can be restricted in the rule-based method by adding termination triggers such as "but"
hyphen | missing trigger | uncommon negation | False negative: "[entity]-", "neither [entity] nor [entity]" | a hyphen is a special case of "uncommon negation": a negation that does not get recognized as such. For rule-based, this is every negation trigger for which a rule is missing
long distance | | long distance | "[entity] is currently [...] not present" | does not occur in rule-based
sentence structure | wrong modality | wrong modality | False positive: "hopefully no [entity] will form" (GP1729_162_169) | "sentence structure" category possibly too vague; these should be given another category (in biLSTM, these were almost exclusively "wrong modality", e.g. temporality as in "hoop dat er geen ENT gaat ontstaan", roughly "hope that no ENT will form")
uncommon negation | missing trigger, missing variation | - | | |
punctuation | sentence splitting | punctuation | False negative: "No evidence for recent traum. [entity]" (SP1857_92_101) | In rule-based this can cause sentence to be split prematurely, causing a false negative
other | missing direction, wrong direction, complex trigger, trigger overlaps with entity, termination trigger, spelling error | other | | |
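
The "scope exceeded" row can be illustrated with a toy example. The sketch below is not the rule-based method evaluated in this analysis. It is a deliberately simplified forward-scope matcher, with hypothetical names (`NEGATION_TRIGGERS`, `TERMINATION_TRIGGERS`, `negated_tokens`), that only shows why a termination trigger such as "but" stops a negation from propagating to a later entity.

```python
import re

# Toy trigger lists (assumptions for illustration only).
NEGATION_TRIGGERS = {"no"}       # opens a forward negation scope
TERMINATION_TRIGGERS = {"but"}   # closes an open negation scope


def negated_tokens(sentence, termination_triggers=frozenset()):
    """Return the set of tokens that fall inside a negation scope.

    Scope opens at a negation trigger and runs to the end of the sentence
    unless one of `termination_triggers` is encountered first.
    """
    tokens = re.findall(r"\w+", sentence.lower())
    negated = set()
    in_scope = False
    for token in tokens:
        if token in NEGATION_TRIGGERS:
            in_scope = True
        elif token in termination_triggers:
            in_scope = False
        elif in_scope:
            negated.add(token)
    return negated


sentence = "no X, but ENT detected"

# Without termination triggers the scope runs to the end of the sentence,
# so "ent" is wrongly marked as negated (the false positive in the table).
without_termination = negated_tokens(sentence)

# With "but" as a termination trigger, only "x" is negated.
with_termination = negated_tokens(sentence, TERMINATION_TRIGGERS)
```

In the first call the entity ends up inside the negation scope. In the second call the scope closes at "but", so only "x" is marked as negated.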