umcu / negation-detection

Negation detection in Dutch clinical text.
GNU General Public License v3.0

Map error categories #58

Closed by lcreteig 2 years ago

lcreteig commented 2 years ago

There are two outstanding issues with the error analysis:

Here's the proposal for a final set of categories (with examples we can mention in the paper), and how these map to the existing categories (also available as error_categories.csv):

Table with categories

| biLSTM / RoBERTa | rule-based | final category | example | notes |
| -- | -- | -- | -- | -- |
| annotation error | annotation error | annotation error | False negative: "indication for recent [entity]" (GP2796_56_63); False positive: "there is no sign of [entity]" (SP1201_156_171) |   |
| negation of different term | negation of different term, no negation of noun | negation of different term | False positive: "[entity] is not uncommon" | difference between these two rule-based categories is unclear; should be merged |
| uncertainty | uncertainty, missing pseudo trigger | uncertainty | False positive: "[entity] cannot be excluded"; False negative: "not typical for [entity]" (RD1158_84_93) | pseudo triggers contain actual negations, but also something else, e.g. "niet bijzonder", where "bijzonder" changes the meaning of "niet". Usually these designate uncertainty (e.g. "geen duidelijke") |
| grammar |   | - |   | category possibly too vague; these should be given another category (in biLSTM, these were mostly "punctuation") |
| list | scope exceeded, list, missing termination trigger | scope exceeded | False positive: "no X, (but) [entity] detected" | "list" is a special case of "scope exceeded", where the negation propagates too far (e.g. "no X, ENT"). Scope is most easily exceeded in lists, but not exclusively. Scope can be restricted in rule-based method by adding termination triggers such as "but" |
| hyphen | missing trigger | uncommon negation | False negative: "[entity]-", "neither [entity] nor [entity]" | a hyphen is a special case of "uncommon negation": a negation that does not get recognized as such. For rule-based, this is every negation trigger for which a rule is missing |
| long distance |   | long distance | "[entity] is currently [...] not present" | does not occur in rule-based |
| sentence structure | wrong modality | wrong modality | False positive: "hopefully no [entity] will form" (GP1729_162_169) | "sentence structure" category possibly too vague; these should be given another category (in biLSTM, these were almost exclusively "wrong modality", e.g. temporality as in "hoop dat er geen ENT gaat ontstaan") |
| uncommon negation | missing trigger, missing variation | - |   |   |
| punctuation | sentence splitting | punctuation | False negative: "No evidence for recent traum. [entity]" (SP1857_92_101) | In rule-based this can cause sentence to be split prematurely, causing a false negative |
| other | missing direction, wrong direction, complex trigger, trigger overlaps with entity, termination trigger, spelling error | other |   |   |

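
To illustrate the "scope exceeded" note, here is a minimal sketch of how a termination trigger can cut off negation scope. It is a toy forward-scope rule written for this example, not the repository's rule-based implementation; the names `NEGATION_TRIGGERS` and `negated_tokens` and the English trigger words are assumptions made for illustration.

```python
import re

# Hypothetical trigger list, in English for readability (the real rules are Dutch).
NEGATION_TRIGGERS = {"no"}


def negated_tokens(sentence, termination_triggers=frozenset()):
    """Return the tokens that fall inside the forward scope of a negation trigger.

    Scope opens at a negation trigger and stays open until a termination
    trigger is seen or the sentence ends.
    """
    # Keep bracketed placeholders such as "[entity]" together as one token.
    tokens = re.findall(r"\[?\w+\]?", sentence.lower())
    negated = []
    in_scope = False
    for token in tokens:
        if token in NEGATION_TRIGGERS:
            in_scope = True
        elif token in termination_triggers:
            in_scope = False
        elif in_scope:
            negated.append(token)
    return negated


sentence = "no X, but [entity] detected"

# Without a termination trigger the negation propagates to the entity (false positive).
without_termination = negated_tokens(sentence)

# With "but" as a termination trigger the scope ends before the entity.
with_termination = negated_tokens(sentence, termination_triggers={"but"})
```

In this sketch `"[entity]"` is in `without_termination` but not in `with_termination`, which mirrors the false positive described in the "scope exceeded" row.
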
lcreteig commented 2 years ago

Additions/clarifications to table above:

error_categories_v2.csv

table v2

| biLSTM / RoBERTa | rule-based | final category | example | notes |
| -- | -- | -- | -- | -- |
| annotation error | annotation error | **annotation error** | False negative: "indication for recent [entity]" (GP2796_56_63); False positive: "there is no sign of [entity]" (SP1201_156_171) |   |
| negation of different term | negation of different term, no negation of noun | **negation of different term** | False positive: "[entity] is not uncommon" | difference between these two rule-based categories is unclear; should be merged |
| uncertainty | uncertainty, missing pseudo trigger | **uncertainty** | False positive: "[entity] cannot be excluded"; False negative: "not typical for [entity]" (RD1158_84_93) | pseudo triggers contain actual negations, but also something else, e.g. "niet bijzonder", where "bijzonder" changes the meaning of "niet". Usually these designate uncertainty (e.g. "geen duidelijke") |
| grammar |   | - |   | category possibly too vague; these should be given another category (in biLSTM, these were mostly "punctuation") |
| List, distance/scope/list, long_distance, long distance | scope exceeded, list, missing termination trigger | **scope** | False positive: "no X, (but) [entity] detected"; "[entity] is currently [...] not present" | "list" is a special case of "scope exceeded", where the negation propagates too far (e.g. "no X, ENT"). Scope is most easily exceeded in lists, but not exclusively. Scope can be restricted in rule-based method by adding termination triggers such as "but" |
| uncommon, uncommon_negation, uncommon negation, Uncommon phrasing | missing trigger, missing variation | **uncommon negation** | False negative: "neither [entity] nor [entity]" | a hyphen is a special case of "uncommon negation": a negation that does not get recognized as such. For rule-based, this is every negation trigger for which a rule is missing |
| Hyphen, minus | missing trigger | **minus** | False negative: "[entity]-" |   |
| sentence structure, sentence_structure, temporality, experiencer | wrong modality | **wrong modality** | False positive: "hopefully no [entity] will form" (GP1729_162_169) | "sentence structure" category possibly too vague; these should be given another category (in biLSTM, these were almost exclusively "wrong modality", e.g. temporality as in "hoop dat er geen ENT gaat ontstaan") |
| punctuation | sentence splitting | **punctuation** | False negative: "No evidence for recent traum. [entity]" (SP1857_92_101) | In rule-based this can cause sentence to be split prematurely, causing a false negative |
| other, context | missing direction, wrong direction, complex trigger, trigger overlaps with entity, termination trigger, spelling error | **other** |   |   |

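
The v2 mapping for the biLSTM / RoBERTa column could be applied to raw annotation labels roughly as follows. This is a sketch under assumptions: the dictionary is transcribed by hand from the table rather than read from `error_categories_v2.csv` (whose exact column layout is not shown here), and the names `RAW_TO_FINAL` and `final_category` are made up for this example.

```python
from collections import Counter

# Raw biLSTM / RoBERTa labels (lowercased) mapped to the final categories of table v2.
# "grammar" is deliberately absent: the table assigns it no final category ("-").
RAW_TO_FINAL = {
    "annotation error": "annotation error",
    "negation of different term": "negation of different term",
    "uncertainty": "uncertainty",
    "list": "scope",
    "distance/scope/list": "scope",
    "long_distance": "scope",
    "long distance": "scope",
    "uncommon": "uncommon negation",
    "uncommon_negation": "uncommon negation",
    "uncommon negation": "uncommon negation",
    "uncommon phrasing": "uncommon negation",
    "hyphen": "minus",
    "minus": "minus",
    "sentence structure": "wrong modality",
    "sentence_structure": "wrong modality",
    "temporality": "wrong modality",
    "experiencer": "wrong modality",
    "punctuation": "punctuation",
    "other": "other",
    "context": "other",
}


def final_category(raw_label):
    """Map a raw label to its final category, or None if it has no mapping."""
    return RAW_TO_FINAL.get(raw_label.strip().lower())


# Hypothetical raw labels, only to show the tallying step.
raw_labels = ["List", "long_distance", "Hyphen", "Uncommon phrasing", "grammar"]
counts = Counter(final_category(label) for label in raw_labels)
```

Labels that map to `None` (here "grammar") are the ones that still need to be re-assigned by hand, as the notes column suggests.
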
lcreteig commented 2 years ago

error_categories_v3.csv

| biLSTM / RoBERTa | rule-based | final category | example | notes |
| -- | -- | -- | -- | -- |
| annotation error | annotation error | annotation error | False negative: "indication for recent [entity]" (GP2796_56_63); False positive: "there is no sign of [entity]" (SP1201_156_171) |   |
| negation of different term | negation of different term, no negation of noun | negation of different term | False positive: "[entity] is not uncommon" | difference between these two rule-based categories is unclear; should be merged |
| uncertainty | uncertainty, missing pseudo trigger | ambiguity | "X without Y and [entity]" | pseudo triggers contain actual negations, but also something else, e.g. "niet bijzonder", where "bijzonder" changes the meaning of "niet". Usually these designate uncertainty (e.g. "geen duidelijke") |
| uncertainty | uncertainty | speculation | False positive: "[entity] cannot be excluded"; False negative: "not typical for [entity]" (RD1158_84_93) | pseudo triggers contain actual negations, but also something else, e.g. "niet bijzonder", where "bijzonder" changes the meaning of "niet". Usually these designate uncertainty (e.g. "geen duidelijke") |
| grammar |   | - |   | category possibly too vague; these should be given another category (in biLSTM, these were mostly "punctuation") |
| List, distance/scope/list, long_distance, long distance | scope exceeded, list, missing termination trigger | scope | False positive: "no X, (but) [entity] detected"; "[entity] is currently [...] not present" | "list" is a special case of "scope exceeded", where the negation propagates too far (e.g. "no X, ENT"). Scope is most easily exceeded in lists, but not exclusively. Scope can be restricted in rule-based method by adding termination triggers such as "but" |
| uncommon, uncommon_negation, uncommon negation, Uncommon phrasing | missing trigger, missing variation | uncommon negation | False negative: "neither [entity] nor [entity]" | a hyphen is a special case of "uncommon negation": a negation that does not get recognized as such. For rule-based, this is every negation trigger for which a rule is missing |
| Hyphen, minus | missing trigger | minus | False negative: "[entity]-" |   |
| sentence structure, sentence_structure, temporality, experiencer | wrong modality | wrong modality | False positive: "hopefully no [entity] will form" (GP1729_162_169) | "sentence structure" category possibly too vague; these should be given another category (in biLSTM, these were almost exclusively "wrong modality", e.g. temporality as in "hoop dat er geen ENT gaat ontstaan") |
| punctuation | sentence splitting | punctuation | False negative: "No evidence for recent traum. [entity]" (SP1857_92_101) | In rule-based this can cause sentence to be split prematurely, causing a false negative |
| other, context | missing direction, wrong direction, complex trigger, trigger overlaps with entity, termination trigger, spelling error | other |   |   |