Map error categories - Githubissues

umcu / negation-detection

Negation detection in Dutch clinical text.

GNU General Public License v3.0


There are two outstanding issues with the error analysis:

- Rule-based errors were classified into more detailed categories than the biLSTM / RoBERTa errors
- The definitions of some categories were too vague

Here's the proposal for a final set of categories (with examples we can mention in the paper), and how these map to the existing categories (also available as error_categories.csv):

Table with categories

| biLSTM / RoBERTa | rule-based | final category | example | notes |
| -- | -- | -- | -- | -- |
| annotation error | annotation error | annotation error | False negative: "indication for recent [entity]" (GP2796_56_63); False positive: "there is no sign of [entity]" (SP1201_156_171) | |
| negation of different term | negation of different term, no negation of noun | negation of different term | False positive: "[entity] is not uncommon" | difference between these two rule-based categories is unclear; should be merged |
| uncertainty | uncertainty, missing pseudo trigger | uncertainty | False positive: "[entity] cannot be excluded"; False negative: "not typical for [entity]" (RD1158_84_93) | pseudo triggers contain actual negations, but also something else, e.g. "niet bijzonder", where "bijzonder" changes the meaning of "niet". Usually these designate uncertainty (e.g. "geen duidelijke") |
| grammar | | - | | category possibly too vague; these should be given another category (in biLSTM, these were mostly "punctuation") |
| list | scope exceeded, list, missing termination trigger | scope exceeded | False positive: "no X, (but) [entity] detected" | "list" is a special case of "scope exceeded", where the negation propagates too far (e.g. "no X, ENT"). Scope is most easily exceeded in lists, but not exclusively. Scope can be restricted in the rule-based method by adding termination triggers such as "but" |
| hyphen | missing trigger | uncommon negation | False negative: "[entity]-", "neither [entity] nor [entity]" | a hyphen is a special case of "uncommon negation": a negation that does not get recognized as such. For the rule-based method, this is every negation trigger for which a rule is missing |
| long distance | | long distance | "[entity] is currently [...] not present" | does not occur in the rule-based method |
| sentence structure | wrong modality | wrong modality | False positive: "hopefully no [entity] will form" (GP1729_162_169) | "sentence structure" category possibly too vague; these should be given another category (in biLSTM, these were almost exclusively "wrong modality", e.g. temporality as in "hoop dat er geen ENT gaat ontstaan") |
| uncommon negation | missing trigger, missing variation | - | | |
| punctuation | sentence splitting | punctuation | False negative: "No evidence for recent traum. [entity]" (SP1857_92_101) | In the rule-based method this can cause a sentence to be split prematurely, causing a false negative |
| other | missing direction, wrong direction, complex trigger, trigger overlaps with entity, termination trigger, spelling error | other | | |

Additions/clarifications to table above:

- "uncommon negation" is the preferred term for:
  - "uncommon"
  - "uncommon phrasing"
- "minus" will be its own category (previously part of uncommon negation), and is the preferred term for:
  - "hyphen"
- "wrong modality" subsumes:
  - "temporality"
  - "experiencer"
- "scope" is the preferred term for "distance/scope/list" and subsumes:
  - "long distance"
  - "list"
  - "scope exceeded"
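
The preferred terms above could be applied to existing labels with a small lookup. This is a minimal sketch, assuming labels are free-text strings whose variants differ only in case, surrounding whitespace, and underscores; the names `PREFERRED` and `normalize_label` are hypothetical and not taken from the repository.

```python
# Hypothetical helper: maps the label variants listed above to their preferred term.
PREFERRED = {
    "uncommon": "uncommon negation",
    "uncommon phrasing": "uncommon negation",
    "hyphen": "minus",
    "temporality": "wrong modality",
    "experiencer": "wrong modality",
    "distance/scope/list": "scope",
    "long distance": "scope",
    "list": "scope",
    "scope exceeded": "scope",
}


def normalize_label(label: str) -> str:
    """Return the preferred term for a label, or the cleaned label if unmapped."""
    # Assumption: variants differ only in case, whitespace, and underscores.
    key = label.strip().lower().replace("_", " ")
    return PREFERRED.get(key, key)


print(normalize_label("Uncommon phrasing"))  # uncommon negation
print(normalize_label("long_distance"))      # scope
print(normalize_label("punctuation"))        # punctuation
```

Labels that are not listed pass through unchanged (apart from the cleanup), so categories such as "punctuation" keep their name.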

error_categories_v2.csv

table v2

| biLSTM / RoBERTa | rule-based | final category | example | notes |
| -- | -- | -- | -- | -- |
| annotation error | annotation error | **annotation error** | False negative: "indication for recent [entity]" (GP2796_56_63); False positive: "there is no sign of [entity]" (SP1201_156_171) | |
| negation of different term | negation of different term, no negation of noun | **negation of different term** | False positive: "[entity] is not uncommon" | difference between these two rule-based categories is unclear; should be merged |
| uncertainty | uncertainty, missing pseudo trigger | **uncertainty** | False positive: "[entity] cannot be excluded"; False negative: "not typical for [entity]" (RD1158_84_93) | pseudo triggers contain actual negations, but also something else, e.g. "niet bijzonder", where "bijzonder" changes the meaning of "niet". Usually these designate uncertainty (e.g. "geen duidelijke") |
| grammar | | - | | category possibly too vague; these should be given another category (in biLSTM, these were mostly "punctuation") |
| List, distance/scope/list, long_distance, long distance | scope exceeded, list, missing termination trigger | **scope** | False positive: "no X, (but) [entity] detected"; "[entity] is currently [...] not present" | "list" is a special case of "scope exceeded", where the negation propagates too far (e.g. "no X, ENT"). Scope is most easily exceeded in lists, but not exclusively. Scope can be restricted in the rule-based method by adding termination triggers such as "but" |
| uncommon, uncommon_negation, uncommon negation, Uncommon phrasing | missing trigger, missing variation | **uncommon negation** | False negative: "neither [entity] nor [entity]" | a hyphen is a special case of "uncommon negation": a negation that does not get recognized as such. For the rule-based method, this is every negation trigger for which a rule is missing |
| Hyphen, minus | missing trigger | **minus** | False negative: "[entity]-" | |
| sentence structure, sentence_structure, temporality, experiencer | wrong modality | **wrong modality** | False positive: "hopefully no [entity] will form" (GP1729_162_169) | "sentence structure" category possibly too vague; these should be given another category (in biLSTM, these were almost exclusively "wrong modality", e.g. temporality as in "hoop dat er geen ENT gaat ontstaan") |
| punctuation | sentence splitting | **punctuation** | False negative: "No evidence for recent traum. [entity]" (SP1857_92_101) | In the rule-based method this can cause a sentence to be split prematurely, causing a false negative |
| other, context | missing direction, wrong direction, complex trigger, trigger overlaps with entity, termination trigger, spelling error | **other** | | |

| biLSTM / RoBERTa | rule-based | final category | example | notes |
| -- | -- | -- | -- | -- |
| annotation error | annotation error | annotation error | False negative: "indication for recent [entity]" (GP2796_56_63); False positive: "there is no sign of [entity]" (SP1201_156_171) | |
| negation of different term | negation of different term, no negation of noun | negation of different term | False positive: "[entity] is not uncommon" | difference between these two rule-based categories is unclear; should be merged |
| uncertainty | uncertainty, missing pseudo trigger | ambiguity | "X without Y and [entity]" | pseudo triggers contain actual negations, but also something else, e.g. "niet bijzonder", where "bijzonder" changes the meaning of "niet". Usually these designate uncertainty (e.g. "geen duidelijke") |
| uncertainty | uncertainty | speculation | False positive: "[entity] cannot be excluded"; False negative: "not typical for [entity]" (RD1158_84_93) | pseudo triggers contain actual negations, but also something else, e.g. "niet bijzonder", where "bijzonder" changes the meaning of "niet". Usually these designate uncertainty (e.g. "geen duidelijke") |
| grammar | | - | | category possibly too vague; these should be given another category (in biLSTM, these were mostly "punctuation") |
| List, distance/scope/list, long_distance, long distance | scope exceeded, list, missing termination trigger | scope | False positive: "no X, (but) [entity] detected"; "[entity] is currently [...] not present" | "list" is a special case of "scope exceeded", where the negation propagates too far (e.g. "no X, ENT"). Scope is most easily exceeded in lists, but not exclusively. Scope can be restricted in the rule-based method by adding termination triggers such as "but" |
| uncommon, uncommon_negation, uncommon negation, Uncommon phrasing | missing trigger, missing variation | uncommon negation | False negative: "neither [entity] nor [entity]" | a hyphen is a special case of "uncommon negation": a negation that does not get recognized as such. For the rule-based method, this is every negation trigger for which a rule is missing |
| Hyphen, minus | missing trigger | minus | False negative: "[entity]-" | |
| sentence structure, sentence_structure, temporality, experiencer | wrong modality | wrong modality | False positive: "hopefully no [entity] will form" (GP1729_162_169) | "sentence structure" category possibly too vague; these should be given another category (in biLSTM, these were almost exclusively "wrong modality", e.g. temporality as in "hoop dat er geen ENT gaat ontstaan") |
| punctuation | sentence splitting | punctuation | False negative: "No evidence for recent traum. [entity]" (SP1857_92_101) | In the rule-based method this can cause a sentence to be split prematurely, causing a false negative |
| other, context | missing direction, wrong direction, complex trigger, trigger overlaps with entity, termination trigger, spelling error | other | | |
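
In the table above, some old categories map to more than one final category, so relabelling cannot be fully automatic. The sketch below shows one way to find those cases. It assumes a tab-separated excerpt of the first three columns; the actual layout of error_categories_v2.csv may differ, and `build_mapping` is a hypothetical helper.

```python
import csv
import io
from collections import defaultdict

# Excerpt of the table above as tab-separated text (first three columns only).
# Assumption: the real error_categories_v2.csv may use a different delimiter or layout.
TABLE = (
    "biLSTM / RoBERTa\trule-based\tfinal category\n"
    "uncertainty\tuncertainty, missing pseudo trigger\tambiguity\n"
    "uncertainty\tuncertainty\tspeculation\n"
    "uncommon, uncommon_negation, uncommon negation, Uncommon phrasing"
    "\tmissing trigger, missing variation\tuncommon negation\n"
    "Hyphen, minus\tmissing trigger\tminus\n"
    "punctuation\tsentence splitting\tpunctuation\n"
)


def build_mapping(table_text: str, column: str) -> dict[str, set[str]]:
    """Map each old category in `column` to the final categories it can become."""
    mapping: dict[str, set[str]] = defaultdict(set)
    for row in csv.DictReader(io.StringIO(table_text), delimiter="\t"):
        # A cell may list several old categories separated by commas.
        for old in row[column].split(","):
            old = old.strip().lower()
            if old:
                mapping[old].add(row["final category"])
    return dict(mapping)


rule_based = build_mapping(TABLE, "rule-based")
# Old categories with more than one possible target need manual review.
ambiguous = sorted(k for k, v in rule_based.items() if len(v) > 1)
print(ambiguous)  # ['missing trigger', 'uncertainty']
```

Errors labelled "missing trigger" or "uncertainty" in the rule-based analysis would need a manual check, since the old label alone does not determine the final category.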


Map error categories #58