assessment of the quality of ERG analyses

arademaker commented 1 year ago

Executive summary:

Consider the spans that group predications and tokens for each sentence. In total, we have 1842193 such groups. In only 49793 of them did I find an apparent POS inconsistency between ERG and the sense annotation.

49793/1842193 ≈ 0.027

Note that I only consider the tokens that were sense tagged. Counting per sentence, 38883 of the 159614 sentences contain at least one mismatch. If we ignore the mismatches a/r (adverbs as adjectives) and q/n (someone), 28358 sentences have at least one mismatch. If we also ignore the verb/adjective mismatches, 17401 sentences remain:

38883/159614 = 0.24
28358/159614 = 0.18
17401/159614 = 0.11

The dataset contains 165994 sentences, but not all of them got a parse from ERG.
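The ratios in the summary can be recomputed from the raw counts. The snippet below is only a convenience sketch; the constant names and the `ratio` helper are mine, not part of the validation script.

```python
# Raw counts quoted in the executive summary.
GROUPS_TOTAL = 1842193       # span groups (tokens + predications)
GROUPS_MISMATCH = 49793      # groups with an apparent POS inconsistency
SENTENCES_PARSED = 159614    # sentences that got an ERG parse
SENTENCES_TOTAL = 165994     # sentences in the dataset


def ratio(part, whole, digits=2):
    """Return part/whole rounded to the given number of decimals."""
    return round(part / whole, digits)


if __name__ == "__main__":
    print("groups with mismatch:", ratio(GROUPS_MISMATCH, GROUPS_TOTAL, 3))
    # Sentence-level counts: all mismatches, without a/r and q/n,
    # and additionally without verb/adjective.
    for count in (38883, 28358, 17401):
        print(count, ratio(count, SENTENCES_PARSED))
    print("parse coverage:", ratio(SENTENCES_PARSED, SENTENCES_TOTAL))
```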

Details:

For all sentences, I join the tokens with the MRS predicates using the spans.
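A minimal sketch of that join, assuming tokens and predications are plain tuples whose second and third fields are the character offsets (as in the log lines in this comment). The function name `join_by_span` is hypothetical, and the actual script also nests the spans hierarchically, which this sketch does not attempt.

```python
from collections import defaultdict


def join_by_span(tokens, predications):
    """Group tokens and MRS predications that share the same (start, end) span.

    Both inputs are iterables of tuples whose fields 1 and 2 are the
    character offsets of the item in the sentence.
    """
    groups = defaultdict(list)
    for item in list(tokens) + list(predications):
        start, end = item[1], item[2]
        groups[(start, end)].append(item)
    # Order by start offset, widest span first, so enclosing spans come
    # before the spans they contain.
    ordered = sorted(groups.items(), key=lambda kv: (kv[0][0], -kv[0][1]))
    return dict(ordered)


if __name__ == "__main__":
    tokens = [
        ("hydrarthrosis", 0, 13, ["hydrarthrosis%1:26:00::"], ["wf"], "NN"),
        ("affecting", 14, 23, ["affect%2:29:00::"], ["wf"], "VBG"),
    ]
    predications = [
        ("unknown", 0, 32, "e2", "h1", None),
        ("_hydrarthrosis/nns_u_unknown", 0, 13, "x4", "h8", None),
        ("_affect_v_1", 14, 23, "e9", "h8", None),
    ]
    for span, items in join_by_span(tokens, predications).items():
        print(span, "=>", items)
```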

In the example below I found no conflict between ERG and the annotation. For instance, affect%2 means the word was annotated as a verb, and ERG made it the predicate _affect_v_1. Hydrarthrosis was annotated as a noun, and ERG preprocessing instantiated a generic predicate (_hydrarthrosis/nns_u_unknown) from the NNS tag assigned by the POS tagger.

> START def hydrarthrosis affecting the knee
 (0,32) 0 => [('unknown', 0, 32, 'e2', 'h1', None), ('udef_q', 0, 32, 'q4', 'h5', None)]
 (0,13) 1 => [('hydrarthrosis', 0, 13, ['hydrarthrosis%1:26:00::'], ['wf'], 'NN'), ('_hydrarthrosis/nns_u_unknown', 0, 13, 'x4', 'h8', None)]
 (14,23) 1 => [('affecting', 14, 23, ['affect%2:29:00::'], ['wf'], 'VBG'), ('_affect_v_1', 14, 23, 'e9', 'h8', None)]
 (24,27) 1 => [('the', 24, 27, None, ['wf'], 'DT'), ('_the_q', 24, 27, 'q10', 'h11', None)]
 (28,32) 1 => [('knee', 28, 32, ['knee%1:08:00::'], ['wf'], 'NN'), ('_knee_n_1', 28, 32, 'x10', 'h14', None)]
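The comparison behind this example can be sketched as follows. The sketch rests on several assumptions: the digit after `%` in a sense key is the WordNet ss_type (1 noun, 2 verb, 3 adjective, 4 adverb, 5 adjective satellite, which I fold into `a`), only surface predicates (those starting with `_`) carry a POS, and the `u` of unknown-word predicates is ignored. The helper names are hypothetical and this is not the actual validation code.

```python
# WordNet ss_type digit -> one-letter POS; satellite adjectives (5) count as 'a'.
SS_TYPE_TO_POS = {"1": "n", "2": "v", "3": "a", "4": "r", "5": "a"}


def sense_pos(sense_key):
    """POS encoded in a sense key such as 'affect%2:29:00::'."""
    ss_type = sense_key.split("%", 1)[1].split(":", 1)[0]
    return SS_TYPE_TO_POS.get(ss_type)


def predicate_pos(predicate):
    """POS field of a surface predicate such as '_affect_v_1'.

    Abstract predicates (udef_q, person, unknown, ...) return None.
    Assumes the lemma itself contains no underscore.
    """
    if not predicate.startswith("_"):
        return None
    parts = predicate[1:].split("_")
    return parts[1] if len(parts) > 1 else None


def pos_labels(sense_keys, predicates):
    """Set of POS labels seen for one span; more than one label is a mismatch."""
    labels = {sense_pos(key) for key in sense_keys}
    labels |= {predicate_pos(pred) for pred in predicates}
    labels.discard(None)  # abstract predicates or unrecognised ss_type
    labels.discard("u")   # generic predicates for unknown words
    return labels


if __name__ == "__main__":
    print(pos_labels(["affect%2:29:00::"], ["_affect_v_1"]))
    print(pos_labels(["hydrarthrosis%1:26:00::"],
                     ["_hydrarthrosis/nns_u_unknown"]))
```

A span would be flagged only when `pos_labels` returns more than one label, so both spans in this example pass.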

Next, excess was annotated as an adjective (%5) but analysed as a noun by ERG. See the line starting with "D>".

> START def an abnormality of pregnancy; accumulation of excess amniotic fluid
D> {'n', 'a'} [('excess', 45, 51, ['excess%5:00:00:unnecessary:00'], ['wf'], 'JJ'), ('udef_q', 45, 51, 'q29', 'h30', None), ('_excess_n_1', 45, 51, 'x29', 'h33', None)]
 (0,66) 0 => [('implicit_conj', 0, 66, 'e2', 'h1', None)]
  (0,28) 1 => [('unknown', 0, 28, 'e4', 'h1', None)]
   (0,2) 2 => [('an', 0, 2, None, ['wf'], 'DT'), ('_a_q', 0, 2, 'q6', 'h7', None)]
   (3,14) 2 => [('abnormality', 3, 14, None, ['wf'], 'NN'), ('_abnormality_n_1', 3, 14, 'x6', 'h10', None)]
   (15,17) 2 => [('of', 15, 17, None, ['wf'], 'IN'), ('_of_p', 15, 17, 'e11', 'h10', None)]
   (18,28) 2 => [('udef_q', 18, 28, 'q12', 'h13', None)]
    (18,27) 3 => [('pregnancy', 18, 27, ['pregnancy%1:26:00::'], ['wf'], 'NN'), ('_pregnancy_n_1', 18, 27, 'x12', 'h16', None)]
    (27,28) 3 => [(';', 27, 28, None, ['wf'], 'punc')]
  (29,66) 1 => [('unknown', 29, 66, 'e5', 'h1', None), ('udef_q', 29, 66, 'q17', 'h18', None)]
   (29,41) 2 => [('accumulation', 29, 41, ['accumulation%1:22:00::'], ['wf'], 'NN'), ('_accumulation_n_of', 29, 41, 'x17', 'h21', None)]
   (42,44) 2 => [('of', 42, 44, None, ['wf'], 'IN')]
   (45,66) 2 => [('udef_q', 45, 66, 'q22', 'h23', None)]
    (45,60) 3 => [('compound', 45, 60, 'e27', 'h26', None)]
     (45,51) 4 => [('excess', 45, 51, ['excess%5:00:00:unnecessary:00'], ['wf'], 'JJ'), ('udef_q', 45, 51, 'q29', 'h30', None), ('_excess_n_1', 45, 51, 'x29', 'h33', None)]
     (52,60) 4 => [('amniotic', 52, 60, None, ['cf', 'a'], 'JJ'), ('_amniotic/jj_u_unknown', 52, 60, 'e28', 'h26', None)]
    (61,66) 3 => [('fluid', 61, 66, None, ['cf', 'a'], 'NN'), ('_fluid_n_1', 61, 66, 'x22', 'h26', None)]

ERG gives adverbs and adjectives the same POS (a) in its predicates, so another common mismatch is a vs r. Should the fragment after the first semicolon, "equally balanced", be an example?

> START def a state of being essentially equal or equivalent; equally balanced; 
D> {'a', 'r'} [('essentially', 17, 28, ['essentially%4:02:01::'], ['wf'], 'RB'), ('_essential_a_1', 17, 28, 'e17', 'h16', None)]
D> {'n', 'a'} [('equivalent', 38, 48, ['equivalent%1:09:00::'], ['wf'], 'JJ'), ('_equivalent_a_to', 38, 48, 'e22', 'h16', None)]
 (0,67) 0 => [('implicit_conj', 0, 67, 'e2', 'h1', None)]
  (0,49) 1 => [('unknown', 0, 49, 'e4', 'h1', None)]
   (0,1) 2 => [('a', 0, 1, None, ['wf'], 'DT'), ('_a_q', 0, 1, 'q6', 'h7', None)]
   (2,7) 2 => [('state', 2, 7, ['state%1:03:00::'], ['wf'], 'NN'), ('_state_n_of', 2, 7, 'x6', 'h10', None)]
   (8,10) 2 => [('of', 8, 10, None, ['wf'], 'IN')]
   (11,49) 2 => [('udef_q', 11, 49, 'q11', 'h12', None), ('nominalization', 11, 49, 'x11', 'h15', None)]
    (11,16) 3 => [('being', 11, 16, None, ['wf'], 'VBG')]
    (17,28) 3 => [('essentially', 17, 28, ['essentially%4:02:01::'], ['wf'], 'RB'), ('_essential_a_1', 17, 28, 'e17', 'h16', None)]
    (29,34) 3 => [('equal', 29, 34, None, ['wf'], 'JJ'), ('_equal_a_to', 29, 34, 'e18', 'h16', None)]
    (35,37) 3 => [('or', 35, 37, None, ['wf'], 'CC'), ('_or_c', 35, 37, 'e21', 'h16', None)]
    (38,48) 3 => [('equivalent', 38, 48, ['equivalent%1:09:00::'], ['wf'], 'JJ'), ('_equivalent_a_to', 38, 48, 'e22', 'h16', None)]
    (48,49) 3 => [(';', 48, 49, None, ['wf'], 'punc')]
  (50,67) 1 => [('unknown', 50, 67, 'e5', 'h1', None)]
   (50,57) 2 => [('equally', 50, 57, None, ['wf'], 'RB'), ('_equal_a_to', 50, 57, 'e25', 'h1', None)]
   (58,66) 2 => [('balanced', 58, 66, ['balance%2:42:00::'], ['wf'], 'VBN'), ('_balance_v_1', 58, 66, 'e26', 'h1', None)]
   (66,67) 2 => [(';', 66, 67, None, ['wf'], 'punc')]

Adjective vs verb:

> START def the condition of being reinstated; 
D> {'v', 'a'} [('reinstated', 23, 33, ['reinstate%2:41:00::'], ['wf'], 'VBN'), ('_instate_v_1', 23, 33, 'e15', 'h14', None), ('_re-_a_again', 23, 33, 'e18', 'h14', None)]
 (0,34) 0 => [('unknown', 0, 34, 'e2', 'h1', None)]
  (0,3) 1 => [('the', 0, 3, None, ['wf'], 'DT'), ('_the_q', 0, 3, 'q4', 'h5', None)]
  (4,13) 1 => [('condition', 4, 13, ['condition%1:26:00::'], ['wf'], 'NN'), ('_condition_n_of', 4, 13, 'x4', 'h8', None)]
  (14,16) 1 => [('of', 14, 16, None, ['wf'], 'IN')]
  (17,34) 1 => [('udef_q', 17, 34, 'q9', 'h10', None), ('nominalization', 17, 34, 'x9', 'h13', None)]
   (17,22) 2 => [('being', 17, 22, None, ['wf'], 'VBG')]
   (23,33) 2 => [('reinstated', 23, 33, ['reinstate%2:41:00::'], ['wf'], 'VBN'), ('_instate_v_1', 23, 33, 'e15', 'h14', None), ('_re-_a_again', 23, 33, 'e18', 'h14', None)]
   (33,34) 2 => [(';', 33, 34, None, ['wf'], 'punc')]

Someone vs person+some_q (1829 cases). I need to improve my check to remove these from the suspicious cases.

> START def a situation of being uncomfortably close to someone or something
D> {'a', 'r'} [('uncomfortably', 21, 34, ['uncomfortably%4:02:00::'], ['wf'], 'RB'), ('_uncomfortable_a_1', 21, 34, 'e16', 'h15', None)]
D> {'q', 'n'} [('someone', 44, 51, ['someone%1:03:00::'], ['wf'], 'NN'), ('person', 44, 51, 'x24', 'h23', None), ('_some_q', 44, 51, 'q24', 'h25', None)]
 (0,64) 0 => [('unknown', 0, 64, 'e2', 'h1', None)]
  (0,1) 1 => [('a', 0, 1, None, ['wf'], 'DT'), ('_a_q', 0, 1, 'q4', 'h5', None)]
  (2,11) 1 => [('situation', 2, 11, ['situation%1:15:00::'], ['wf'], 'NN'), ('_situation_n_1', 2, 11, 'x4', 'h8', None)]
  (12,14) 1 => [('of', 12, 14, None, ['wf'], 'IN'), ('_of_p', 12, 14, 'e9', 'h8', None)]
  (15,64) 1 => [('udef_q', 15, 64, 'q10', 'h11', None), ('nominalization', 15, 64, 'x10', 'h14', None)]
   (15,20) 2 => [('being', 15, 20, None, ['wf'], 'VBG')]
   (21,34) 2 => [('uncomfortably', 21, 34, ['uncomfortably%4:02:00::'], ['wf'], 'RB'), ('_uncomfortable_a_1', 21, 34, 'e16', 'h15', None)]
   (35,40) 2 => [('close', 35, 40, None, ['wf'], 'JJ'), ('_close_a_to', 35, 40, 'e17', 'h15', None)]
   (41,43) 2 => [('to', 41, 43, None, ['wf'], 'TO')]
   (44,64) 2 => [('udef_q', 44, 64, 'q19', 'h20', None)]
    (44,51) 3 => [('someone', 44, 51, ['someone%1:03:00::'], ['wf'], 'NN'), ('person', 44, 51, 'x24', 'h23', None), ('_some_q', 44, 51, 'q24', 'h25', None)]
    (52,54) 3 => [('or', 52, 54, None, ['wf'], 'CC'), ('_or_c', 52, 54, 'x19', 'h28', None)]
    (55,64) 3 => [('something', 55, 64, None, ['wf'], 'PRP'), ('thing', 55, 64, 'x29', 'h30', None), ('_some_q', 55, 64, 'q29', 'h31', None)]

What is "especially" in the example below? It was tagged as an adverb, but in the ERG analysis its POS is x?

> START def the relative position or standing of things or especially persons in a society; 
D> {'x', 'r'} [('especially', 47, 57, ['especially%4:02:01::'], ['wf'], 'RB'), ('_especially_x_deg', 47, 57, 'e35', 'h34', None)]
 (0,79) 0 => [('unknown', 0, 79, 'e2', 'h1', None), ('udef_q', 0, 79, 'q4', 'h5', None)]
  (0,3) 1 => [('the', 0, 3, None, ['wf'], 'DT'), ('_the_q', 0, 3, 'q9', 'h8', None)]
  (4,12) 1 => [('relative', 4, 12, ['relative%3:00:00::'], ['wf'], 'JJ'), ('_relative_a_to', 4, 12, 'e13', 'h12', None)]
  (13,21) 1 => [('position', 13, 21, None, ['wf'], 'NN'), ('udef_q', 13, 21, 'q16', 'h15', None), ('_position_n_of', 13, 21, 'x16', 'h19', None)]
  (22,33) 1 => [('udef_q', 22, 33, 'q21', 'h20', None)]
   (22,24) 2 => [('or', 22, 24, None, ['wf'], 'CC'), ('_or_c', 22, 24, 'x9', 'h12', None)]
   (25,33) 2 => [('standing', 25, 33, None, ['wf'], 'NN'), ('_standing_n_1', 25, 33, 'x21', 'h24', None)]
  (34,36) 1 => [('of', 34, 36, None, ['wf'], 'IN'), ('_of_p', 34, 36, 'e25', 'h12', None)]
  (37,43) 1 => [('things', 37, 43, ['thing%1:06:01::'], ['wf'], 'NNS'), ('udef_q', 37, 43, 'q26', 'h27', None), ('_thing_n_of-about', 37, 43, 'x26', 'h30', None)]
  (44,46) 1 => [('or', 44, 46, None, ['wf'], 'CC'), ('_or_c', 44, 46, 'x4', 'h32', None)]
  (47,57) 1 => [('especially', 47, 57, ['especially%4:02:01::'], ['wf'], 'RB'), ('_especially_x_deg', 47, 57, 'e35', 'h34', None)]
  (58,79) 1 => [('udef_q', 58, 79, 'q33', 'h34', None)]
   (58,65) 2 => [('persons', 58, 65, ['person%1:03:00::'], ['wf'], 'NNS'), ('_person_n_1', 58, 65, 'x33', 'h39', None)]
   (66,68) 2 => [('in', 66, 68, None, ['wf'], 'IN'), ('_in_p_loc', 66, 68, 'e40', 'h39', None)]
   (69,70) 2 => [('a', 69, 70, None, ['wf'], 'DT'), ('_a_q', 69, 70, 'q41', 'h42', None)]
   (71,78) 2 => [('society', 71, 78, None, ['wf'], 'NN'), ('_society_n_of', 71, 78, 'x41', 'h45', None)]
   (78,79) 2 => [(';', 78, 79, None, ['wf'], 'punc')]

arademaker commented 1 year ago

The mismatches can be grouped and quantified. Note that most cases are expected given ERG's semantic representation.

% grep "D>" validate.log | awk '{n = match($0,/}/); a = substr($0,3,n-1); print a}' | sort | uniq -c  | sort -nr
12968  {'a', 'v'}
12542  {'a', 'r'}
7419  {'n', 'a'}
6096  {'n', 'v'}
1870  {'n', 'q'}
1673  {'x', 'r'}
1488  {'c', 'r'}
1132  {'r', 'p'}
 884  {'a', 'p'}
 356  {'n', 'r'}
 225  {'a', 'q'}
 120  {'x', 'a'}
 114  {'n', 'p'}
  72  {'n', 'a', 'v'}
  45  {'r', 'v'}
  27  {'r', 'q'}
  22  {'n', 'a', 'q'}
  17  {'p', 'v'}
   7  {'n', None}
   4  {'n', 'x'}
   3  {'a', 'r', 'v'}
   2  {'x', 'v'}
   2  {'r', None}
   2  {'n', '1'}
   1  {'q', 'v'}
   1  {'n', 'c'}
   1  {'a', 'r', 'p'}
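The same tally can be sketched in Python. One difference from the grep/awk pipeline, which is my own choice: the set printed on each "D>" line is parsed and its members sorted, so `{'v', 'a'}` and `{'a', 'v'}` land in the same bucket even if logs from different runs print the set in different orders. The function name `count_mismatches` and the log file name in the usage note are assumptions.

```python
import ast
import re
from collections import Counter

# A "D>" line starts with the printed Python set of conflicting POS labels.
D_LINE = re.compile(r"^D> (\{[^}]*\})")


def count_mismatches(lines):
    """Count 'D>' lines per POS combination, ignoring the printed set order."""
    counts = Counter()
    for line in lines:
        match = D_LINE.match(line)
        if match is None:
            continue  # not a mismatch line
        pos_set = ast.literal_eval(match.group(1))
        # Sort by str() so that a None member does not break the ordering.
        counts[tuple(sorted(pos_set, key=str))] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "D> {'v', 'a'} [('reinstated', 23, 33, ['reinstate%2:41:00::'])]",
        "D> {'a', 'v'} [('balanced', 58, 66, ['balance%2:42:00::'])]",
        " (0,34) 0 => [('unknown', 0, 34, 'e2', 'h1', None)]",
        "D> {'q', 'n'} [('someone', 44, 51, ['someone%1:03:00::'])]",
    ]
    for combo, n in count_mismatches(sample).most_common():
        print(n, combo)
```

To run it over the real log, pass an open file, e.g. `count_mismatches(open("validate.log"))`.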

arademaker commented 1 year ago

Of all the MRSs that I obtained from the sentences parsed by ERG, 3748 did not pass the validations from https://pydelphin.readthedocs.io/en/latest/api/delphin.mrs.html#module-functions

is_connected
has_intrinsic_variable_property
is_well_formed

Some cases below @danflick:

Example of not connected:

13947415-n-2 [the British are more aware of social status than Americans are] C:False IVP:True WF:False
[ TOP: h0
  INDEX: e2 [ e SF: prop TENSE: pres MOOD: indicative PROG: - PERF: - ]
  RELS: < [ _the_q<0:3> LBL: h4 ARG0: x3 [ x PERS: 3 NUM: pl IND: + ] RSTR: h5 BODY: h6 ]
          [ _british_n_1<4:11> LBL: h7 ARG0: x3 ]
          [ _more_x_comp<16:20> LBL: h1 ARG0: e8 [ e SF: prop TENSE: untensed MOOD: indicative PROG: - PERF: - ] ARG1: e2 ARG2: h9 ]
          [ _aware_a_of<21:26> LBL: h1 ARG0: e2 ARG1: x3 ARG2: x10 [ x PERS: 3 NUM: sg ] ]
          [ udef_q<30:43> LBL: h11 ARG0: x10 RSTR: h12 BODY: h13 ]
          [ _social_a_1<30:36> LBL: h14 ARG0: e15 [ e SF: prop TENSE: untensed MOOD: indicative PROG: bool PERF: - ] ARG1: x10 ]
          [ _status_n_of-as<37:43> LBL: h14 ARG0: x10 ARG1: i16 ]
          [ udef_q<49:58> LBL: h17 ARG0: x18 [ x PERS: 3 NUM: pl IND: + ] RSTR: h19 BODY: h20 ]
          [ _american_n_1<49:58> LBL: h21 ARG0: x18 ] >
  HCONS: < h0 qeq h1 h5 qeq h7 h12 qeq h14 h19 qeq h21 > ]

but the 3rd reading is

No intrinsic variable property and not well-formed (probably the error is e2 or x4):

14213328-n-1 [a curving or bending; often abnormal; ] C:True IVP:False WF:False
[ TOP: h0
  INDEX: e2 [ e SF: prop ]
  RELS: < [ unknown<0:37> LBL: h1 ARG: x4 [ x PERS: 3 IND: + ] ARG0: e2 ]
          [ _a_q<0:1> LBL: h5 ARG0: x4 RSTR: h6 BODY: h7 ]
          [ udef_q<2:9> LBL: h8 ARG0: x9 [ x PERS: 3 NUM: sg GEND: n ] RSTR: h10 BODY: h11 ]
          [ _curve_v_1<2:9> LBL: h12 ARG0: e13 [ e SF: prop TENSE: untensed MOOD: indicative PROG: + PERF: - ] ARG1: i14 ]
          [ nominalization<2:9> LBL: h15 ARG0: x9 ARG1: h12 ]
          [ udef_q<10:21> LBL: h16 ARG0: x17 [ x PERS: 3 NUM: sg GEND: n ] RSTR: h18 BODY: h19 ]
          [ _or_c<10:12> LBL: h20 ARG0: x4 ARG1: x9 ARG2: x17 ]
          [ _bend_v_1<13:20> LBL: h21 ARG0: e22 [ e SF: prop TENSE: untensed MOOD: indicative PROG: + PERF: - ] ARG1: i23 ARG2: i24 ]
          [ nominalization<13:20> LBL: h25 ARG0: x17 ARG1: h21 ]
          [ _often_a_1<22:27> LBL: h1 ARG0: i26 ARG1: e2 ]
          [ _abnormal_a_1<28:36> LBL: h1 ARG0: e2 ARG1: u27 ] >
  HCONS: < h0 qeq h1 h6 qeq h20 h10 qeq h15 h18 qeq h25 > ]

Using a recall procedure that inspects the next readings, I obtained a valid MRS for the majority of the 3748 cases above. In 77 cases, the valid MRS is the 6th reading; in 1686 cases, the second reading is valid; and so on. Each pair below is (reading index, number of cases), with readings indexed from 0:

[(5, 77), (3, 191), (1, 1686), (2, 686), (4, 176), (7, 34), (6, 52), (13, 12), (22, 4), (8, 29), (10, 19), (9, 17), 
(14, 13), (18, 7), (0, 36), (50, 1), (20, 6), (26, 5), (43, 1), (12, 10), (41, 3), (23, 2), (24, 4), (28, 3), (21, 4), 
(56, 2), (83, 1), (16, 7), (55, 2), (15, 8), (27, 5), (47, 3), (17, 2), (35, 1), (11, 9), (72, 2), (33, 1), (52, 2), (71, 1), 
(31, 4), (34, 4), (49, 1), (19, 3), (44, 2), (85, 1), (53, 1), (45, 1), (36, 1), (84, 1), (59, 2), (25, 1), (37, 1), (32, 1), 
(29, 2), (69, 1), (77, 1), (61, 1), (51, 1), (39, 1)]
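A small sketch for summarising this list, assuming each pair is (0-based reading index, number of cases), which is how the text above reads index 5 as the 6th reading. The helper names are hypothetical.

```python
# (reading index, number of cases) pairs copied from the list above.
READING_COUNTS = [
    (5, 77), (3, 191), (1, 1686), (2, 686), (4, 176), (7, 34), (6, 52),
    (13, 12), (22, 4), (8, 29), (10, 19), (9, 17), (14, 13), (18, 7),
    (0, 36), (50, 1), (20, 6), (26, 5), (43, 1), (12, 10), (41, 3),
    (23, 2), (24, 4), (28, 3), (21, 4), (56, 2), (83, 1), (16, 7),
    (55, 2), (15, 8), (27, 5), (47, 3), (17, 2), (35, 1), (11, 9),
    (72, 2), (33, 1), (52, 2), (71, 1), (31, 4), (34, 4), (49, 1),
    (19, 3), (44, 2), (85, 1), (53, 1), (45, 1), (36, 1), (84, 1),
    (59, 2), (25, 1), (37, 1), (32, 1), (29, 2), (69, 1), (77, 1),
    (61, 1), (51, 1), (39, 1),
]

INVALID_TOTAL = 3748  # MRSs that failed at least one validation


def recovered_total(pairs):
    """Number of cases for which some reading yielded a valid MRS."""
    return sum(count for _, count in pairs)


def recovered_within(pairs, max_index):
    """Number of cases whose valid reading has index <= max_index."""
    return sum(count for index, count in pairs if index <= max_index)


if __name__ == "__main__":
    total = recovered_total(READING_COUNTS)
    print("recovered:", total, "of", INVALID_TOTAL)
    print("share recovered:", round(total / INVALID_TOTAL, 2))
    print("valid within readings 0-4:", recovered_within(READING_COUNTS, 4))
    # Print the distribution ordered by reading index.
    for index, count in sorted(READING_COUNTS):
        print(index, count)
```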

fcbond commented 1 year ago

In cases where a derivational prefix (such as re-, co-, un-) shifts the meaning, we need a verb with the prefix:

revisit (third-person singular simple present revisits, present participle revisiting, simple past and past participle revisited)

To visit again.
To reconsider or re-experience something.

own-pt / glosstag

assessment of the quality of ERG analyses #32
