openeventdata / UniversalPetrarch

Language-agnostic political event coding using universal dependencies
MIT License

source/target coding bug in UP Arabic #45

Closed khaledJabr closed 5 years ago

khaledJabr commented 5 years ago

While further inspecting results produced by UP on Arabic data, I noticed a weird bug that I should have caught earlier, but it was not apparent when viewing results from the terminal. The bug (or issue) is that UP produces a NULL/NaN value for some source/target entries. Ideally, I would expect --- when UP does not find a source or target. Here's a specific example:

Parsed Sentence

<Sentence date="20000715" id="e6874d82b9c4d010348e7713f3214371c0e9c5443e48b7f78f572ea79ea2e0b4487406ca60fda58e92ee3ae466d7f77c85fd3876875bdd1f1349380756431bb6aa16e4a52882030fd88325f47fcdb256e98acc767133ad543c60e6459f981b188759ce4b2141f19f461e893147f33ce6650d489249d385fe1abf1cd144d0457f21d5756dedd150c8_42" sentence="True" source="unidentified">
<Text>
"اوقعت القرعة ""الاخضر"" في المجموعة الرابعة فتصدر ترتيبها برصيد 7 نقاط امام كوريا الجنوبية (4) واندونيسيا (3) والبحرين (3)."
</Text>
<Parse>1    "   "   PUNCT   G---------  _   2   punct   _   _
2   اوقعت   أوقع    VERB    VP-A-3FS--  Aspect=Perf|Gender=Fem|Number=Sing|Person=3|Voice=Act   0   root    _   _
3   القرعة  قرعة    NOUN    N------S1D  Case=Nom|Definite=Def|Number=Sing   2   obj _   _
4   ""  ""  PUNCT   G---------  _   2   punct   _   _
5   الاخضر  أخضر    NOUN    N------P1D  Case=Nom|Definite=Def|Number=Plur   2   dep _   _
6   ""  ""  PUNCT   G---------  _   2   punct   _   _
7   في  في  ADP P---------  AdpType=Prep    8   case    _   _
8   المجموعة    مجموعة  NOUN    N------S2D  Case=Gen|Definite=Def|Number=Sing   2   obl _   _
9   الرابعة رابع    ADJ A-----FS2D  Case=Gen|Definite=Def|Gender=Fem|Number=Sing    8   amod    _   _
10  ف   ف   CCONJ   C---------  _   11  cc  _   _
11  تصدر    صدر VERB    VIIA-3FS--  Aspect=Imp|Gender=Fem|Mood=Ind|Number=Sing|Person=3|VerbForm=Fin|Voice=Act  2   conj    _   _
12  ترتيب   ترتيب   NOUN    N------S1R  Case=Nom|Definite=Cons|Number=Sing  11  nsubj   _   _
13  ها  هو  PRON    SP---3FS2-  Case=Gen|Gender=Fem|Number=Sing|Person=3|PronType=Prs   12  nmod    _   _
14  ب   ب   ADP P---------  AdpType=Prep    15  case    _   _
15  رصيد    رصيد    NOUN    N------S2R  Case=Gen|Definite=Cons|Number=Sing  11  obl _   _
16  7   7   NUM Q---------  NumForm=Digit   11  obj _   _
17  نقاط    نقطة    NOUN    N------P2I  Case=Gen|Definite=Ind|Number=Plur   16  nmod    _   _
18  امام    أمام    ADP PI------4-  AdpType=Prep|Case=Acc   19  case    _   _
19  كوريا   كوريا   X   X---------  Foreign=Yes 11  advmod  _   _
20  الجنوبية    جنوبي   ADJ A-----FS2D  Case=Gen|Definite=Def|Gender=Fem|Number=Sing    19  amod    _   _
21  (   (   PUNCT   G---------  _   22  punct   _   _
22  4   4   NUM Q---------  NumForm=Digit   11  appos   _   _
23  )   )   PUNCT   G---------  _   22  punct   _   _
24  و   و   CCONJ   C---------  _   25  cc  _   _
25  اندونيسيا   إندونيسيا   X   X---------  Foreign=Yes 11  conj    _   _
26  (   (   PUNCT   G---------  _   27  punct   _   _
27  3   3   NUM Q---------  NumForm=Digit   25  appos   _   _
28  )   )   PUNCT   G---------  _   27  punct   _   _
29  و   و   CCONJ   C---------  _   30  cc  _   _
30  البحرين البحرين NOUN    N------S2D  Case=Gen|Definite=Def|Number=Sing   11  conj    _   _
31  (   (   PUNCT   G---------  _   32  punct   _   _
32  3   3   NUM Q---------  NumForm=Digit   30  nummod  _   _
33  )." )." PUNCT   G---------  _   32  punct   _   _
</Parse></Sentence>

Output in the evts file generated by UP:

20000715        KOR 061     e6874d82b9c4d010348e7713f3214371c0e9c5443e48b7f78f572ea79ea2e0b4487406ca60fda58e92ee3ae466d7f77c85fd3876875bdd1f1349380756431bb6aa16e4a52882030fd88325f47fcdb256e98acc767133ad543c60e6459f981b188759ce4b2141f19f461e893147f33ce6650d489249d385fe1abf1cd144d0457f21d5756dedd150c8_42 unidentified            صدر

The exact issue is that column 1 is NaN.
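The NaN is most likely how pandas displays an empty field when the evts file is loaded, rather than a literal value written by UP. A minimal sketch, assuming the evts file is tab-separated with no header row and that column 1 is the source field; the one-row sample is abbreviated from the event above, and `load_evts` / `fill_missing_actors` are hypothetical helpers, not part of UP:

```python
import io

import pandas as pd

# Assumption: tab-separated, no header; fields are date, source, target, CAMEO code, ...
# Abbreviated from the event above: the source field is empty.
EVTS_SAMPLE = "20000715\t\tKOR\t061\n"


def load_evts(text: str) -> pd.DataFrame:
    """Load evts text; pandas parses the empty source field as NaN by default."""
    return pd.read_csv(io.StringIO(text), sep="\t", header=None, dtype=str)


def fill_missing_actors(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with NaN in columns 1 (source) and 2 (target) replaced by '---'."""
    out = df.copy()
    out[[1, 2]] = out[[1, 2]].fillna("---")
    return out


raw = load_evts(EVTS_SAMPLE)
print(pd.isna(raw.iloc[0, 1]))  # True
fixed = fill_missing_actors(raw)
print(fixed.iloc[0, 1], fixed.iloc[0, 2])  # --- KOR
```

Filling on the reading side only hides the symptom; the expected behavior is for UP itself to write --- into the file.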

Any thoughts on this?

JingL1014 commented 5 years ago

The bug is fixed.

PTB-OEDA commented 5 years ago

What was needed to fix this? We need to know for future work.

Thx

PTB

On Wed, Sep 19, 2018, 22:21 JingL1014 notifications@github.com wrote:

The bug is fixed.


JingL1014 commented 5 years ago

This was just a small bug in the code. I implemented this functionality earlier but forgot to update the event dictionaries for the outputs.
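The actual patch is not shown in this thread. As a hypothetical sketch of the kind of output-side guard described, assuming each coded event is held as a dict with `source` and `target` entries (the key names, `fill_actor_fields`, and `to_evts_line` are invented for illustration and are not UP's real structures), the placeholder is applied before the line is written:

```python
from typing import Dict, Optional

# Placeholder the reporter expected for a missing actor.
MISSING_ACTOR = "---"


def fill_actor_fields(event: Dict[str, Optional[str]]) -> Dict[str, Optional[str]]:
    """Return a copy of the event with empty or None source/target set to '---'."""
    fixed = dict(event)
    for key in ("source", "target"):
        if not fixed.get(key):
            fixed[key] = MISSING_ACTOR
    return fixed


def to_evts_line(event: Dict[str, Optional[str]]) -> str:
    """Join the (hypothetical) event fields into one tab-separated evts line."""
    fields = [event["date"], event["source"], event["target"], event["code"], event["id"]]
    return "\t".join(fields)


event = {"date": "20000715", "source": "", "target": "KOR", "code": "061", "id": "example_42"}
line = to_evts_line(fill_actor_fields(event))
print(line.split("\t"))  # ['20000715', '---', 'KOR', '061', 'example_42']
```

Filling the placeholder in one helper, rather than at each place an event is built, keeps every output path consistent, which matches the kind of slip described (the functionality existed but the output dictionaries were not updated).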