openeventdata / UniversalPetrarch

Language-agnostic political event coding using universal dependencies
MIT License

All objects coming off verb are incorrectly treated as target actors #27

Closed ahalterman closed 5 years ago

ahalterman commented 6 years ago

Any object that's not an nsubj relation coming off a verb gets automatically assigned to the target actor. This precludes any nouns from matching the direct objects in the verb dictionary entries (e.g. "flags" in "burned flags"). (code) This assigns objects that are part of the event to the target, which means that no verb patterns containing objects will ever be coded.

JingL1014 commented 6 years ago

In the current UDpetrarch coder, the functionality for petrarch2 pattern matching is implemented as follows:

STEP1: The input to the pattern-matching function is a set of source-action-target triplets extracted from the dependency parse tree. For example, given the sentence "Arnor presented credentials to Ithilen's president on Wednesday", two triplets are generated: "Arnor-presented-credentials" and "Arnor-presented to Ithilen's president". I think this part should be improved. Right now the triplet extraction is based purely on grammar rules: the "subject" is the source, and the "object" and other noun modifiers are the targets. But from the point of view of events, in the example sentence above the source should be "Arnor", the target should be "Ithilen's president", and the action should be "presented credentials".

STEP2: In the pattern-matching step, the first triplet "Arnor-presented-credentials" matches the pattern " - * CREDENTIALS", but the second triplet "Arnor-presented to Ithilen's president" only matches the verb "present".

STEP3: In the postprocessing step, the code of each verb is checked and is replaced with the code of the pattern if any pattern is matched. So the code of the verb "present" in the second triplet is replaced with the code of the pattern found in the first triplet.

If STEP1 extracts events correctly, then there is no need to have STEP3.
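A minimal sketch of what an event-oriented STEP1 could look like, assuming a toy pattern table and a flat list of the verb's dependents. The names `Dep`, `PATTERN_OBJECTS`, and `extract_event` are hypothetical and are not taken from the UniversalPetrarch code; temporal modifiers such as "on Wednesday" are assumed to be filtered out beforehand.

```python
from typing import List, NamedTuple, Optional, Tuple


class Dep(NamedTuple):
    relation: str  # dependency label relative to the verb, e.g. "nsubj", "dobj", "nmod"
    text: str      # surface text of the dependent phrase


# Hypothetical stand-in for the verb dictionary: (verb lemma, object word)
# pairs for which some verb pattern names that object.
PATTERN_OBJECTS = {("present", "credentials"), ("launch", "missiles")}


def extract_event(
    verb_lemma: str, verb_text: str, deps: List[Dep]
) -> Tuple[Optional[str], str, Optional[str]]:
    """Return (source, action, target), folding pattern objects into the action."""
    source: Optional[str] = None
    target: Optional[str] = None
    action = verb_text
    for dep in deps:
        if dep.relation == "nsubj":
            source = dep.text
        elif dep.relation == "dobj" and (verb_lemma, dep.text.lower()) in PATTERN_OBJECTS:
            # The object is named by a verb pattern, so it belongs to the action.
            action = f"{verb_text} {dep.text}"
        elif target is None:
            # The first remaining dependent is taken as the target candidate.
            target = dep.text
    return source, action, target


deps = [
    Dep("nsubj", "Arnor"),
    Dep("dobj", "credentials"),
    Dep("nmod", "Ithilen's president"),
]
print(extract_event("present", "presented", deps))
```

With this shape, a single triplet carries the object as part of the action, so no postprocessing pass is needed to copy a code from one triplet to another. A direct object that no pattern names still falls through to the target slot.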

ahalterman commented 6 years ago

I'm still confused about how it can end up coding verb phrases that depend on direct objects in the verb (e.g. "launch airstrikes" vs "launch rescue mission"). Where is the code that implements Step 2, where it checks the dobj against the verb dictionaries to generate the appropriate code?

JingL1014 commented 6 years ago

The function for Step 2 is here. The problem now is that dobjs are treated as targets of the events. But in many cases (e.g. "launch airstrikes" vs "launch rescue mission"), the dobj should be treated as part of the action.

ahalterman commented 6 years ago

I completely agree that dobjs should be part of the action. But I'm worried that they're not being treated as such. Consider the following (fictional) sentence:

Georgian demonstrators launched missiles at the Georgian government.

I just ran that through Petrarch2 and UniversalPetrarch. Petrarch2 gives me the expected output:

GEOOPP  GEOGOV  194

When I run it through UniversalPetrarch, I get two events, neither of which is correct (and I suspect the dobj issue is partly to blame):

GEOOPP   ---  194
GEOOPP  GEOGOV  ---

If the dobjs aren't ever getting treated as part of the action, then all of the verb rules we've made won't ever be coded, and this would seem to explain why the Petr2-style verb dictionaries haven't been working.

ahalterman commented 6 years ago

One more example:

"Georgian demonstrators received ministers from the Georgian government."
"Georgian demonstrators received weapons from the Georgian government."

Petrarch2 codes as expected:

Event: 20080804 GEOOPP  GEOGOV  043 AFP0808020625_4 AFP
Event: 20080804 GEOGOV  GEOOPP  042 AFP0808020625_4 AFP
Event: 20080804 GEOOPP  GEOGOV  072 AFP082352_1 AFP

UniversalPetrarch has major problems (including with the actors, which it's normally good at):

Event: 20080804 GEOOPP  GEOGOV  --- AFP0808020625_4 AFP
Event: 20080804 GEOOPP  ---GOV  043 AFP0808020625_4 AFP
Event: 20080804 GEOOPP      072 AFP082352_1 AFP

JingL1014 commented 6 years ago

My earlier solution was to add STEP3 to handle the dobj issue. But recently I have come to think it is better to improve STEP1, and in the current coder I have commented out STEP3. Now that we have annotated more English GSRs, I will start to fix this dobj issue.

PTB-OEDA commented 6 years ago

How we are handling this issue is the subject of discussion for tomorrow morning's meeting.

PTB

-- Patrick T. Brandt Professor Political Science School of Economic, Political and Policy Sciences University of Texas at Dallas Personal site: http://www.utdallas.edu/~pbrandt MSBVAR site: http://yule.utdallas.edu

ahalterman commented 6 years ago

I just re-ran the most recent code and I'm still getting mistakes, though of a different sort.

"Georgian demonstrators received ministers from the Georgian government."

Petr2 (correct):

Event: 20080804 GEOOPP GEOGOV 043 AFP0808020625_4 AFP 
Event: 20080804 GEOGOV GEOOPP 042 AFP0808020625_4 AFP

Old UDP (incorrect)

Event: 20080804 GEOOPP GEOGOV --- AFP0808020625_4 AFP 
Event: 20080804 GEOOPP ---GOV 043 AFP0808020625_4 AFP

New UDP (differently incorrect)

Event: 20080804 GEOOPP ---GOV 046 AFP0808020625_4 AFP

  1. It's missing the GEO in front of government.
  2. It's now returning the wrong code ("Engage in negotiation") instead of the correct make a visit/host a visit.

On another sentence it now works correctly:

"Georgian demonstrators received weapons from the Georgian government."

Now returns (correctly):

Event: 20080804 GEOOPP  GEOGOV  072 AFP082352_1 AFP

A third sentence is still incorrect:

"Georgian demonstrators received assurances from the Georgian government." returns

Event: 20080804 GEOOPP  --- 046 AFP082352_2 AFP

which is missing the target actor (GEOGOV).

I'm attaching the dependency parsed XML so you can test future code against it. 2_UDP_comp.txt

JingL1014 commented 6 years ago

I think you are using the PETR-1 dictionary instead of the PETR-2 dictionary. You can modify the config here to choose whether to code with the PETR-1 dictionary or the PETR-2 dictionary.

If I use the PETR-1 dictionary, I get the results above. The pattern matched is - $ * + [046]

If I use the PETR-2 dictionary, I get the following results:

sentence 1: Event: 20080804 GEOOPP ---GOV 043 AFP0808020625_4 AFP

sentence 2: Event: 20080804 GEOOPP GEOGOV 072 AFP082352_1 AFP

sentence 3: Event: 20080804 GEOOPP GEOGOV AFP082352_2 AFP

The pattern matched is - * ASSURANCE [:030] # RECEIVE, but since the sentence is not in the passive voice, it doesn't return 030.

I have a question about codes with the format [code1:code2]. My understanding is that if the sentence is in the active voice, code1 is returned, and if the sentence is in the passive voice, code2 is returned. Am I correct?

ahalterman commented 6 years ago

Thanks for the clarification. I didn't realize the default was Petrarch1 dictionaries now. Do you have an idea for how to fix the missing actor in sentence 1 and missing event code in sentence 3?

The colons don't indicate passive/active, they reflect two events being generated: the first sentence should generate an event that's GEOOPP hosting visitors from GEOGOV (043), and also an event that's GEOGOV sending visitors to GEOOPP (042).
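The two-event reading of a colon code can be sketched as follows. This is only an illustration under assumptions: the function name `expand_codes` is hypothetical, the entry `[043:042]` is assumed from the expected Petrarch2 output rather than quoted from the dictionary, and an empty half (as in `[:030]`) is assumed to mean that no event is generated for that direction.

```python
from typing import List, Tuple

Event = Tuple[str, str, str]  # (source, target, code)


def expand_codes(code_spec: str, source: str, target: str) -> List[Event]:
    """Expand a bracketed code such as "[043:042]" into directed events.

    The first code applies to source -> target; the second code applies to
    the reversed direction, target -> source.
    """
    body = code_spec.strip().strip("[]")
    if ":" not in body:
        return [(source, target, body)] if body else []
    first, second = body.split(":", 1)
    events: List[Event] = []
    if first:
        events.append((source, target, first))
    if second:
        events.append((target, source, second))
    return events


print(expand_codes("[043:042]", "GEOOPP", "GEOGOV"))
print(expand_codes("[:030]", "GEOOPP", "GEOGOV"))
```

Neither code depends on grammatical voice here; the colon only separates the code for the stated direction from the code for the mirrored event.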

JingL1014 commented 6 years ago

After fixing the [code1:code2] construction, the outputs for those three sentences are now:

Sentence 1: Georgian demonstrators received ministers from the Georgian government

Event: 20080804 GEOOPP ---GOV 043 AFP0808020625_4 AFP
Event: 20080804 ---GOV GEOOPP 042 AFP0808020625_4 AFP

The reason for "---GOV" is that the coder finds "ministers" as the target actor. In the dependency parse tree, the noun phrase "ministers" and the prepositional phrase "from the Georgian government" are both attached to the verb "received". The coder checks the object first for target actors (in this case, "ministers"). Only if no target actors are found there does it check the remaining noun modifiers (in this case, "from the Georgian government").

Sentence 2:

Event: 20080804 GEOOPP GEOGOV 072 AFP082352_1 AFP

Sentence 3:

Event: 20080804 GEOGOV GEOOPP 030 AFP082352_2 AFP

ahalterman commented 6 years ago

I think the problem I'm describing still persists. Consider the sentence

Georgian demonstrators received warplanes from the Georgian government.

When I run that through, I get ---MIL GEOOPP 072, when it should have GEOGOV as the actor that's sending the military aid. Many, many verb entries have actors or agents in them, so UDP needs to be able to distinguish between parts of an event and the target actor.

If I make it active, (The Georgian government sent warplanes to the Georgian opposition.) I get a blank event: --- ---

JingL1014 commented 6 years ago

I re-ran the sentence The Georgian government sent warplanes to the Georgian opposition. I think the reason for the blank event is that UDpipe generates a wrong dependency parse tree. Could you give me the parse tree you used to generate the blank event? If I use the correct parse tree, the event I get is [[u'GEOGOV'], [u'---MIL'], u'166']. (The pattern - * &AIRCRAFT appears twice in the dictionary, once with code [166] and once with code [072].)

For the ---MIL, ---GOV issue: when the coder looks for actors, it currently checks the core arguments of the verb first (the full subject and object noun phrases, including adjective and prepositional modifiers) and then the other nominal dependents. Many sentences in the validation sets have actors in those core arguments, but the sentences above have actors in the nominal dependents. As a first step, I can check whether a core argument is part of the matched pattern, and if so, not consider it as an actor. I think it is difficult to handle all cases with one general rule. Maybe we can use a machine learning method to learn the preferred actor location for different verbs.
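The proposed check can be sketched roughly like this. It is a simplified illustration, not the coder's actual logic: `ACTOR_CODES`, `find_target`, and the idea of passing the matched pattern's words as a set are all assumptions made for the example, and actor lookup is reduced to an exact phrase match.

```python
from typing import FrozenSet, Iterable, Optional

# Hypothetical stand-in for the actor/agent dictionaries, keyed by lowercased phrase.
ACTOR_CODES = {
    "ministers": "---GOV",
    "warplanes": "---MIL",
    "the georgian government": "GEOGOV",
}


def find_target(
    core_args: Iterable[str],
    other_dependents: Iterable[str],
    pattern_words: FrozenSet[str] = frozenset(),
) -> Optional[str]:
    """Return the first actor code, checking core arguments before other dependents.

    Phrases that belong to the matched verb pattern are skipped, because they
    are part of the action rather than candidate actors.
    """
    for phrase in list(core_args) + list(other_dependents):
        key = phrase.lower()
        if key in pattern_words:
            continue
        code = ACTOR_CODES.get(key)
        if code is not None:
            return code
    return None


# Without the check, the direct object wins and the country prefix is lost.
print(find_target(["ministers"], ["the Georgian government"]))
# With the check, the object is recognized as part of the pattern and skipped.
print(find_target(["ministers"], ["the Georgian government"], frozenset({"ministers"})))
```

The lookup order is unchanged, so sentences whose actors sit in the core arguments behave as before; only arguments consumed by the matched pattern are excluded.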

The following are two sentences, one has target actor in core argument, and the other has target actor in nominal dependents.

Arnor on Thursday signed an 800 million ducat trade protocol for 1990 with Dagolath, its biggest trading partner, officials said. 
The entire object noun phrase is "an 800 million ducat trade protocol for 1990 with Dagolath", so the target actor is "DAG".

1   Arnor   Arnor   PROPN   NNP Number=Sing 4   nsubj   _   _
2   on  on  ADP IN  _   3   case    _   _
3   Thursday    Thursday    PROPN   NNP Number=Sing 1   nmod    _   _
4   signed  sign    VERB    VBD Mood=Ind|Tense=Past|VerbForm=Fin    0   root    _   _
5   an  a   DET DT  Definite=Ind|PronType=Art   10  det _   _
6   800 800 NUM CD  NumType=Card    7   compound    _   _
7   million million NUM CD  NumType=Card    8   nummod  _   _
8   ducat   ducat   NOUN    NN  Number=Sing 10  compound    _   _
9   trade   trade   NOUN    NN  Number=Sing 10  compound    _   _
10  protocol    protocol    NOUN    NN  Number=Sing 4   dobj    _   _
11  for for ADP IN  _   12  case    _   _
12  1990    1990    NUM CD  NumType=Card    10  nmod    _   _
13  with    with    ADP IN  _   14  case    _   _
14  Dagolath    Dagolath    PROPN   NNP Number=Sing 12  nmod    _   _
15  ,   ,   PUNCT   ,   _   14  punct   _   _
16  its its PRON    PRP$    Gender=Neut|Number=Sing|Person=3|Poss=Yes|PronType=Prs  19  nmod:poss   _   _
17  biggest biggest ADJ JJS Degree=Sup  19  amod    _   _
18  trading trading NOUN    NN  Number=Sing 19  compound    _   _
19  partner partner NOUN    NN  Number=Sing 14  appos   _   _
20  ,   ,   PUNCT   ,   _   4   punct   _   _
21  officials   official    NOUN    NNS Number=Plur 22  nsubj   _   _
22  said    say VERB    VBD Mood=Ind|Tense=Past|VerbForm=Fin    4   parataxis   _   _
23  .   .   PUNCT   .   _   4   punct   _   _

For the sentence "Georgian demonstrators received ministers from the Georgian government."
From the parsed tree, the full object noun phrase is "ministers". So the target actor is "---GOV"

1   Georgian    Georgian    ADJ JJ  Degree=Pos  2   amod    _   _
2   demonstrators   demonstrator    NOUN    NNS Number=Plur 3   nsubj   _   _
3   received    receive VERB    VBD Mood=Ind|Tense=Past|VerbForm=Fin    0   root    _   _
4   ministers   minister    NOUN    NNS Number=Plur 3   dobj    _   _
5   from    from    ADP IN  _   8   case    _   _
6   the the DET DT  Definite=Def|PronType=Art   8   det _   _
7   Georgian    Georgian    ADJ JJ  Degree=Pos  8   amod    _   _
8   government  government  NOUN    NN  Number=Sing 3   nmod    _   _
9   .   .   PUNCT   .   _   3   punct   _   _

JingL1014 commented 6 years ago

I added a heuristic to ensure that extracted actors are not part of the matched pattern. Now the output of sentence 1 is:

Event: 20080804 GEOOPP GEOGOV 043 AFP0808020625_4 AFP
Event: 20080804 GEOGOV GEOOPP 042 AFP0808020625_4 AFP

ahalterman commented 5 years ago

Can you add these sentences and their correct outputs to the test cases? Once that's done and the tests pass, we can close this issue.

JingL1014 commented 5 years ago

The records have been added to the test cases.