openeventdata / petrarch2

Another next-generation event coding platform.
MIT License

Adding information to 'meta' when expanding cooperating compounds #11

Open philip-schrodt opened 8 years ago

philip-schrodt commented 8 years ago

When internal cooperation in compounds is expanded in Sentence.get_events() [PETRtree.py], the new events are not added to the 'meta' storage, so the routines that pick up the actor, event, and actor-root texts have no information for them and return '---' for every field. More precisely, they return '---' only because I've trapped the problem in a couple of places; without those traps the program crashes with a KeyError, because the primary event list is inconsistent with the information available in 'meta'. I've inserted comments at the various points where this is relevant.
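To make the failure mode concrete, here is a minimal sketch, not the actual PETRtree.py code. The names `expand_cooperation` and `text_fields`, the tuple layout of an event, and the order of the five text fields are assumptions made for illustration. It assumes the expansion pairs every member of a compound actor with every other member, which is consistent with the event output in this issue (20 placeholder events for the five-country sentence, 6 for each three-actor sentence).

```python
from itertools import permutations

PLACEHOLDER = "---"
N_TEXT_FIELDS = 5  # assumed order: actor text, target text, event text, actor root, target root


def expand_cooperation(actors, code):
    """Pair each member of a compound actor with every other member (hypothetical helper)."""
    return [(src, tgt, code) for src, tgt in permutations(actors, 2)]


def text_fields(event, meta):
    """Return the text fields for an event, or placeholders when 'meta' has no entry."""
    try:
        return meta[event]
    except KeyError:
        # Without this trap the lookup raises and the run aborts.
        return (PLACEHOLDER,) * N_TEXT_FIELDS


# 'meta' only knows about the primary event found by the coder.
primary = ("USA", "BGD", "111")
meta = {
    primary: ("The United States", "Bangladesh", "... criticized",
              "UNITED STATES OF AMERICA", "BANGLADESH"),
}

# Events created by the expansion join the event list but are never registered in 'meta'.
events = [primary] + expand_cooperation(["USA", "GBR", "IGOEUREEC"], "044")

for event in events:
    print(" ".join(event), " ".join(text_fields(event, meta)))
```

The trap keeps the program running, but every expanded event still prints placeholders; the underlying fix is to register the new events in 'meta' at the point where they are created.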

More generally, now that the actor/eventtext and actorroot options have been added, the 'meta' storage needs to be consolidated and refactored -- again, I've made a couple of notes on this.
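One possible shape for a consolidated store is sketched below. This is only an illustration under stated assumptions: the class names `EventInfo` and `Meta`, the field names, and the use of dataclasses (Python 3.7+) are all hypothetical and are not taken from the PETRARCH2 code base.

```python
from dataclasses import dataclass, field

PLACEHOLDER = "---"


@dataclass
class EventInfo:
    """All per-event text in one record (field names are hypothetical)."""
    actor_text: str = PLACEHOLDER
    target_text: str = PLACEHOLDER
    event_text: str = PLACEHOLDER
    actor_root: str = PLACEHOLDER
    target_root: str = PLACEHOLDER


@dataclass
class Meta:
    """Single store keyed by the (source, target, code) event tuple."""
    info: dict = field(default_factory=dict)

    def add(self, event, **texts):
        """Register an event; any text not supplied stays a placeholder."""
        self.info[event] = EventInfo(**texts)

    def get(self, event):
        """Look up an event; unregistered events yield an all-placeholder record."""
        return self.info.get(event, EventInfo())


meta = Meta()
meta.add(("USA", "BGD", "111"),
         actor_text="The United States", target_text="Bangladesh",
         event_text="... criticized",
         actor_root="UNITED STATES OF AMERICA", target_root="BANGLADESH")

print(meta.get(("USA", "BGD", "111")).actor_root)
print(meta.get(("GBR", "USA", "044")).actor_root)
```

Keeping every text field in one record per event means the output routines need a single lookup, and an event that was never registered degrades to placeholders in one place rather than at several separate traps.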

The input file below will generate this issue:

The United States , United Kingdom and European Union have come down heavily on the violence and shrinking democratic space in Bangladesh and urged all parties to engage in dialogue .

(ROOT (S (NP (NP (DT The) (NNP United) (NNPS States)) (, ,) (NP (NNP United) (NNP Kingdom)) (CC and) (NP (NNP European) (NNP Union))) (VP (VP (VBP have) (VP (VBN come) (PRT (RP down)) (ADVP (RB heavily)) (PP (IN on) (NP (NP (DT the) (NN violence)) (CC and) (VP (VBG shrinking) (NP (JJ democratic) (NN space)) (PP (IN in) (NP (NNP Bangladesh)))))))) (CC and) (VP (VBD urged) (NP (DT all) (NNS parties)) (S (VP (TO to) (VP (VB engage) (PP (IN in) (NP (NN dialogue)))))))) (. .)))

The United States , United Kingdom and European Union have criticized Bangladesh and urged all parties to engage in dialogue

(ROOT (S (NP (NP (DT The) (NNP United) (NNPS States)) (, ,) (NP (NNP United) (NNP Kingdom)) (CC and) (NP (NNP European) (NNP Union))) (VP (VP (VBP have) (VP (VBN criticized) (NP (NNP Bangladesh)))) (CC and) (VP (VBD urged) (NP (DT all) (NNS parties)) (S (VP (TO to) (VP (VB engage) (PP (IN in) (NP (NN dialogue) )))))))))

China , the US , South Africa , India , and Pakistan , who stockpiled their current net requirements , would now deplete their rubber in hand on releasing their rubber stocks to the market over the next few months .

(ROOT (S (NP (NP (NP (NNP China)) (, ,) (NP (DT the) (NNP US)) (, ,) (NP (NNP South) (NNP Africa)) (, ,) (NP (NNP India)) (, ,) (CC and) (NP (NNP Pakistan))) (, ,) (SBAR (WHNP (WP who)) (S (VP (VBD stockpiled) (NP (PRP$ their) (JJ current) (JJ net) (NNS requirements))))) (, ,)) (VP (MD would) (ADVP (RB now)) (VP (VB deplete) (NP (PRP$ their) (NN rubber)) (PP (IN in) (NP (NN hand))) (PP (IN on) (S (VP (VBG releasing) (NP (PRP$ their) (NN rubber) (NNS stocks)) (PP (TO to) (NP (NP (DT the) (NN market)) (PP (IN over) (NP (DT the) (JJ next) (JJ few) (NNS months)))))))))) (. .)))

======= Event output ========== (actor/eventtext and actorroot == True)

20150823 CHN IND 030 000dc436-5eea-4062-b807-43f5e2808d10_6 en1_1-5_story+1+000dc436-5eea-4062-b807-43f5e2808d10 --- --- --- --- ---
20150823 ZAF CHN 030 000dc436-5eea-4062-b807-43f5e2808d10_6 en1_1-5_story+1+000dc436-5eea-4062-b807-43f5e2808d10 --- --- --- --- ---
20150823 CHN USA 030 000dc436-5eea-4062-b807-43f5e2808d10_6 en1_1-5_story+1+000dc436-5eea-4062-b807-43f5e2808d10 --- --- --- --- ---
20150823 USA PAK 030 000dc436-5eea-4062-b807-43f5e2808d10_6 en1_1-5_story+1+000dc436-5eea-4062-b807-43f5e2808d10 --- --- --- --- ---
20150823 USA ZAF 030 000dc436-5eea-4062-b807-43f5e2808d10_6 en1_1-5_story+1+000dc436-5eea-4062-b807-43f5e2808d10 --- --- --- --- ---
20150823 IND USA 030 000dc436-5eea-4062-b807-43f5e2808d10_6 en1_1-5_story+1+000dc436-5eea-4062-b807-43f5e2808d10 --- --- --- --- ---
20150823 PAK USA 030 000dc436-5eea-4062-b807-43f5e2808d10_6 en1_1-5_story+1+000dc436-5eea-4062-b807-43f5e2808d10 --- --- --- --- ---
20150823 ZAF IND 030 000dc436-5eea-4062-b807-43f5e2808d10_6 en1_1-5_story+1+000dc436-5eea-4062-b807-43f5e2808d10 --- --- --- --- ---
20150823 CHN PAK 030 000dc436-5eea-4062-b807-43f5e2808d10_6 en1_1-5_story+1+000dc436-5eea-4062-b807-43f5e2808d10 --- --- --- --- ---
20150823 CHN ZAF 030 000dc436-5eea-4062-b807-43f5e2808d10_6 en1_1-5_story+1+000dc436-5eea-4062-b807-43f5e2808d10 --- --- --- --- ---
20150823 ZAF USA 030 000dc436-5eea-4062-b807-43f5e2808d10_6 en1_1-5_story+1+000dc436-5eea-4062-b807-43f5e2808d10 --- --- --- --- ---
20150823 PAK ZAF 030 000dc436-5eea-4062-b807-43f5e2808d10_6 en1_1-5_story+1+000dc436-5eea-4062-b807-43f5e2808d10 --- --- --- --- ---
20150823 IND ZAF 030 000dc436-5eea-4062-b807-43f5e2808d10_6 en1_1-5_story+1+000dc436-5eea-4062-b807-43f5e2808d10 --- --- --- --- ---
20150823 USA CHN 030 000dc436-5eea-4062-b807-43f5e2808d10_6 en1_1-5_story+1+000dc436-5eea-4062-b807-43f5e2808d10 --- --- --- --- ---
20150823 IND PAK 030 000dc436-5eea-4062-b807-43f5e2808d10_6 en1_1-5_story+1+000dc436-5eea-4062-b807-43f5e2808d10 --- --- --- --- ---
20150823 USA IND 030 000dc436-5eea-4062-b807-43f5e2808d10_6 en1_1-5_story+1+000dc436-5eea-4062-b807-43f5e2808d10 --- --- --- --- ---
20150823 ZAF PAK 030 000dc436-5eea-4062-b807-43f5e2808d10_6 en1_1-5_story+1+000dc436-5eea-4062-b807-43f5e2808d10 --- --- --- --- ---
20150823 IND CHN 030 000dc436-5eea-4062-b807-43f5e2808d10_6 en1_1-5_story+1+000dc436-5eea-4062-b807-43f5e2808d10 --- --- --- --- ---
20150823 PAK CHN 030 000dc436-5eea-4062-b807-43f5e2808d10_6 en1_1-5_story+1+000dc436-5eea-4062-b807-43f5e2808d10 --- --- --- --- ---
20150823 PAK IND 030 000dc436-5eea-4062-b807-43f5e2808d10_6 en1_1-5_story+1+000dc436-5eea-4062-b807-43f5e2808d10 --- --- --- --- ---
20150115 IGOEUREEC BGD 111 0026f8d5-744c-4199-ae99-1ca9d160d8xx_1 en1_1-5_story+1+0026f8d5-744c-4199-ae99-1ca9d160d8xx European Union Bangladesh ... criticized THE EUROPEAN UNION BANGLADESH
20150115 GBR USA 044 0026f8d5-744c-4199-ae99-1ca9d160d8xx_1 en1_1-5_story+1+0026f8d5-744c-4199-ae99-1ca9d160d8xx --- --- --- --- ---
20150115 USA BGD 111 0026f8d5-744c-4199-ae99-1ca9d160d8xx_1 en1_1-5_story+1+0026f8d5-744c-4199-ae99-1ca9d160d8xx The United States Bangladesh ... criticized UNITED STATES OF AMERICA BANGLADESH
20150115 IGOEUREEC USA 044 0026f8d5-744c-4199-ae99-1ca9d160d8xx_1 en1_1-5_story+1+0026f8d5-744c-4199-ae99-1ca9d160d8xx --- --- --- --- ---
20150115 GBR IGOEUREEC 044 0026f8d5-744c-4199-ae99-1ca9d160d8xx_1 en1_1-5_story+1+0026f8d5-744c-4199-ae99-1ca9d160d8xx --- --- --- --- ---
20150115 IGOEUREEC GBR 044 0026f8d5-744c-4199-ae99-1ca9d160d8xx_1 en1_1-5_story+1+0026f8d5-744c-4199-ae99-1ca9d160d8xx --- --- --- --- ---
20150115 USA IGOEUREEC 044 0026f8d5-744c-4199-ae99-1ca9d160d8xx_1 en1_1-5_story+1+0026f8d5-744c-4199-ae99-1ca9d160d8xx --- --- --- --- ---
20150115 USA GBR 044 0026f8d5-744c-4199-ae99-1ca9d160d8xx_1 en1_1-5_story+1+0026f8d5-744c-4199-ae99-1ca9d160d8xx --- --- --- --- ---
20150115 GBR BGD 111 0026f8d5-744c-4199-ae99-1ca9d160d8xx_1 en1_1-5_story+1+0026f8d5-744c-4199-ae99-1ca9d160d8xx United ... Kingdom Bangladesh ... criticized UNITED KINGDOM BANGLADESH
20150115 GBR USA 044 0026f8d5-744c-4199-ae99-1ca9d160d877_1 en1_1-5_story+1+0026f8d5-744c-4199-ae99-1ca9d160d877 --- --- --- --- ---
20150115 IGOEUREEC USA 044 0026f8d5-744c-4199-ae99-1ca9d160d877_1 en1_1-5_story+1+0026f8d5-744c-4199-ae99-1ca9d160d877 --- --- --- --- ---
20150115 GBR IGOEUREEC 044 0026f8d5-744c-4199-ae99-1ca9d160d877_1 en1_1-5_story+1+0026f8d5-744c-4199-ae99-1ca9d160d877 --- --- --- --- ---
20150115 USA IGOEUREEC 044 0026f8d5-744c-4199-ae99-1ca9d160d877_1 en1_1-5_story+1+0026f8d5-744c-4199-ae99-1ca9d160d877 --- --- --- --- ---
20150115 USA GBR 044 0026f8d5-744c-4199-ae99-1ca9d160d877_1 en1_1-5_story+1+0026f8d5-744c-4199-ae99-1ca9d160d877 --- --- --- --- ---
20150115 IGOEUREEC GBR 044 0026f8d5-744c-4199-ae99-1ca9d160d877_1 en1_1-5_story+1+0026f8d5-744c-4199-ae99-1ca9d160d877 --- --- --- --- ---

philip-schrodt commented 8 years ago

So GitHub dropped the XML markup in the input text: here's another try which preserves it as 'code'

<Sentences>

<Sentence date = "20150115" id ="0026f8d5-744c-4199-ae99-1ca9d160d8xx_1" source = "en1_1-5_story+1+0026f8d5-744c-4199-ae99-1ca9d160d8xx" sentence = "True">
<Text>
The United States , United Kingdom and European Union have criticized Bangladesh and urged all parties to engage in dialogue
</Text>
<Parse>
(ROOT (S (NP (NP (DT The) 
(NNP United) (NNPS States)) (, ,) (NP (NNP United) (NNP Kingdom)) 
(CC and) (NP (NNP European) (NNP Union))) 
(VP (VP (VBP have) (VP (VBN criticized) 
(NP (NNP Bangladesh)))) 
(CC and) 
(VP (VBD urged) 
(NP (DT all) (NNS parties)) 
(S (VP (TO to) (VP (VB engage) (PP (IN in) (NP (NN dialogue)
)))))))))
</Parse>
</Sentence>

<Sentence date = "20150115" id ="0026f8d5-744c-4199-ae99-1ca9d160d877_1" source = "en1_1-5_story+1+0026f8d5-744c-4199-ae99-1ca9d160d877" sentence = "True">
<Text>
The United States , United Kingdom and European Union have come down heavily on the violence and shrinking democratic
space in Bangladesh and urged all parties to engage in dialogue .
</Text>
<Parse>
(ROOT (S 
(NP (NP (DT The) (NNP United) (NNPS States)) 
(, ,) 
(NP (NNP United) (NNP Kingdom)) (CC and) (NP (NNP European) (NNP Union))) 
(VP 
(VP (VBP have) 
(VP (VBN come) (PRT (RP down)) (ADVP (RB heavily)) 
(PP (IN on) 
(NP (NP (DT the) (NN violence)) (CC and) 
(VP (VBG shrinking) (NP (JJ democratic) (NN space)) 
(PP (IN in) (NP (NNP Bangladesh)))))))) (CC and) 
(VP (VBD urged) (NP (DT all) (NNS parties)) (S 
(VP (TO to) 
(VP (VB engage) 
(PP (IN in) (NP (NN dialogue)))))))) 
(. .))) 
</Parse>
</Sentence>

<Sentence date = "20150823" id ="000dc436-5eea-4062-b807-43f5e2808d10_6" source = "en1_1-5_story+1+000dc436-5eea-4062-b807-43f5e2808d10" sentence = "True">
<Text>
China , the US , South Africa , India , and Pakistan , who stockpiled their current net requirements ,
would now deplete their rubber in hand on releasing their rubber stocks to the market over the next few months .
</Text>
<Parse>
(ROOT (S 
(NP (NP (NP (NNP China)) (, ,) 
(NP (DT the) (NNP US)) (, ,) 
(NP (NNP South) (NNP Africa)) (, ,) 
(NP (NNP India)) (, ,) 
(CC and) 
(NP (NNP Pakistan))) (, ,) 
(SBAR (WHNP (WP who)) 
(S (VP (VBD stockpiled) 
(NP (PRP$ their) (JJ current) (JJ net) (NNS requirements))))) (, ,)) 
(VP (MD would) (ADVP (RB now)) 
(VP (VB deplete) 
(NP (PRP$ their) 
(NN rubber)) 
(PP (IN in) (NP (NN hand))) 
(PP (IN on) 
(S (VP (VBG releasing) 
(NP (PRP$ their) (NN rubber) (NNS stocks)) 
(PP (TO to) (NP (NP (DT the) (NN market)) 
(PP (IN over) (NP (DT the) (JJ next) (JJ few) (NNS months)))))))))) 
(. .)))
</Parse>
</Sentence>

</Sentences>
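For anyone who wants to inspect this input outside the coder, the following is a small sketch that reads the `<Sentences>` format with Python's standard `xml.etree.ElementTree`. The helper name `read_sentences` and the dictionary keys are assumptions for illustration; this is not how PETRARCH2 itself loads the file. The sample embeds the first sentence from the input above.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<Sentences>
<Sentence date = "20150115" id ="0026f8d5-744c-4199-ae99-1ca9d160d8xx_1" source = "en1_1-5_story+1+0026f8d5-744c-4199-ae99-1ca9d160d8xx" sentence = "True">
<Text>
The United States , United Kingdom and European Union have criticized Bangladesh and urged all parties to engage in dialogue
</Text>
<Parse>
(ROOT (S (NP (NP (DT The)
(NNP United) (NNPS States)) (, ,) (NP (NNP United) (NNP Kingdom))
(CC and) (NP (NNP European) (NNP Union)))
(VP (VP (VBP have) (VP (VBN criticized)
(NP (NNP Bangladesh))))
(CC and)
(VP (VBD urged)
(NP (DT all) (NNS parties))
(S (VP (TO to) (VP (VB engage) (PP (IN in) (NP (NN dialogue)
)))))))))
</Parse>
</Sentence>
</Sentences>"""


def read_sentences(xml_text):
    """Return one dict per <Sentence>, with whitespace in text and parse collapsed."""
    root = ET.fromstring(xml_text)
    records = []
    for sent in root.iter("Sentence"):
        records.append({
            "date": sent.get("date"),
            "id": sent.get("id"),
            "source": sent.get("source"),
            # The parse is wrapped over several lines; join it into one string.
            "text": " ".join(sent.findtext("Text", "").split()),
            "parse": " ".join(sent.findtext("Parse", "").split()),
        })
    return records


for record in read_sentences(SAMPLE):
    print(record["date"], record["id"])
    print(record["text"])
```

Comparing the `id` and `date` values read here against the first columns of the event output is a quick way to see which sentence produced which placeholder events.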