openeventdata / UniversalPetrarch

Language-agnostic political event coding using universal dependencies
MIT License

debugging UP through dictionaries #40

Open khaledJabr opened 5 years ago

khaledJabr commented 5 years ago

I have been working on debugging issues with UniversalPetrarch, mainly the issue of matching the dictionaries against the extracted patterns. @ahalterman (and @philip-schrodt) suggested tracking how UP produces events by outputting the dictionary verbs and verb patterns it matched. This method was used in debugging Petrarch2, and here is the relevant code snippet that does it (by @ahalterman):

Here's the code block I added to PETR2: this is the version of PETR2 with a file date of 28 June 2016
t1 = time.time()
sentence = PETRtree.Sentence(treestr, SentenceText, Date)
print(sentence.txt)
coded_events, meta = sentence.get_events()  # this is the entry point into the processing in PETRtree
# =========== new code starts here =======
for k1, v1 in meta.items():
    if k1 != 'nouns' and k1 != 'conv_code':
        fwmp.write("\n" + str(k1) + '\n')
        fwmp.write(SentenceID + '\n')
        try:
            fwmp.write(sentence.txt + '\n')
        except:
            fwmp.write("Sentence error\n")
        for lst in v1:
            # -- fwmp.write("++ " + str(lst))
            if "~" in lst:
                fwmp.write("-- " + lst)
            elif len(lst) > 1:
                if "[" in lst[1]:
                    fwmp.write("-- " + lst[0] + ": " + lst[1][:lst[1].find("[")].strip() + '\n')
                else:
                    fwmp.write("-- " + lst[0] + ": " + str(lst[1:]) + '\n')
            else:
                if lst[0]: fwmp.write("-- " + lst[0] + '\n')
"""if "conv_code" in meta:
    fwmp.write(meta["conv_code"])"""  # used to figure out convert_code, which seems to be pretty innocuous
if "comb_code" in sentence.metadata:
    fwmp.write(sentence.metadata["comb_code"])
# ===== new code ends here =========
code_time = time.time() - t1
if PETRglobals.NullVerbs or PETRglobals.NullActors:
    event_dict[key]['meta'] = meta
    event_dict[key]['text'] = sentence.txt
"fwmp" is the file the patterns are written to; it is opened and closed elsewhere in the code
This code block is in "petrarch2.py"
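
For reuse outside petrarch2.py, the dump logic above could be pulled into a helper that takes the output file as an argument. The following is only a sketch: the function name write_meta_patterns and the sample meta contents are made up for illustration, and the real structure of meta in PETR2 may differ.

```python
import io


def write_meta_patterns(meta, sentence_id, sentence_text, out):
    """Write matched verbs and patterns from a PETR2-style meta dict to `out`.

    Hypothetical helper: mirrors the branching of the snippet above, but
    takes the output handle as a parameter instead of using a global fwmp.
    """
    for key, matches in meta.items():
        if key in ("nouns", "conv_code"):
            continue
        out.write("\n" + str(key) + "\n")
        out.write(sentence_id + "\n")
        out.write(sentence_text + "\n")
        for match in matches:
            if "~" in match:
                # Transformation patterns are written as-is.
                out.write("-- " + str(match))
            elif len(match) > 1:
                if "[" in match[1]:
                    # Keep only the pattern text before the bracketed code.
                    pattern = match[1][: match[1].find("[")].strip()
                    out.write("-- " + match[0] + ": " + pattern + "\n")
                else:
                    out.write("-- " + match[0] + ": " + str(match[1:]) + "\n")
            elif match[0]:
                out.write("-- " + match[0] + "\n")


# Invented sample data, only to exercise the branches above.
sample_meta = {
    ("SYR", "MIL", "190"): [["KILL", "KILLED [190]"], ["SAY"]],
    "nouns": [["ignored"]],
}
buf = io.StringIO()
write_meta_patterns(sample_meta, "S1", "Eight people were killed.", buf)
meta_output = buf.getvalue()
print(meta_output)
```

In real use, `out` would be the already-open fwmp handle, or a file opened with a `with open(...)` block so it is closed even if coding fails.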

I am having trouble fitting this code to UP, since UP uses PETRgraph, which does not return a meta object. I would appreciate any help on how to tackle this.

JingL1014 commented 5 years ago

The sentence object in PETRgraph.py has an entry triplets that can be used for debugging.

Here is an example sentence: The Syrian Observatory for Human Rights, a UK-based group that tracks the war, said eight people were killed in an air strike by government forces in a separate, rebel-held part of the city.

{'-#18#20#4':    --> triplet ID
    {'transfermation': '~ a (b . ATTACK) SAY = a b 112\n',    --> transformation pattern matched, if any
     'meaning': 'KILL,KILL',    --> block meaning
     'verbcode': '190',
     'triple': ('-', <PETRgraph.NounPhrase instance at 0x7f47fd9dc128>, <PETRgraph.VerbPhrase instance at 0x7f47fd9dacb0>),
     'before_transfer': ([u'SYR'], ([u'---MIL'], [u'---PPL'], '190'), '010'),    --> events involved in transformation
     'after_transfer': [([u'SYR'], [u'---MIL'], u'112')],    --> event after transformation
     'event': ([u'---MIL'], [u'---PPL'], '190'),    --> event, or event before transformation
     'matched_txt': u'KILL'},    --> matched verb pattern, or block meaning if only the verb is matched
}
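
One way to get output similar to the PETR2 dump is to walk the triplets dictionary and write the matched text, verb code, and transformation for each entry. A minimal sketch, assuming triplets is a dict keyed by triplet ID with the keys from the example above (including the 'transfermation' spelling). write_triplet_debug is a hypothetical helper, not part of UP, and the sample data is copied from the example with the 'triple' objects left out.

```python
import io


def write_triplet_debug(triplets, sentence_id, sentence_text, out):
    """Write one debug record per triplet to `out` (hypothetical helper)."""
    out.write("\n" + sentence_id + "\n")
    out.write(sentence_text + "\n")
    for triplet_id, info in triplets.items():
        out.write("triplet " + str(triplet_id) + "\n")
        out.write("-- matched: " + str(info.get("matched_txt", "")) + "\n")
        out.write("-- meaning: " + str(info.get("meaning", "")) + "\n")
        out.write("-- verbcode: " + str(info.get("verbcode", "")) + "\n")
        out.write("-- event: " + str(info.get("event", "")) + "\n")
        # Key spelled as in the example output; assumed empty or absent
        # when no transformation pattern was matched.
        transformation = info.get("transfermation")
        if transformation:
            out.write("-- transformation: " + transformation.strip() + "\n")
            out.write("-- before: " + str(info.get("before_transfer")) + "\n")
            out.write("-- after: " + str(info.get("after_transfer")) + "\n")


# Sample entry taken from the example, minus the PETRgraph objects.
sample_triplets = {
    "-#18#20#4": {
        "transfermation": "~ a (b . ATTACK) SAY = a b 112\n",
        "meaning": "KILL,KILL",
        "verbcode": "190",
        "before_transfer": (["SYR"], (["---MIL"], ["---PPL"], "190"), "010"),
        "after_transfer": [(["SYR"], ["---MIL"], "112")],
        "event": (["---MIL"], ["---PPL"], "190"),
        "matched_txt": "KILL",
    }
}
buf = io.StringIO()
write_triplet_debug(sample_triplets, "S1", "Eight people were killed.", buf)
triplet_output = buf.getvalue()
print(triplet_output)
```

In UP itself, the first argument would be sentence.triplets and `out` the debug file handle, called right after the sentence has been coded.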