openeventdata / UniversalPetrarch

Language-agnostic political event coding using universal dependencies
MIT License

Patterns containing multiple synsets are not being matched correctly #61

Open philip-schrodt opened 5 years ago

philip-schrodt commented 5 years ago

In AFP_SPA_19940921.0205_7.0, the verb pattern HACER &ARTICULO &DETENCION is matched, but the "&DETENCION" part does not occur in the sentence. This leads to another potential issue: HACER...&ARTICULO is a very common combination of words. The synset &ARTICULO just contains the Spanish articles [EL, LA, LAS, LOS, UN, UNO, UNA…], so it is likely to match inappropriately in a large number of cases. This might partially account for the high number of false positives we are currently seeing in Spanish but not in English or Arabic. More generally, there are 1579 patterns in CAMEO.spanish.verpatterns.181009.txt containing two or more synsets, so if this part of the code isn't working, a lot of incorrect pattern matches are being generated.
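To make the expected behavior concrete, here is a minimal sketch of what I would expect the matcher to do. This is not the UniversalPetrarch code: the `SYNSETS` table, `token_matches`, `pattern_matches`, and `synset_count` are hypothetical names, the members of `&DETENCION` are made up for illustration, and I am assuming a pattern is a whitespace-separated list of elements that must all match in order, with gaps allowed.

```python
# Hypothetical synset table. Only the &ARTICULO members come from the issue;
# the &DETENCION members are assumed purely for illustration.
SYNSETS = {
    "&ARTICULO": {"EL", "LA", "LAS", "LOS", "UN", "UNO", "UNA"},
    "&DETENCION": {"DETENCION", "ARRESTO"},
}


def token_matches(element, token):
    """A pattern element matches a token literally, or via synset membership."""
    if element.startswith("&"):
        return token in SYNSETS.get(element, set())
    return element == token


def pattern_matches(pattern, tokens):
    """Return True only if EVERY element matches, in order (gaps allowed)."""
    pos = 0
    for element in pattern.split():
        # Skip ahead until this element matches some remaining token.
        while pos < len(tokens) and not token_matches(element, tokens[pos]):
            pos += 1
        if pos == len(tokens):
            return False  # ran out of tokens: this element is missing
        pos += 1
    return True


def synset_count(pattern):
    """Number of synset references (elements starting with '&') in a pattern."""
    return sum(1 for element in pattern.split() if element.startswith("&"))


pattern = "HACER &ARTICULO &DETENCION"
print(pattern_matches(pattern, ["HACER", "UNA", "DECLARACION"]))  # second synset absent
print(pattern_matches(pattern, ["HACER", "UNA", "DETENCION"]))    # all elements present
```

Under these assumptions the first sentence should be rejected, because nothing in it belongs to `&DETENCION`; matching only on `HACER` plus `&ARTICULO` is the behavior described above. `synset_count(pattern) >= 2` is one way to pick out the multi-synset patterns for a regression test.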