openeventdata / UniversalPetrarch

Language-agnostic political event coding using universal dependencies
MIT License

Arabic actor dictionary contains a mix of actor and agent codes #64

Closed philip-schrodt closed 5 years ago

philip-schrodt commented 5 years ago

The Arabic actor dictionary has lots of agent codes without country codes attached to them; these obviously should be in the agent dictionary. Also, a lot of these have odd date restrictions. For example, the most common one (with a tally of the various forms in which it is found) is رئيس 6 [GOV] 2 [GOV 20030503-20130803] 1 [GOV 20050718-20091109] 1 [GOV 19981221-20140524] 1 [GOV 19760301-19880817] 1 [MWIGOV 20040524-20120405] 1 [000GOV 19700101-19700101]

This is "president" (rayiys), and for whatever reason it shows up a lot as an isolated word. (It also appears plenty of times as part of a title + name combination, which is fine.)
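To make the tally above easier to inspect, here is a minimal Python sketch that parses a `count [CODE start-end]` listing and counts how many occurrences carry a bare agent code with no country prefix. The `parse_tally` helper, the `AGENT_CODES` set, and the rule "a code is country-less if it is exactly an agent code" are assumptions made for illustration; they are not taken from the dictionary tooling.

```python
import re

# Hypothetical set of bare agent codes; only GOV appears in this issue.
AGENT_CODES = {"GOV"}

# One tally item: "count [CODE]" or "count [CODE yyyymmdd-yyyymmdd]".
ITEM = re.compile(r"(\d+)\s+\[([A-Z0-9]+)(?:\s+(\d{8})-(\d{8}))?\]")

def parse_tally(text):
    """Return (count, code, date_range) tuples; date_range is None or (start, end)."""
    items = []
    for count, code, start, end in ITEM.findall(text):
        dates = (start, end) if start else None
        items.append((int(count), code, dates))
    return items

# The tally quoted in the issue, without the Arabic headword.
tally = ("6 [GOV] 2 [GOV 20030503-20130803] 1 [GOV 20050718-20091109] "
         "1 [GOV 19981221-20140524] 1 [GOV 19760301-19880817] "
         "1 [MWIGOV 20040524-20120405] 1 [000GOV 19700101-19700101]")

items = parse_tally(tally)
total = sum(count for count, _, _ in items)
# Occurrences whose code is a bare agent code (no country prefix).
bare = sum(count for count, code, _ in items if code in AGENT_CODES)
print(f"{bare} of {total} occurrences use a bare agent code")
```

Under these assumptions, `MWIGOV` and `000GOV` count as prefixed codes, so they are not included in the bare-agent total.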

ahalterman commented 5 years ago

These should be pretty easy to handle in an automated way. If several/most of the codes are agent codes, we can strip out the dates and add them to the agents file.
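A rough Python sketch of that idea, under stated assumptions: entries are `(count, code, date_range)` tuples, `AGENT_CODES` is a hypothetical set of bare agent codes, and "most" is read as a strict majority of occurrences. The `dominant_agent_code` function and its `threshold` parameter are illustrative names, not the approach used in the closing commit.

```python
from collections import Counter

# Hypothetical set of bare agent codes; only GOV appears in this issue.
AGENT_CODES = {"GOV"}

def dominant_agent_code(entries, threshold=0.5):
    """Return the bare agent code covering more than `threshold` of all
    occurrences, or None. Date ranges are ignored, which strips them."""
    total = sum(count for count, _, _ in entries)
    votes = Counter()
    for count, code, _dates in entries:
        if code in AGENT_CODES:
            votes[code] += count
    if total == 0 or not votes:
        return None
    code, n = votes.most_common(1)[0]
    return code if n / total > threshold else None

# Entries corresponding to the tally in the issue.
entries = [
    (6, "GOV", None),
    (2, "GOV", ("20030503", "20130803")),
    (1, "GOV", ("20050718", "20091109")),
    (1, "GOV", ("19981221", "20140524")),
    (1, "GOV", ("19760301", "19880817")),
    (1, "MWIGOV", ("20040524", "20120405")),
    (1, "000GOV", ("19700101", "19700101")),
]

result = dominant_agent_code(entries)
print(result)
```

A phrase for which this returns a code would be a candidate for moving to the agents file with that code and no date restriction; a phrase for which it returns `None` would stay in the actor dictionary for manual review.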

ahalterman commented 5 years ago

Closed by https://github.com/openeventdata/arabic_dictionaries/commit/b57d55bebc44d7bf9894a76036f5828b9368c741