openeventdata / UniversalPetrarch

Language-agnostic political event coding using universal dependencies
MIT License

Add UP/dependencies white paper #51

Open ahalterman opened 5 years ago

ahalterman commented 5 years ago

It would be really useful as we're debugging UP on Arabic (and for users more generally) to have a white paper or algorithm describing how UP uses the dependency parse, similar to the Petrarch2 white paper. Some questions that have come up on Monday and before include:

PTB-OEDA commented 5 years ago

Agreed. We need a document that does this ASAP.

On Wed, Oct 24, 2018 at 12:46 PM Andy Halterman notifications@github.com wrote:

It would be really useful as we're debugging UP on Arabic (and for users more generally) to have a white paper or algorithm describing how UP uses the dependency parse, similar to the Petrarch2 white paper. Some questions that have come up on Monday and before include:


-- Patrick T. Brandt Professor Political Science School of Economic, Political and Policy Sciences University of Texas at Dallas Personal site: http://www.utdallas.edu/~pbrandt MSBVAR site: http://yule.utdallas.edu

ahalterman commented 5 years ago

Just wanted to update this issue with the second round of documentation, along with my comments on it. Some questions to address:

ud_petrarch_documentation-AH.pdf

ahalterman commented 5 years ago

Updated documentation. I've marked the issues above that were resolved, but most are still outstanding.

The documentation addressed two issues (the al-Shabaab and "Gondor opposition" examples), describing very different behavior from the previous version of the documentation. I don't see any changes to the code, though. Has the code been updated to reflect the new behavior described in the documentation?

ud_petrarch_documentation_v3.pdf

PTB-OEDA commented 5 years ago

Agreed. This needs more documentation.

On Thu, Dec 6, 2018, 10:28 Andy Halterman <notifications@github.com> wrote:

Updated documentation. I've marked the issues above that were resolved, but most are still outstanding.

The documentation addressed two issues (the al-Shabaab and "Gondor opposition" examples), describing very different behavior from the previous version of the documentation. I don't see any changes to the code, though. Has the code been updated to reflect the new behavior described in the documentation?

ud_petrarch_documentation_v3.pdf https://github.com/openeventdata/UniversalPetrarch/files/2653914/ud_petrarch_documentation_v3.pdf


ahalterman commented 5 years ago

Or more importantly, more code!

ahalterman commented 5 years ago

I'll also emphasize that the “{Ukraine, ratified, agreement}” problem still requires comments (and major work on the coder). This is a fundamental problem to overcome.

philip-schrodt commented 5 years ago

Concur with Andy's comment: situations like that are why event coding is a problem distinct from the standard "event-triple" NLP issue, as numerous projects vastly larger and better funded than ours have learned to their dismay over the past three decades when they tried to adapt generic NLP software to do event coding. Depending on context, the target of an action can be in a variety of different places in the sentence/parse structure. (The source is usually the subject, though depending on the clause structure of the sentence, sometimes not even that is true. But usually it is.) That's why verb dictionaries have 10K or so distinct patterns, and why outfits like Raytheon/BBN developed separate software just for event coding even though they'd long ago developed NLP software for just extracting triples.
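To make the point concrete, here is a minimal sketch of why a generic subject-verb-object reading of a dependency parse is not enough for event coding. Everything in it is an assumption for illustration: the sentence, the hand-written UD-style parse (not output from a real parser), the `KNOWN_ACTORS` set, and both helper functions. It is not how UP or any Petrarch coder actually works.

```python
from typing import List, NamedTuple, Optional, Set, Tuple


class Token(NamedTuple):
    id: int       # 1-based token index
    form: str     # surface form
    head: int     # id of the head token (0 for the root)
    deprel: str   # UD dependency relation


# Hand-written, simplified UD-style parse of
# "Ukraine ratified an agreement with Russia."
SENTENCE = [
    Token(1, "Ukraine", 2, "nsubj"),
    Token(2, "ratified", 0, "root"),
    Token(3, "an", 4, "det"),
    Token(4, "agreement", 2, "obj"),
    Token(5, "with", 6, "case"),
    Token(6, "Russia", 4, "nmod"),
    Token(7, ".", 2, "punct"),
]

# Hypothetical stand-in for an actor dictionary.
KNOWN_ACTORS = {"Ukraine", "Russia"}


def children(tokens: List[Token], head_id: int, deprels: Set[str]) -> List[Token]:
    """Return the dependents of head_id whose relation is in deprels."""
    return [t for t in tokens if t.head == head_id and t.deprel in deprels]


def naive_triple(tokens: List[Token]) -> Tuple[Optional[str], str, Optional[str]]:
    """Generic 'event triple': (nsubj of root, root verb, obj of root)."""
    root = next(t for t in tokens if t.deprel == "root")
    subj = children(tokens, root.id, {"nsubj"})
    obj = children(tokens, root.id, {"obj"})
    return (
        subj[0].form if subj else None,
        root.form,
        obj[0].form if obj else None,
    )


def actor_targets(tokens: List[Token]) -> List[str]:
    """Walk down from the root through obj/obl/nmod links and collect
    any known actors found along the way as candidate targets."""
    root = next(t for t in tokens if t.deprel == "root")
    found: List[str] = []
    frontier = [root.id]
    while frontier:
        head_id = frontier.pop()
        for t in children(tokens, head_id, {"obj", "obl", "nmod"}):
            if t.form in KNOWN_ACTORS:
                found.append(t.form)
            frontier.append(t.id)
    return found


print(naive_triple(SENTENCE))   # ('Ukraine', 'ratified', 'agreement')
print(actor_targets(SENTENCE))  # ['Russia']
```

The naive triple puts "agreement" in the target slot, while the political target (Russia) sits one level down as a modifier of the object. In other sentences it could hang off the verb as an oblique, or sit in a different clause entirely, which is the variability that verb patterns are meant to capture.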