sheetskristen / temporal-annotation

Linguistic annotation of events, time expressions and links between them for economic news.

Temporal Relation Annotation

This is a temporal annotation project, tailored to the annotation of economic news. The annotation guidelines are in many ways a streamlining of TimeML, from which they draw deep inspiration.

This repository includes three things: the set-up files for annotation, a set of data annotated by three annotators, and the code needed to run our baseline classifier, which was trained on the preliminary annotations.

The project set-up files are for use with MAE and include:

The set of data annotated by three annotators can be found in raw_annotation_data, in subdirectories organized by annotator.

Code

The scripts for our classifier should be run in the following order.

Convert the source CSV to batched XML files for use in MAE:

Extract tags and their position information from MAE-generated XML files:

Add “unspecified” tags to untagged adjacent pairs:

Read from the processed tags to calculate IAA scores by tag type:

Select shared tags to create standard train/test datasets:

Obtain bag-of-words and tag features from standard datasets:

Implement a logistic regression model for link classification, train, and evaluate:
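
To illustrate two of the earlier steps above (extracting tags with their positions, then adding "unspecified" links to untagged adjacent pairs), here is a minimal sketch. It is not the repository's code. It assumes MAE-style output in which extent tags carry a contiguous spans="start~end" attribute and link tags carry fromID and toID; the tag names EVENT, TIMEX and TLINK, the relType attribute, the relation label and the sample document are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical MAE-style document. Tag names, attribute names and the
# relation label are assumptions, not taken from the project's DTD.
SAMPLE = """<TemporalTask>
<TEXT><![CDATA[Shares fell on Monday after the bank raised rates.]]></TEXT>
<TAGS>
<EVENT id="E0" spans="7~11" text="fell" />
<TIMEX id="T0" spans="15~21" text="Monday" />
<EVENT id="E1" spans="37~43" text="raised" />
<TLINK id="L0" fromID="E0" toID="T0" relType="is_included" />
</TAGS>
</TemporalTask>"""


def extract_tags(xml_text):
    """Return (extents sorted by start offset, {(fromID, toID): relation})."""
    root = ET.fromstring(xml_text)
    extents, links = [], {}
    for el in root.find("TAGS"):
        if "spans" in el.attrib:
            # Assumes a single contiguous span such as "7~11".
            start, end = (int(x) for x in el.get("spans").split("~"))
            extents.append({"id": el.get("id"), "type": el.tag,
                            "start": start, "end": end,
                            "text": el.get("text")})
        elif "fromID" in el.attrib:
            links[(el.get("fromID"), el.get("toID"))] = el.get("relType")
    extents.sort(key=lambda tag: tag["start"])
    return extents, links


def add_unspecified(extents, links):
    """Label every textually adjacent pair that has no link in either direction."""
    filled = dict(links)
    for left, right in zip(extents, extents[1:]):
        pair = (left["id"], right["id"])
        if pair not in filled and pair[::-1] not in filled:
            filled[pair] = "unspecified"
    return filled


extents, links = extract_tags(SAMPLE)
all_links = add_unspecified(extents, links)
for (src, dst), rel in all_links.items():
    print(src, dst, rel)
```

Here "adjacent" is taken to mean neighbouring extents in text order, which is one possible reading of the step; the project's own scripts may define adjacency differently.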

Datasets

annotation_data: all data from the news corpus

annotated_data:

raw_annotated_data: original files from annotators

silver.txt: all unique TLINK tags

features_silver.txt: feature vectors made from the silver dataset
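
As a rough illustration of what a bag-of-words plus tag-type feature vector for one link could look like, the sketch below builds vectors from in-memory examples. The on-disk format of silver.txt and features_silver.txt is not described here, so no file is read or written; the function names, the tag types EVENT and TIMEX, and the choice of context tokens are assumptions made for the example.

```python
from collections import Counter

TAG_TYPES = ("EVENT", "TIMEX")  # assumed extent tag names


def build_vocab(token_lists):
    """Map each distinct token to a column index, in sorted order."""
    vocab = sorted({tok for toks in token_lists for tok in toks})
    return {tok: i for i, tok in enumerate(vocab)}


def featurize(context_tokens, from_type, to_type, vocab):
    """Token counts over the vocabulary, then one-hot tag types of both ends."""
    counts = Counter(tok for tok in context_tokens if tok in vocab)
    bow = [counts[tok] for tok in vocab]  # dict order matches column index
    tags = [int(from_type == t) for t in TAG_TYPES]
    tags += [int(to_type == t) for t in TAG_TYPES]
    return bow + tags


# Hypothetical links: (tokens around the pair, source tag type, target tag type)
examples = [
    (["fell", "on", "monday"], "EVENT", "TIMEX"),
    (["monday", "after", "the", "bank", "raised"], "TIMEX", "EVENT"),
]
vocab = build_vocab(toks for toks, _, _ in examples)
vectors = [featurize(toks, src, dst, vocab) for toks, src, dst in examples]
print(vectors[0])
```

Vectors of this shape, paired with each link's relation label, are the kind of input a logistic regression classifier would be trained on.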