This is a temporal annotation project, tailored to the annotation of economic news. The annotation guidelines are in many ways a streamlining of TimeML, from which they draw deep inspiration.
This repository includes three things: the set-up files for annotation, a set of data annotated by three annotators, and the code needed to run our baseline classifier, which was trained on the preliminary annotations.
The project set-up files are for use with MAE and include:
The set of data annotated by three annotators can be found in raw_annotation_data, in subdirectories organized by annotator.
Code
The code for our classifier should be run in the following order.
Convert the source CSV to batched XML files for use in MAE:
Extract tags and their position information from MAE-generated XML files:
Add “unspecified” tags to untagged adjacent pairs:
Read from the processed tags to calculate IAA scores by tag type:
Select shared tags to create standard train/test datasets:
Obtain bag-of-words and tag features from standard datasets:
Implement a logistic regression model for link classification, train, and evaluate:
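As an illustration of the “unspecified” tagging and IAA steps above, here is a minimal Python sketch. It is not the repository's code: the function names `fill_unspecified` and `cohen_kappa`, the dictionary layout for links, and the example entity ids are all hypothetical. This README does not say which agreement metric is used, so pairwise Cohen's kappa is assumed here purely for illustration.

```python
from collections import Counter

# Label name taken from the README; everything else here is an assumption.
UNSPECIFIED = "unspecified"


def fill_unspecified(entity_ids, links):
    """Return a copy of `links` in which every adjacent pair has a label.

    `entity_ids` is an ordered list of entity ids (hypothetical layout).
    `links` maps (left_id, right_id) tuples to a relation label.
    Adjacent pairs without a label receive UNSPECIFIED.
    """
    filled = dict(links)
    for left, right in zip(entity_ids, entity_ids[1:]):
        filled.setdefault((left, right), UNSPECIFIED)
    return filled


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equally long label sequences."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and equally long")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's label distribution.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        # Both annotators used one identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical example: four entities in document order, two annotators.
entities = ["e1", "e2", "e3", "e4"]
annotator_a = fill_unspecified(entities, {("e1", "e2"): "before", ("e2", "e3"): "after"})
annotator_b = fill_unspecified(entities, {("e1", "e2"): "before", ("e3", "e4"): "before"})

# Compare the two annotators over the same ordered list of adjacent pairs.
adjacent_pairs = list(zip(entities, entities[1:]))
kappa = cohen_kappa(
    [annotator_a[pair] for pair in adjacent_pairs],
    [annotator_b[pair] for pair in adjacent_pairs],
)
print(f"kappa over adjacent pairs: {kappa:.3f}")
```

Filling in “unspecified” before scoring means both annotators are compared over the same set of adjacent pairs, so a pair that only one annotator tagged counts as a disagreement rather than being silently dropped.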
Datasets
annotation_data: all data from the news corpus
annotated_data:
raw_annotated_data: original files from annotators
silver.txt: all unique TLINK tags
features_silver.txt: feature vectors made from the silver dataset