
CHILDES-SRL

A corpus of semantic role labels auto-generated for 5M words of American-English child-directed speech.

Purpose

The purpose of this repository is to provide semantic role labels, both auto-generated and human-annotated, for American-English child-directed speech, together with the code used to generate them.

Inspiration and code for the BERT-based semantic role labeler come from the AllenNLP toolkit, which also provides an interactive SRL demo.

The code is for research purposes only.

Data

There are two manually annotated ("human-based") datasets, each named after the year of its release. The more recent dataset is an extended version of the earlier one and additionally includes SRL annotation for prepositions.

Further, this repository contains SRL labels produced by an automatic SRL tagger applied to a custom corpus of approximately 5M words of American-English child-directed language. The plain utterances are in data/pre_processed/childes-20191206_mlm.txt, and the file containing both utterances and SRL annotation is data/pre_processed/childes-20191206_srl.txt.
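
As a quick sanity check, the plain-utterance file can be inspected with a few lines of Python. This is only a sketch: it assumes one utterance per line and whitespace tokenization, which may not match the file's exact layout.

    # Minimal sketch: load the plain-utterance corpus and report its size.
    # Assumes one utterance per line and whitespace tokenization.
    from pathlib import Path

    corpus_path = Path("data/pre_processed/childes-20191206_mlm.txt")

    utterances = corpus_path.read_text(encoding="utf-8").splitlines()
    num_tokens = sum(len(line.split()) for line in utterances)

    print(f"{len(utterances):,} utterances, ~{num_tokens:,} tokens")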

History

Generating the CHILDES-SRL corpus

To annotate 5M words of child-directed speech with a semantic role tagger trained with AllenNLP, execute data_tools/make_srl_training_data_from_model.py.
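
For orientation, tagging a single utterance with an AllenNLP SRL predictor looks roughly like the sketch below. The model archive path is a placeholder, and the exact predictor and model version used by the script above may differ.

    # Sketch of tagging one utterance with an AllenNLP SRL predictor.
    # The archive path is a placeholder; substitute the BERT-based SRL model
    # actually used by data_tools/make_srl_training_data_from_model.py.
    from allennlp.predictors.predictor import Predictor

    predictor = Predictor.from_path("path/to/bert-base-srl-model.tar.gz")

    result = predictor.predict(sentence="the dog chased the ball into the yard")

    # Each predicted verb comes with BIO-style SRL tags over the words.
    for verb in result["verbs"]:
        print(verb["verb"], list(zip(result["words"], verb["tags"])))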

To generate a corpus of human-annotated semantic role labels for a small section of CHILDES, execute data_tools/make_srl_training_data_from_human.py.

Quality of auto-generated tags

How well does the AllenNLP SRL tagger perform on the CHILDES 2008 SRL data? Below are per-label F1 scores comparing its output with the annotations of trained human annotators.

      ARG-A1 f1= 0.00
      ARG-A4 f1= 0.00
     ARG-LOC f1= 0.00
        ARG0 f1= 0.95
        ARG1 f1= 0.93
        ARG2 f1= 0.79
        ARG3 f1= 0.44
        ARG4 f1= 0.80
    ARGM-ADV f1= 0.70
    ARGM-CAU f1= 0.84
    ARGM-COM f1= 0.00
    ARGM-DIR f1= 0.48
    ARGM-DIS f1= 0.68
    ARGM-EXT f1= 0.38
    ARGM-GOL f1= 0.00
    ARGM-LOC f1= 0.68
    ARGM-MNR f1= 0.68
    ARGM-MOD f1= 0.78
    ARGM-NEG f1= 0.99
    ARGM-PNC f1= 0.03
    ARGM-PPR f1= 0.00
    ARGM-PRD f1= 0.15
    ARGM-PRP f1= 0.39
    ARGM-RCL f1= 0.00
    ARGM-REC f1= 0.00
    ARGM-TMP f1= 0.84
      ARGRG1 f1= 0.00
      R-ARG0 f1= 0.00
      R-ARG1 f1= 0.00
  R-ARGM-CAU f1= 0.00
  R-ARGM-LOC f1= 0.00
  R-ARGM-TMP f1= 0.00
     overall f1= 0.88
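
For context, a simplified token-level per-label F1 over BIO tag sequences can be computed as in the sketch below. The repository's own evaluation may score spans rather than tokens, so treat this as illustrative only.

    # Illustrative token-level F1 per argument label from parallel BIO tag
    # sequences; the repository's own evaluation may differ.
    from collections import Counter

    def per_label_f1(gold_tags, pred_tags):
        tp, fp, fn = Counter(), Counter(), Counter()
        for gold, pred in zip(gold_tags, pred_tags):
            gold_label = gold.split("-", 1)[1] if "-" in gold else None
            pred_label = pred.split("-", 1)[1] if "-" in pred else None
            if pred_label is not None and pred_label == gold_label:
                tp[pred_label] += 1
            else:
                if pred_label is not None:
                    fp[pred_label] += 1
                if gold_label is not None:
                    fn[gold_label] += 1
        scores = {}
        for label in set(tp) | set(fp) | set(fn):
            p = tp[label] / (tp[label] + fp[label]) if tp[label] + fp[label] else 0.0
            r = tp[label] / (tp[label] + fn[label]) if tp[label] + fn[label] else 0.0
            scores[label] = 2 * p * r / (p + r) if p + r else 0.0
        return scores

    gold = ["B-ARG0", "B-V", "B-ARG1", "I-ARG1", "O"]
    pred = ["B-ARG0", "B-V", "B-ARG1", "O", "O"]
    print(per_label_f1(gold, pred))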

Compatibility

Tested on Ubuntu 16.04, Python 3.6, and torch==1.2.0