ishalyminov / multitask_disfluency_detection

Code for the paper "Multi-Task Learning for Domain-General Spoken Disfluency Detection in Dialogue Systems" (Igor Shalyminov, Arash Eshghi, and Oliver Lemon)




Multitask disfluency detection

Code for the paper "Multi-Task Learning for Domain-General Spoken Disfluency Detection in Dialogue Systems" (Igor Shalyminov, Arash Eshghi, and Oliver Lemon), published at SemDial 2018.

Model architecture

Getting started

Set up the environment (below are steps for Conda):

$ cd code-directory
$ git submodule update --init
$ conda create -n multitask_disfluency python=2.7
$ conda activate multitask_disfluency
$ pip install -r requirements.txt

Preprocess the Switchboard dataset for training:

$ python make_deep_disfluency_dataset.py swbd disfluency

Train the model:

$ python train.py swbd model

bAbI+ disfluency study data generation

1. Get the bAbI tools and install requirements
2. Download bAbI dialog tasks into the babi_tools folder
3. Run sh make_generalization_study_datasets.sh <RESULT_FOLDER>
4. Run sh tag_dataset.sh <RESULT_FOLDER> <config_file_name> for every config in 2018_generalization_study_configs
5. The resulting datasets are <RESULT_FOLDER>/<BABI_DATASET_NAME>/*.tagged.json
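
The per-config tagging step above can be scripted as a loop. The following is a minimal sketch, not part of the repository: the function name tag_all_configs and the DRY_RUN switch are hypothetical, and it assumes that the configs are regular files directly inside 2018_generalization_study_configs and that tag_dataset.sh expects the file's base name as <config_file_name>.

```shell
# Hypothetical helper (not part of the repository): runs tag_dataset.sh once
# for every config file found in the config folder.
# Assumptions: configs are regular files directly inside the folder, and
# tag_dataset.sh takes the config's base name as its second argument.
# Set DRY_RUN=1 to print the commands instead of running them.
tag_all_configs() {
    result_folder=$1
    config_dir=${2:-2018_generalization_study_configs}
    for config_path in "$config_dir"/*; do
        # Skip non-files; this also covers an empty or missing folder,
        # where the glob is left unexpanded.
        [ -f "$config_path" ] || continue
        config_name=$(basename "$config_path")
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "sh tag_dataset.sh $result_folder $config_name"
        else
            # Stop at the first config that fails to tag.
            sh tag_dataset.sh "$result_folder" "$config_name" || return 1
        fi
    done
}
```

Running DRY_RUN=1 tag_all_configs <RESULT_FOLDER> first prints the commands that would be executed, which is a cheap way to check the config names before starting the actual tagging run.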