wyxstriker / ReweightingDisfluency

Implementation of COLING 2022 paper "Adaptive Unsupervised Self-training for Disfluency Detection"

ReweightingDisfluency

This is the PyTorch implementation of the COLING 2022 paper "Adaptive Unsupervised Self-training for Disfluency Detection".

About Model

We release our self-supervised teacher model trained on pseudo data, as well as the grammar-check (judge) model. Please download them from the following link and put the models in the "./ckpt/teacher" and "./ckpt/judge" folders.

You need to arrange your data and models alongside this repo's source in the following layout:

    - ckpt/
        - electra_en_base
            - config.json
            - pytorch_model.bin
            - vocab.txt
        - teacher
            - pytorch_model.bin
        - judge
            - pytorch_model.bin
    - self_training/
        - run_data/
            - 500/
                - unlabel.tsv
                - dev.tsv
                - test.tsv
        - run_model/
    - src/
        - model.py
        ...
    - run.sh
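Before launching training, it can be useful to confirm the layout above is in place. The sketch below is illustrative only (the `check_layout` helper and the exact file list are assumptions based on the tree above, not part of the repo):

```python
from pathlib import Path

# Files the layout above expects, relative to the folder containing ckpt/.
EXPECTED = [
    "ckpt/electra_en_base/config.json",
    "ckpt/electra_en_base/pytorch_model.bin",
    "ckpt/electra_en_base/vocab.txt",
    "ckpt/teacher/pytorch_model.bin",
    "ckpt/judge/pytorch_model.bin",
    "self_training/run_data/500/unlabel.tsv",
    "self_training/run_data/500/dev.tsv",
    "self_training/run_data/500/test.tsv",
]

def check_layout(root="."):
    """Return the expected files that are missing under `root`."""
    root = Path(root)
    return [p for p in EXPECTED if not (root / p).is_file()]

if __name__ == "__main__":
    for p in check_layout("."):
        print("missing:", p)
```

Run it from the folder that contains `ckpt/` and `self_training/`; an empty output means every expected file was found.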

About data

Due to copyright issues, we do not have the right to distribute the SWBD (Switchboard) dataset; you can purchase it for your own use.

Requirements

How to use

The file paths and training details can be configured in the script run.sh. To launch training in the background and capture its log:

    nohup sh run.sh > log_run 2>&1 &

Citation

If you find this project useful for your research, please consider citing the following paper:

@inproceedings{wang2022adaptive,
  title={Adaptive Unsupervised Self-training for Disfluency Detection},
  author={Wang, Zhongyuan and Wang, Yixuan and Wang, Shaolei and Che, Wanxiang},
  booktitle={Proceedings of the 29th International Conference on Computational Linguistics},
  pages={7209--7218},
  year={2022}
}

Contact

If you have any questions about this code, feel free to open an issue or contact yixuanwang@ir.hit.edu.cn.