macabdul9 / CASA-Dialogue-Act-Classifier

PyTorch implementation of the paper "Dialogue Act Classification with Context-Aware Self-Attention" for dialogue act classification with a generic dataset class and PyTorch-Lightning trainer

MIT License

44 stars, 13 forks

Replace SwDA data #4

Closed: glicerico closed this 3 years ago

glicerico commented 3 years ago

This PR solves issue https://github.com/macabdul9/CASA-Dialogue-Act-Classifier/issues/2.

Replace the SwDA data with a cleaner version that contains 43 speech acts, as specified here (section 1c). The new data is also split into train, validation, and test sets, following the split used in a pair of influential papers on the subject and specified here. The data was obtained from @cgpotts's repository and reorganized to follow that split.
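For readers who want to reproduce a conversation-level split like the one described above, here is a minimal sketch. It is not the code used in this PR: the function name `split_by_conversation`, the `conversation_id`, `text`, and `act_tag` keys, and the toy IDs are all assumptions made for illustration. The idea is only that every utterance follows its conversation into exactly one split.

```python
from collections import defaultdict


def split_by_conversation(utterances, split_ids):
    """Group utterances into named splits according to their conversation ID.

    utterances: iterable of dicts with a (hypothetical) "conversation_id" key.
    split_ids:  dict mapping a split name to a set of conversation IDs.
    """
    # Invert the mapping so each conversation ID points at its split name.
    id_to_split = {}
    for name, ids in split_ids.items():
        for conv_id in ids:
            if conv_id in id_to_split:
                raise ValueError(f"{conv_id} is assigned to more than one split")
            id_to_split[conv_id] = name

    splits = defaultdict(list)
    for utt in utterances:
        name = id_to_split.get(utt["conversation_id"])
        if name is not None:  # conversations listed in no split are skipped
            splits[name].append(utt)
    return dict(splits)


# Toy data: IDs, texts, and tags are invented for this example.
utterances = [
    {"conversation_id": "c1", "text": "Hello.", "act_tag": "fp"},
    {"conversation_id": "c1", "text": "I live in Texas.", "act_tag": "sd"},
    {"conversation_id": "c2", "text": "Do you have pets?", "act_tag": "qy"},
    {"conversation_id": "c3", "text": "Uh-huh.", "act_tag": "b"},
    {"conversation_id": "c4", "text": "Yes.", "act_tag": "ny"},
]
split_ids = {"train": {"c1"}, "validation": {"c2"}, "test": {"c3"}}

result = split_by_conversation(utterances, split_ids)
print({name: len(rows) for name, rows in result.items()})
```

Splitting by conversation rather than by utterance keeps all turns of a dialogue in the same set, which matters for a context-aware classifier.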

glicerico commented 3 years ago

Oops, I forgot that the data contains some NaN values, which I had removed in pre-processing for a different classifier. Let me clean those and re-commit.
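A minimal sketch of that kind of cleanup, assuming each utterance is a dict with hypothetical `text` and `act_tag` fields (the actual column names and cleaning code in this PR are not shown in the thread). Rows where a required field is `None` or a float NaN are dropped. With pandas, `DataFrame.dropna(subset=[...])` would serve the same purpose.

```python
import math


def is_missing(value):
    """Return True for None or a float NaN, the usual 'missing' markers."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def drop_missing(rows, required=("text", "act_tag")):
    """Keep only rows whose required fields are all present and not NaN."""
    return [
        row for row in rows
        if not any(is_missing(row.get(key)) for key in required)
    ]


# Toy rows: the field names and values are invented for this example.
rows = [
    {"text": "Okay.", "act_tag": "b"},
    {"text": float("nan"), "act_tag": "sd"},  # missing text
    {"text": "Really?", "act_tag": None},     # missing tag
    {"text": "I think so.", "act_tag": "sv"},
]

clean = drop_missing(rows)
print(len(rows), "rows before,", len(clean), "rows after")
```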

glicerico commented 3 years ago

Training runs now ;)