macabdul9 / CASA-Dialogue-Act-Classifier

PyTorch implementation of the paper "Dialogue Act Classification with Context-Aware Self-Attention" for dialogue act classification, with a generic dataset class and a PyTorch-Lightning trainer.
MIT License

Suspicious labels #2

Closed glicerico closed 3 years ago

glicerico commented 3 years ago

Hey @macabdul9, I noticed you used the act_label_1 column of the SwDA data shared in your repo as the training labels. That column doesn't seem particularly good as labels, as one can see from these utterance/label pairs taken from the first rows of the test data:

- "Okay." - Other
- "I guess" - Info-request:Yes-No-Question
- "What kind of experience do you, do you have, then with child care ?" - Other:Segment-(multi-utterance)

These classes don't match the SwDA classes, and I am not sure how they were obtained.

On the other hand, the column act_tag is not a good option either, as it contains 276 different classes. I think the data needs some cleaning.
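For anyone who wants to check the class counts themselves, here is a minimal sketch that tallies the distinct values in a label column. It assumes the data is a CSV with act_tag and act_label_1 columns; the `text` column name, the `count_labels` helper, and the `tag_a`/`tag_b`/`tag_c` values are placeholders made up for illustration, not taken from the repo.

```python
import csv
import io
from collections import Counter


def count_labels(csv_file, column):
    """Return a Counter mapping each distinct value in `column` to its row count."""
    reader = csv.DictReader(csv_file)
    return Counter(row[column] for row in reader)


# Toy stand-in for the real CSV. The three act_label_1 values are the ones
# quoted above; the `text` header and the act_tag values are placeholders.
sample = io.StringIO(
    "text,act_tag,act_label_1\n"
    "Okay.,tag_a,Other\n"
    "I guess,tag_b,Info-request:Yes-No-Question\n"
    '"What kind of experience do you, do you have, then with child care ?",'
    "tag_c,Other:Segment-(multi-utterance)\n"
)

counts = count_labels(sample, "act_label_1")
print(len(counts), "distinct labels")
for label, n in counts.most_common():
    print(f"{n:6d}  {label}")
```

To run it on the real data, replace `sample` with an opened file, e.g. `open(path, newline="")`, and call it once per column to compare act_tag against act_label_1.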

macabdul9 commented 3 years ago

Hi @glicerico, can you create a PR with clean data?

glicerico commented 3 years ago

Working on it ;)