Suspicious labels

glicerico commented 3 years ago

Hey @macabdul9, I see that you used the act_label_1 column of the SwDA data shared in your repo as the training labels. That column doesn't look reliable as a source of labels, as the utterance/label pairs from the first rows of the test data show:

- "Okay." - Other
- "I guess" - Info-request:Yes-No-Question
- "What kind of experience do you, do you have, then with child care ?" - Other:Segment-(multi-utterance)

These classes don't match the SwDA classes, and I am not sure how they were obtained.

On the other hand, the column act_tag is not a good option either, as it contains 276 different classes. I think the data needs some cleaning.
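For reference, the number of classes per column can be checked with a few lines of standard-library Python. This is only a minimal sketch: the column names act_tag and act_label_1 come from this thread, but the Text column name, the inline sample, and the placeholder act_tag values (tag_a, tag_b, tag_c) are assumptions made for illustration, not the repo's actual data.

```python
import csv
import io
from collections import Counter

# Hypothetical sample. The utterances and act_label_1 values are the ones
# quoted in this thread; the "Text" header and the act_tag values are
# placeholders, NOT real values from the dataset.
SAMPLE_CSV = """Text,act_tag,act_label_1
Okay.,tag_a,Other
I guess,tag_b,Info-request:Yes-No-Question
"What kind of experience do you, do you have, then with child care ?",tag_c,Other:Segment-(multi-utterance)
"""


def label_counts(csv_file, column):
    """Return a Counter of the values found in `column` of an open CSV file."""
    reader = csv.DictReader(csv_file)
    return Counter(row[column] for row in reader)


def summarize(csv_text, columns):
    """Map each column name to a Counter of its label values."""
    return {col: label_counts(io.StringIO(csv_text), col) for col in columns}


summary = summarize(SAMPLE_CSV, ["act_tag", "act_label_1"])
for col, counts in summary.items():
    print(f"{col}: {len(counts)} distinct classes")
    # Show the most frequent labels first to spot odd or near-duplicate classes.
    for label, n in counts.most_common(5):
        print(f"  {label!r}: {n}")
```

To run this against the real files, pass an open file handle (opened with newline="") to label_counts instead of the inline sample, and adjust the column names to whatever the CSV header actually uses.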

macabdul9 commented 3 years ago

Hi @glicerico, can you create a PR with clean data?

glicerico commented 3 years ago

Working on it ;)

macabdul9 / CASA-Dialogue-Act-Classifier

Suspicious labels #2