Closed wang304381190 closed 2 years ago
Dear @wang304381190,
Thank you for your interest in our method.
Regarding the performance on the LARD test set in the third experiment (Exp 3), we reported results on this in our paper for binary disfluency detection: we reached 85.42%, 54.52%, and 19.6% accuracy for the repetition, replacement, and restart classes, respectively, which is significantly lower than what you have reported. Can you please check that you have tested this model on the right dataset?
Regarding the low performance on the Switchboard test set, this can potentially be attributed to the number of disfluencies contained within a single sentence in Switchboard. While the proposed method can indeed generate valid disfluencies, Switchboard sentences often contain more than one disfluency, in contrast with the LARD method, which generates only one disfluency per sentence. A solution might be to generate sentences that contain more than one disfluency (e.g. both repetitions and replacements).
Furthermore, please take into account that Switchboard has a highly imbalanced distribution of disfluencies with the vast majority of them being repetitions, thus the inserted synthetic disfluencies must follow the original imbalanced distribution.
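To make the two suggestions above concrete, here is a minimal sketch of how one could decide, per sentence, how many synthetic disfluencies to insert and of which types while following an imbalanced type distribution. The weight values and function name are illustrative assumptions, not numbers from the paper or the LARD codebase:

```python
import random

# Hypothetical, illustrative Switchboard-like imbalance: repetitions dominate.
# These weights are an assumption for the sketch, not measured statistics.
TYPE_WEIGHTS = {"repetition": 0.8, "replacement": 0.15, "restart": 0.05}

def sample_disfluency_plan(max_per_sentence=3, seed=None):
    """Return a list of disfluency types to insert into one sentence.

    Unlike a one-disfluency-per-sentence scheme, this allows multiple
    disfluencies, and samples each type according to TYPE_WEIGHTS.
    """
    rng = random.Random(seed)
    # Allow more than one disfluency per sentence, as in Switchboard.
    n = rng.randint(1, max_per_sentence)
    return rng.choices(list(TYPE_WEIGHTS), weights=list(TYPE_WEIGHTS.values()), k=n)
```

The actual insertion of each sampled disfluency would then be delegated to the existing per-type generators; the sketch only covers the sampling step.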
Thanks for your timely reply.
Hi Passali! I ran some experiments using the LARD method and would like to ask some questions. I did three experiments:
The results are as follows:
It is observed that the synthetic data generated with LARD performed poorly on the Switchboard benchmark. Is this because the generated disfluencies are not realistic?