Closed wang304381190 closed 2 years ago
Dear @wang304381190,
Thank you for your interest in our method.
Regarding the performance on the LARD test set in the third experiment (Exp 3), we reported results on this in our paper for binary disfluency detection: we reached 85.42%, 54.52%, and 19.6% accuracy for the repetition, replacement, and restart classes, respectively, which is significantly lower than what you have reported. Can you please check that you have tested this model on the right dataset?
Regarding the low performance on the Switchboard test set, this can potentially be attributed to the number of disfluencies contained within a single sentence in Switchboard. While the proposed method can indeed generate valid disfluencies, Switchboard sentences often contain more than one disfluency, in contrast with the LARD method, which generates only one disfluency per sentence. A solution might be to generate sentences that contain more than one disfluency (e.g. both repetitions and replacements).
Furthermore, please take into account that Switchboard has a highly imbalanced distribution of disfluencies with the vast majority of them being repetitions, thus the inserted synthetic disfluencies must follow the original imbalanced distribution.
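To make the two suggestions above concrete, here is a minimal sketch of how one could decide, per sentence, how many synthetic disfluencies to insert and of which types while following an imbalanced type distribution. The weight values and function name are illustrative assumptions, not numbers from the paper or the LARD codebase:

```python
import random

# Hypothetical, illustrative Switchboard-like imbalance: repetitions dominate.
# These weights are an assumption for the sketch, not measured statistics.
TYPE_WEIGHTS = {"repetition": 0.8, "replacement": 0.15, "restart": 0.05}

def sample_disfluency_plan(max_per_sentence=3, seed=None):
    """Return a list of disfluency types to insert into one sentence.

    Unlike a one-disfluency-per-sentence scheme, this allows multiple
    disfluencies, and samples each type according to TYPE_WEIGHTS.
    """
    rng = random.Random(seed)
    # Allow more than one disfluency per sentence, as in Switchboard.
    n = rng.randint(1, max_per_sentence)
    return rng.choices(list(TYPE_WEIGHTS), weights=list(TYPE_WEIGHTS.values()), k=n)
```

The actual insertion of each sampled disfluency would then be delegated to the existing per-type generators; the sketch only covers the sampling step.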
Thanks for your timely reply.
Hi Passali! I ran some experiments using the LARD method and would like to ask some questions. I did three experiments:
The results are as follows:
It is observed that the synthetic data generated with LARD performed poorly on the Switchboard benchmark. Is this because the generated disfluencies are not realistic?