Hi, thank you for this great work. We have been trying to reproduce the model performance reported in the paper, but a few details are confusing to us. The paper mentions that a vocab size of 30566 was used for all models; as far as we know, BERT-base English uncased has a vocab size of 30566, while BERT-base Chinese has a vocab size of 21128. Since JDDC is a Chinese dataset and the others are English, could you please clarify which pretrained model was used on each dataset?
Also, could you share the evaluation accuracy on these datasets after fine-tuning (the satisfaction prediction task)? Thank you.
Pretrained models we used: JDDC = bert-base-chinese; all other datasets = bert-base-uncased.
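In case it helps with reproduction, here is a minimal sketch of how that dataset-to-checkpoint mapping could be wired up with Hugging Face `transformers`. The `load_backbone` helper and the `num_labels=5` satisfaction scale are my own assumptions for illustration, not details taken from the paper or this repo.

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Dataset -> pretrained checkpoint, per the answer above.
PRETRAINED = {
    "JDDC": "bert-base-chinese",
    "MultiWOZ": "bert-base-uncased",
    "SGD": "bert-base-uncased",
    "ReDial": "bert-base-uncased",
    "CCPE": "bert-base-uncased",
}

def load_backbone(dataset_name: str, num_labels: int = 5):
    """Load the tokenizer and classification model for a dataset.

    num_labels=5 is an assumed 1-5 satisfaction scale; adjust it to
    match the label set actually used for satisfaction prediction.
    """
    name = PRETRAINED[dataset_name]
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForSequenceClassification.from_pretrained(
        name, num_labels=num_labels
    )
    return tokenizer, model
```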
The accuracy of BERT is 0.7414 (on JDDC), 0.8191 (on MultiWOZ), 0.7584 (on SGD), 0.7551 (on ReDial), and 0.8090 (on CCPE); we chose the checkpoints based on the UAR (Unweighted Average Recall) metric.
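Since UAR is just recall averaged over classes with equal weight (macro-averaged recall), checkpoint selection can be reproduced with scikit-learn. A small sketch, where the `evaluate` helper and the toy labels are illustrative rather than from the repo:

```python
from sklearn.metrics import accuracy_score, recall_score

def evaluate(y_true, y_pred):
    """Return (accuracy, UAR) for a set of predictions.

    UAR = Unweighted Average Recall = macro-averaged recall,
    i.e. each class contributes equally regardless of its size.
    """
    acc = accuracy_score(y_true, y_pred)
    uar = recall_score(y_true, y_pred, average="macro")
    return acc, uar

# Hypothetical 3-class satisfaction labels on a tiny dev set.
y_true = [0, 1, 2, 2, 1, 0, 2]
y_pred = [0, 1, 2, 1, 1, 0, 2]
print(evaluate(y_true, y_pred))  # -> (accuracy, UAR)
```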