sunnweiwei / user-satisfaction-simulation

"Simulating User Satisfaction for the Evaluation of Task-oriented Dialogue Systems" in SIGIR'21

some questions about code and paper #3

Closed 652994331 closed 2 years ago

652994331 commented 2 years ago

Hi, thank you for this good work. We were trying to reproduce the model performance reported in the paper, but some details are confusing. The paper mentions that a vocab size of 30566 was used in all models; as we know, BERT-base English uncased has a vocab size of 30566, while BERT Chinese has a vocab size of 21128. According to your dataset, JDDC is a Chinese dataset and the others are English, so could you please clarify which pretrained models were used on each dataset? And could you share the evaluation accuracy on these datasets after fine-tuning (the satisfaction prediction task)? Thank you.

sunnweiwei commented 2 years ago

Thanks for your attention!

  1. Pre-trained models we used: JDDC = bert-base-chinese; others = bert-base-uncased.
  2. The accuracy of BERT is 0.7414 (on JDDC), 0.8191 (on MultiWOZ), 0.7584 (on SGD), 0.7551 (on ReDial), and 0.8090 (on CCPE); we chose the checkpoints based on the UAR (Unweighted Average Recall) metric.
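For reference, UAR is simply the macro-averaged (per-class) recall, so rare satisfaction classes count as much as frequent ones. A minimal sketch (this is an illustration, not code from this repository; the label values are made up):

```python
# UAR (Unweighted Average Recall): the mean of per-class recall,
# weighting every class equally regardless of its frequency.
from collections import defaultdict

def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    hits = defaultdict(int)    # correct predictions per class
    totals = defaultdict(int)  # true instances per class
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        if t == p:
            hits[t] += 1
    recalls = [hits[c] / totals[c] for c in totals]
    return sum(recalls) / len(recalls)

# Toy example with 3 satisfaction levels (0, 1, 2):
y_true = [0, 0, 1, 1, 1, 2]
y_pred = [0, 1, 1, 1, 1, 2]
print(unweighted_average_recall(y_true, y_pred))  # per-class recalls 0.5, 1.0, 1.0 -> 0.8333...
```

This is equivalent to `sklearn.metrics.recall_score(..., average='macro')`.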