snakeztc / NeuralDialog-CVAE

Tensorflow Implementation of Knowledge-Guided CVAE for dialog generation ACL 2017. It is released by Tiancheng Zhao (Tony) from Dialog Research Center, LTI, CMU
https://www.cs.cmu.edu/~tianchez/
Apache License 2.0

Confusing about the evaluation #3

Closed xubenben closed 6 years ago

xubenben commented 6 years ago

Hi,

I recently ran your code and got this after training with GloVe. I find it confusing that the evaluation output has no direct connection to the results in your paper. Could you give a more detailed explanation or a more consistent evaluation?

[Screenshot of the evaluation output: 2017-11-16 10 14 26]

snakeztc commented 6 years ago

@xubenben The final evaluation results should be computed against a test dataset with multiple ground-truth reference responses. The current code does not show the final results, since it only compares against the test set with a single reference response.

I recently uploaded the multi-reference test dataset used in the paper to the data folder: https://github.com/snakeztc/NeuralDialog-CVAE/blob/master/data/test_multi_ref.json

You can write a script using the same evaluation function to compute the F1 score against this dataset to obtain the final results.
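
For anyone attempting this, here is a minimal sketch of how precision, recall, and F1 might be computed for one dialog context with several reference responses and several generated responses. It is not the repository's evaluation code. `unigram_overlap` is a hypothetical stand-in for the real similarity function (for example BLEU), `multi_ref_scores` is an illustrative name, and loading `test_multi_ref.json` is left out because its layout is not described in this thread.

```python
from collections import Counter


def unigram_overlap(reference, hypothesis):
    """Stand-in similarity in [0, 1]: unigram F1 between two token lists.

    Assumption: the real script would plug in the repository's own
    evaluation function here instead.
    """
    if not reference or not hypothesis:
        return 0.0
    # Multiset intersection counts tokens shared by both responses.
    overlap = sum((Counter(reference) & Counter(hypothesis)).values())
    if overlap == 0:
        return 0.0
    p = overlap / len(hypothesis)
    r = overlap / len(reference)
    return 2 * p * r / (p + r)


def multi_ref_scores(references, hypotheses, sim=unigram_overlap):
    """Return (precision, recall, f1) for a single dialog context.

    references: list of tokenized ground-truth responses
    hypotheses: list of tokenized generated responses
    """
    if not references or not hypotheses:
        return 0.0, 0.0, 0.0
    # Precision: each hypothesis is scored against its best-matching reference.
    precision = sum(
        max(sim(r, h) for r in references) for h in hypotheses
    ) / len(hypotheses)
    # Recall: each reference is scored against its best-matching hypothesis.
    recall = sum(
        max(sim(r, h) for h in hypotheses) for r in references
    ) / len(references)
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    refs = [["yes", "i", "do"], ["no"]]
    hyps = [["yes", "i", "do"], ["maybe"]]
    print(multi_ref_scores(refs, hyps))
```

Averaging these per-context scores over the whole multi-reference test set would give corpus-level numbers. Whether that matches the paper's exact aggregation is an assumption to check against the paper.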