Confusing about the evaluation

snakeztc / NeuralDialog-CVAE

TensorFlow implementation of Knowledge-Guided CVAE for dialog generation (ACL 2017). Released by Tiancheng Zhao (Tony) from the Dialog Research Center, LTI, CMU.

Apache License 2.0

@xubenben The final evaluation results should be computed against a test dataset with multiple ground-truth reference responses. The current code does not show the final results, since it only compares against the test set with a single reference response.

I recently uploaded the multi-reference test dataset used in the paper to the data folder: https://github.com/snakeztc/NeuralDialog-CVAE/blob/master/data/test_multi_ref.json

You can write a script using the same evaluation function to compute the F1 score against this dataset to obtain the final results.
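As a rough illustration of what such a script could compute, here is a minimal Python 3 sketch of multi-reference precision, recall, and F1 for one dialog context. It makes several assumptions that are not taken from the repository: `token_overlap` is a hypothetical stand-in for the repo's own evaluation function (swap that in for real results), `multi_ref_scores` is an illustrative name, and the inputs are plain token lists rather than anything read from `test_multi_ref.json`, whose exact layout is not described here.

```python
from collections import Counter


def token_overlap(reference, hypothesis):
    """Hypothetical stand-in similarity in [0, 1]: unigram F1 of two token lists.

    Replace this with the repository's evaluation function for real results.
    """
    if not reference or not hypothesis:
        return 0.0
    common = sum((Counter(reference) & Counter(hypothesis)).values())
    if common == 0:
        return 0.0
    p = common / len(hypothesis)
    r = common / len(reference)
    return 2 * p * r / (p + r)


def multi_ref_scores(references, hypotheses, sim=token_overlap):
    """Return (precision, recall, f1) for one context.

    precision: each generated response is scored against its best-matching reference.
    recall:    each reference is scored against its best-matching generated response.
    """
    if not references or not hypotheses:
        return 0.0, 0.0, 0.0
    precision = sum(max(sim(r, h) for r in references) for h in hypotheses) / len(hypotheses)
    recall = sum(max(sim(r, h) for h in hypotheses) for r in references) / len(references)
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Toy example: two references, one generated response.
refs = [["yes", "i", "do"], ["no", "thanks"]]
hyps = [["yes", "i", "do"]]
precision, recall, f1 = multi_ref_scores(refs, hyps)
```

In the toy example the single hypothesis matches one of the two references exactly, so precision is high while recall is penalized for the uncovered reference. Averaging these per-context scores over the whole test set would give corpus-level numbers.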

Confusing about the evaluation #3