I'm not sure where the problem is. I ran an experiment on the NarrativeQA dataset using the SBERT model and GPT-3.5, on 276 questions across 10 articles, and the results differ substantially from the numbers reported in the paper.
| top-k | BLEU-1 | BLEU-4 | METEOR | ROUGE-L |
| -- | -- | -- | -- | -- |
| top-5 | 11.07% | 3.26% | 16.01% | 16.01% |
| top-10 | 10.39% | 3.05% | 21.52% | 15.89% |
| top-20 | 11.79% | 3.64% | 22.75% | 17.33% |
Could you share the script used for the experiments in the paper? With my own evaluation script, the results are both very different from and noticeably worse than the reported numbers.
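For reference, here is a simplified sketch of what my script does (not the exact code; the SBERT checkpoint, prompt wording, and metric settings below are choices I made myself, so the mismatch may well come from one of them):

```python
# Sketch: retrieve top-k passages with SBERT, answer with GPT-3.5,
# then score with BLEU-1/4, METEOR, and ROUGE-L.
from openai import OpenAI
from sentence_transformers import SentenceTransformer, util
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from nltk.translate.meteor_score import meteor_score
from rouge_score import rouge_scorer

client = OpenAI()
retriever = SentenceTransformer("all-MiniLM-L6-v2")  # assumed SBERT checkpoint
rouge = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)
smooth = SmoothingFunction().method1

def retrieve(question, passages, top_k):
    """Rank passages by cosine similarity to the question and keep the top-k."""
    q_emb = retriever.encode(question, convert_to_tensor=True)
    p_emb = retriever.encode(passages, convert_to_tensor=True)
    scores = util.cos_sim(q_emb, p_emb)[0]
    top = scores.topk(min(top_k, len(passages))).indices.tolist()
    return [passages[i] for i in top]

def answer(question, context):
    """Ask GPT-3.5 to answer from the retrieved context (prompt is my own)."""
    prompt = (
        f"Answer the question using the context.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return resp.choices[0].message.content.strip()

def score(prediction, references):
    """Compute BLEU-1/4, METEOR, and ROUGE-L F1 against the gold answers."""
    pred_tok = prediction.lower().split()
    refs_tok = [r.lower().split() for r in references]
    return {
        "bleu1": sentence_bleu(refs_tok, pred_tok, weights=(1, 0, 0, 0),
                               smoothing_function=smooth),
        "bleu4": sentence_bleu(refs_tok, pred_tok, weights=(0.25,) * 4,
                               smoothing_function=smooth),
        "meteor": meteor_score(refs_tok, pred_tok),
        "rougeL": max(rouge.score(r, prediction)["rougeL"].fmeasure
                      for r in references),
    }
```

If the paper's script tokenizes or normalizes answers differently (for example, case handling or BLEU smoothing), that alone could explain part of the gap, so seeing the original evaluation code would really help.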