Hi, your work is very cool and I really like it. I have a question about the evaluation metrics you adopted. compute_metrics.py reports both Exact Match (EM) and ROUGE-L scores, but the paper reports only the ROUGE-L results. I am confused about this. Could you share some insight into why EM is not reported in the paper? Thanks.
Results reported by compute_metrics.py
Results reported by the paper
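
For context on why the two numbers can differ so much, here is a minimal sketch of the two metrics under their commonly used definitions. It is an assumption on my part that compute_metrics.py does something similar; the function names (`normalize`, `exact_match`, `rouge_l_f1`) and the normalization steps are my own illustration, not taken from the repository.

```python
import string


def normalize(text: str) -> str:
    """Lowercase, drop ASCII punctuation, and collapse whitespace (assumed normalization)."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> float:
    """Return 1.0 only if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def lcs_length(a: list, b: list) -> int:
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(prediction: str, reference: str) -> float:
    """ROUGE-L F1 over whitespace tokens, based on the longest common subsequence."""
    pred_tokens = normalize(prediction).split()
    ref_tokens = normalize(reference).split()
    if not pred_tokens or not ref_tokens:
        return 0.0
    lcs = lcs_length(pred_tokens, ref_tokens)
    if lcs == 0:
        return 0.0
    precision = lcs / len(pred_tokens)
    recall = lcs / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    pred = "the cat sat on the mat"
    ref = "the cat sat"
    print("EM:", exact_match(pred, ref))
    print("ROUGE-L F1:", round(rouge_l_f1(pred, ref), 4))
```

With this sketch, a prediction that contains the reference plus extra words gets an EM of 0 while still earning partial ROUGE-L credit, so the two metrics can diverge on the same output.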