Hi,

I am trying to reproduce the evaluation results reported in the paper, specifically for:
- Table 3: Headline generation results for the three baseline models trained in the five data settings: English only (en), Hindi only (hi), Latin transliterated data (latin), Devanagari transliterated data (dvn.), and original-script data (all); only ROUGE-L scores are shown (see Appendix A.1 for details).
- Table 13: Zero-shot performance of the best mT5 and Varta-T5 models on XL-Sum headline generation and abstractive summarization.
However, I consistently get slightly lower scores than those reported in the paper, despite following the inference and evaluation settings described there.
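In case it helps pin down the discrepancy, the ROUGE-L F-measure I compute is equivalent to this minimal sketch. The whitespace tokenization here is an assumption on my part; if the paper's pipeline tokenizes differently (stemming, Indic-script segmentation, etc.), that alone could explain a small gap:

```python
def lcs_len(a, b):
    # Classic dynamic-programming longest common subsequence length.
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1]

def rouge_l_f(reference, prediction):
    # Sentence-level ROUGE-L F1 over whitespace tokens (an assumption;
    # the paper's tokenizer may differ).
    ref, pred = reference.split(), prediction.split()
    lcs = lcs_len(ref, pred)
    if lcs == 0:
        return 0.0
    p, r = lcs / len(pred), lcs / len(ref)
    return 2 * p * r / (p + r)

print(rouge_l_f("reference headline text", "model generated headline"))  # 1/3: one shared token
```

Corpus-level scores are then the mean of these per-example F1 values over the test set.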
Could you please release the code pipeline used to produce the results in the paper?
Thank you!