microsoft / BANG

BANG is a new pretraining model to Bridge the gap between Autoregressive (AR) and Non-autoregressive (NAR) Generation. AR and NAR generation can be viewed uniformly in terms of how many previous tokens can be attended to, and BANG bridges the two by designing a novel model structure for large-scale pretraining. The pretrained BANG model can simultaneously support AR, NAR, and semi-NAR generation to meet different requirements.

How to get the ROUGE scores in the paper? #1

Open pvop opened 2 years ago

pvop commented 2 years ago

BANG is a great paper, but I have some problems trying to reproduce the scores reported in it. First, the BLEU-4 and ROUGE-L I get on SQuAD question generation with the MASS pretrained model differ from the paper. With beam size 5 I get BLEU-4 = 22.43 and the following ROUGE scores:

1 ROUGE-1 Average_R: 0.48431 (95%-conf.int. 0.47986 - 0.48875)
1 ROUGE-1 Average_P: 0.54315 (95%-conf.int. 0.53853 - 0.54740)
1 ROUGE-1 Average_F: 0.49817 (95%-conf.int. 0.49411 - 0.50238)

1 ROUGE-2 Average_R: 0.26775 (95%-conf.int. 0.26311 - 0.27234)
1 ROUGE-2 Average_P: 0.29883 (95%-conf.int. 0.29365 - 0.30367)
1 ROUGE-2 Average_F: 0.27436 (95%-conf.int. 0.26965 - 0.27884)

1 ROUGE-L Average_R: 0.44690 (95%-conf.int. 0.44248 - 0.45166)
1 ROUGE-L Average_P: 0.49998 (95%-conf.int. 0.49532 - 0.50436)
1 ROUGE-L Average_F: 0.45929 (95%-conf.int. 0.45507 - 0.46371)

So BLEU-4 is higher than in the paper, but ROUGE-L is lower.
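For reference, below is a minimal sketch of how corpus-level scores like these can be computed. It uses the `nltk` and `rouge-score` Python packages, which are an assumption here and not necessarily the paper's GLGE / ROUGE-1.5.5 scripts, so small differences against the paper's numbers are expected from tokenization and stemming alone.

    # Minimal sketch: corpus-level BLEU-4 and averaged ROUGE F1 over prediction/reference
    # files with one sentence per line. Uses `nltk` and `rouge-score`, which may score
    # slightly differently from the ROUGE-1.5.5 / GLGE scripts used in the paper.
    from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction
    from rouge_score import rouge_scorer

    def evaluate(pred_file, ref_file):
        preds = [line.strip() for line in open(pred_file, encoding="utf-8")]
        refs = [line.strip() for line in open(ref_file, encoding="utf-8")]

        # Corpus-level BLEU-4 on whitespace-tokenized text
        bleu4 = corpus_bleu(
            [[r.split()] for r in refs],
            [p.split() for p in preds],
            weights=(0.25, 0.25, 0.25, 0.25),
            smoothing_function=SmoothingFunction().method3,
        )

        # Sentence-level ROUGE F1, averaged over the corpus
        scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)
        totals = {"rouge1": 0.0, "rouge2": 0.0, "rougeL": 0.0}
        for ref, pred in zip(refs, preds):
            scores = scorer.score(ref, pred)
            for key in totals:
                totals[key] += scores[key].fmeasure
        averages = {key: value / len(preds) for key, value in totals.items()}

        print(f"BLEU-4: {bleu4:.4f}")
        for key, value in averages.items():
            print(f"{key}: {value:.4f}")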

Second, I used the BANG pretrained model and the code in this repo, but my ROUGE scores lag behind the scores in the paper by a gap of about 2, and the generation quality is poor. A sample of the output:

when dominicn choe was as to , singapore , he to . northern ireland ' s euro 2016 qualifier ireland was after a crash . prime may says she has faith ' in ' trident nuclear after afire a the bbc . tennis police is investigating by a williams a at . a coast has been that to oil coast a public bonfire park bonfire belfast has been up a of bonfire the bbc has strongly a claims thatguana iling iguana was ' .

qiweizhen commented 2 years ago

Hi, thank you for your interest! First, the MASS scores are cited from the generation benchmark GLGE: the MASS-base results on the SQuAD question generation (easy) dataset. The evaluation scripts we use are from this link. Note that those scripts are not complete; the remaining files can be found in the original SQuAD benchmark.

Second, we will upload a new version of the BANG v2 fine-tuning scripts, together with the detailed processed data files. It will then be much easier to reproduce the results, with clear improvements over this version.