Closed FilipStefaniuk closed 4 years ago
The LCS algorithm is implemented incorrectly (hence the large ROUGE-L differences in #18). The function at
https://github.com/kavgan/ROUGE-2.0/blob/26092bd65f2cbf5e7ffbe2f23740bb95f819b063/src/com/rxnlp/tools/rouge/ROUGECalculator.java#L907 should not search for the longest common subsequence greedily. In the following example the system summary is identical to the reference, so ROUGE-L should be 1.0, but ROUGE-2.0 reports 0.66667:
f.stefaniuk@AMDC3754:~/Documents/other/rouge2.0$ cat ./projects/test-lcs/reference/task1_ref1.txt
token0 xxxx token1 token2 token3 token4 xxxx token5
f.stefaniuk@AMDC3754:~/Documents/other/rouge2.0$ diff ./projects/test-lcs/reference/task1_ref1.txt ./projects/test-lcs/system/task1_system1.txt
f.stefaniuk@AMDC3754:~/Documents/other/rouge2.0$ java -jar ./target/rouge-calculator-1.2.1-shaded.jar
========Results Summary=======
ROUGE-L+StopWordRemoval TASK1 SYSTEM1.TXT Average_R:0.66667 Average_P:0.66667 Average_F:0.66667 Num Reference Summaries:1
======Results Summary End======
The correct results (using ROUGE-1.5.5 with pyrouge):
pyrouge_evaluate_plain_text_files -s ./projects/test-lcs/system/ -sfp "task(\d+)_system1.txt" -m ./projects/test-lcs/reference/ -mfp task#ID#_ref1.txt
2019-11-27 19:38:56,486 [MainThread ] [INFO ] Writing summaries.
2019-11-27 19:38:56,488 [MainThread ] [INFO ] Processing summaries. Saving system files to /tmp/tmpmalj10/system and model files to /tmp/tmpmalj10/model.
2019-11-27 19:38:56,488 [MainThread ] [INFO ] Processing files in ./projects/test-lcs/system/.
2019-11-27 19:38:56,488 [MainThread ] [INFO ] Processing task1_system1.txt.
2019-11-27 19:38:56,488 [MainThread ] [INFO ] Saved processed files to /tmp/tmpmalj10/system.
2019-11-27 19:38:56,488 [MainThread ] [INFO ] Processing files in ./projects/test-lcs/reference/.
2019-11-27 19:38:56,488 [MainThread ] [INFO ] Processing task1_ref1.txt.
2019-11-27 19:38:56,488 [MainThread ] [INFO ] Saved processed files to /tmp/tmpmalj10/model.
2019-11-27 19:38:56,489 [MainThread ] [INFO ] Written ROUGE configuration to /tmp/tmp7pU4tY/rouge_conf.xml
2019-11-27 19:38:56,489 [MainThread ] [INFO ] Running ROUGE with command /usr/local/lib/RELEASE-1.5.5/ROUGE-1.5.5.pl -e /usr/local/lib/RELEASE-1.5.5/data -c 95 -2 -1 -U -r 1000 -n 4 -w 1.2 -a -m /tmp/tmp7pU4tY/rouge_conf.xml
---------------------------------------------
1 ROUGE-1 Average_R: 1.00000 (95%-conf.int. 1.00000 - 1.00000)
1 ROUGE-1 Average_P: 1.00000 (95%-conf.int. 1.00000 - 1.00000)
1 ROUGE-1 Average_F: 1.00000 (95%-conf.int. 1.00000 - 1.00000)
---------------------------------------------
1 ROUGE-2 Average_R: 1.00000 (95%-conf.int. 1.00000 - 1.00000)
1 ROUGE-2 Average_P: 1.00000 (95%-conf.int. 1.00000 - 1.00000)
1 ROUGE-2 Average_F: 1.00000 (95%-conf.int. 1.00000 - 1.00000)
---------------------------------------------
1 ROUGE-3 Average_R: 1.00000 (95%-conf.int. 1.00000 - 1.00000)
1 ROUGE-3 Average_P: 1.00000 (95%-conf.int. 1.00000 - 1.00000)
1 ROUGE-3 Average_F: 1.00000 (95%-conf.int. 1.00000 - 1.00000)
---------------------------------------------
1 ROUGE-4 Average_R: 1.00000 (95%-conf.int. 1.00000 - 1.00000)
1 ROUGE-4 Average_P: 1.00000 (95%-conf.int. 1.00000 - 1.00000)
1 ROUGE-4 Average_F: 1.00000 (95%-conf.int. 1.00000 - 1.00000)
---------------------------------------------
1 ROUGE-L Average_R: 1.00000 (95%-conf.int. 1.00000 - 1.00000)
1 ROUGE-L Average_P: 1.00000 (95%-conf.int. 1.00000 - 1.00000)
1 ROUGE-L Average_F: 1.00000 (95%-conf.int. 1.00000 - 1.00000)
---------------------------------------------
Please see the fixed LCS implementation in #23.
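For readers who want to see what a non-greedy search looks like, here is a minimal sketch of the textbook dynamic-programming LCS, applied to the example from this issue. It is not the code from `ROUGECalculator.java` and not necessarily what #23 does. The class name `LcsSketch` and the method `lcsLength` are made up for illustration, tokens are assumed to be whitespace-separated, and stop-word removal and multi-sentence handling are ignored.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not taken from ROUGE-2.0 or #23.
final class LcsSketch {

    // Length of the longest common subsequence of two token lists,
    // computed with the standard O(n*m) dynamic-programming table.
    static int lcsLength(List<String> a, List<String> b) {
        // dp[i][j] = LCS length of the first i tokens of a and the first j tokens of b.
        int[][] dp = new int[a.size() + 1][b.size() + 1];
        for (int i = 1; i <= a.size(); i++) {
            for (int j = 1; j <= b.size(); j++) {
                if (a.get(i - 1).equals(b.get(j - 1))) {
                    // Matching tokens extend the best subsequence of both prefixes.
                    dp[i][j] = dp[i - 1][j - 1] + 1;
                } else {
                    // Otherwise keep the better result of dropping one token from either side.
                    dp[i][j] = Math.max(dp[i - 1][j], dp[i][j - 1]);
                }
            }
        }
        return dp[a.size()][b.size()];
    }

    public static void main(String[] args) {
        // Assumption: whitespace tokenization of the reference file shown in this issue.
        List<String> reference = Arrays.asList(
                "token0 xxxx token1 token2 token3 token4 xxxx token5".split(" "));
        // The empty diff in the issue shows the system file is identical to the reference.
        List<String> system = reference;

        int lcs = lcsLength(reference, system);
        double recall = (double) lcs / reference.size();
        double precision = (double) lcs / system.size();
        System.out.println("LCS=" + lcs + " recall=" + recall + " precision=" + precision);
    }
}
```

Because every prefix pair is considered, the repeated `xxxx` token cannot lock the search into a bad early match: for identical inputs the LCS length equals the sequence length, so recall and precision both come out as 1, which agrees with the ROUGE-1.5.5 result.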