ng-j-p / rouge-we

ROUGE summarization evaluation metric, enhanced with use of Word Embeddings
MIT License

About pearson score #3

Open joewellhe opened 6 years ago

joewellhe commented 6 years ago

I read your paper "Better Summarization Evaluation with Word Embeddings for ROUGE" and am very interested in your work. I computed ROUGE scores on the same data you used, but my Pearson correlations are not as good as yours. For example, my Pearson correlation between ROUGE-2 and Pyr is 0.59 (computed with the MATLAB script provided by TAC), whereas your paper reports 0.96. How did you get such a high score? If you preprocessed the TAC data, could you tell me how?
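For anyone comparing numbers, here is a minimal Python sketch of one way a Pearson correlation between ROUGE-2 and Pyr scores can be computed, both over individual summaries and over per-system averages. The granularity is one thing that can change the result; whether it explains the gap described here is an assumption, not something confirmed in this thread. The helper names (`pearson`, `system_level`) and all score values are hypothetical placeholders, not TAC data and not the authors' or TAC's script.

```python
import math
from collections import defaultdict


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def system_level(rows):
    """Average both scores per system and return two aligned lists."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # rouge sum, pyramid sum, count
    for system_id, rouge2, pyramid in rows:
        acc = sums[system_id]
        acc[0] += rouge2
        acc[1] += pyramid
        acc[2] += 1
    systems = sorted(sums)
    rouge_means = [sums[s][0] / sums[s][2] for s in systems]
    pyramid_means = [sums[s][1] / sums[s][2] for s in systems]
    return rouge_means, pyramid_means


# Hypothetical placeholder rows: (system_id, ROUGE-2, Pyr). NOT real TAC scores.
rows = [
    ("A", 0.05, 0.40), ("A", 0.12, 0.20),
    ("B", 0.08, 0.55), ("B", 0.15, 0.35),
    ("C", 0.10, 0.70), ("C", 0.19, 0.50),
]

# Correlation over individual summaries.
summary_r = pearson([r[1] for r in rows], [r[2] for r in rows])

# Correlation over per-system averages.
sys_rouge, sys_pyr = system_level(rows)
system_r = pearson(sys_rouge, sys_pyr)

print(f"summary-level r = {summary_r:.2f}, system-level r = {system_r:.2f}")
```

With these made-up rows the two granularities give very different coefficients, so it may be worth checking which one each number being compared was computed at.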

Lukecn1 commented 4 years ago

I have the exact same issue: I am not able to reproduce the high correlation scores between ROUGE and the human evaluations reported in the paper.

My scores are very similar to the ones reported by the OP.

Did you do any preprocessing, and if so, could you share it?