scoring poor NMT outputs

keisks commented 6 years ago

Hello,

Thank you for developing M2 scorer!

I recently ran into a problem when using the M2 scorer script on poor neural MT outputs.

For example, when I evaluated the following poor NMT output (sentence ID 333), the M2 script took a very long time to compute. In my environment, it has been running for more than 5 hours and has still not finished.

As it is a genetic risk , the patient force might have a high chance of carrying the risk , hence the need to inform their relatives is important . Hence , you are suffering from a genetic disease that the genetic trait might be passed on to your next generation if you have a child . Hence , there is no legal obligation to disclose to their family members , there is no legal obligation . Hence , there is no legal obligation . Hence , there is no legal obligation . Hence , there is no legal obligation . Hence , there is no legal obligation . Hence , there is no legal obligation . Hence , there is no legal obligation . Hence , there is no legal obligation . Hence , there is no legal obligation . Hence , there is no legal obligation . Hence , there is no legal obligation . Hence , there is no legal obligation . Hence , there is no legal obligation . Hence , there is no legal obligation . Hence , there is no legal obligation . Hence , there is no legal obligation . Hence , there

Is this expected behavior, and is there a workaround?

Thank you,

shamilcm commented 6 years ago

The original M2 algorithm and implementation are not optimized for cases where the output is very different from the source sentence. We will try to release an optimized implementation soon. Meanwhile, if the goal is to validate an NMT system, you may try replacing the output sentence with the source sentence itself whenever the edit distance between them is very high. The M2 implementation within Moses does something similar by avoiding sentences that are extremely different from the source.
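A minimal Python sketch of the suggested workaround follows. It is not part of m2scorer, and the function names and the 0.5 cutoff are hypothetical choices. It assumes the system output is tokenized the same way as the source, measures token-level Levenshtein distance normalized by the longer sentence, and reads the source sentences from the `S ` lines of the gold M2 file. The Moses implementation may use a different criterion. Because it changes the hypotheses, the resulting scores are only useful for validation, not for reporting.

```python
def token_edit_distance(a, b):
    """Token-level Levenshtein distance (unit cost for insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr[j] = min(prev[j] + 1,         # delete tok_a
                          curr[j - 1] + 1,     # insert tok_b
                          prev[j - 1] + cost)  # match or substitute
        prev = curr
    return prev[-1]


def fallback_to_source(source, hypothesis, max_ratio=0.5):
    """Return `source` instead of `hypothesis` when the two differ too much.

    `max_ratio` is a hypothetical cutoff: edit distance divided by the
    length (in tokens) of the longer sentence.
    """
    src, hyp = source.split(), hypothesis.split()
    longest = max(len(src), len(hyp), 1)
    ratio = token_edit_distance(src, hyp) / longest
    return source if ratio > max_ratio else hypothesis


def read_m2_sources(m2_path):
    """Collect the source sentences, i.e. the lines starting with 'S ' in an M2 file."""
    with open(m2_path, encoding="utf-8") as f:
        return [line[2:].rstrip("\n") for line in f if line.startswith("S ")]


def filter_system_output(m2_path, hyp_path, out_path, max_ratio=0.5):
    """Write a copy of the system output with overly different sentences replaced.

    Returns the number of hypotheses that were replaced by their source.
    """
    sources = read_m2_sources(m2_path)
    with open(hyp_path, encoding="utf-8") as f:
        hyps = [line.rstrip("\n") for line in f]
    if len(sources) != len(hyps):
        raise ValueError(f"{len(sources)} source sentences but {len(hyps)} hypotheses")
    replaced = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for src, hyp in zip(sources, hyps):
            kept = fallback_to_source(src, hyp, max_ratio)
            if kept != hyp:
                replaced += 1
            out.write(kept + "\n")
    return replaced
```

The filtered file can then be passed to the M2 scorer in place of the original system output. A degenerate output like the one for sentence 333, which repeats the same clause dozens of times, has a normalized distance close to 1 and would fall back to the source under any reasonable cutoff.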

keisks commented 6 years ago

Thank you for the suggestion and I look forward to the optimized version :)

shm007g commented 4 years ago

m2scorer is really slow when I evaluate my GEC data.

It takes hours just to evaluate 2000 short (<300) sentences.

By the way, I am using the official version 3.2.

amal-meer commented 3 years ago

Is there a solution to this? I cannot evaluate my GEC system even though the test data is only 980 sentences with a maximum length of 433. It has taken more than 7 hours and is still running.

amal-meer commented 3 years ago

I ran the script on a PC with higher specifications, and it finished in less than 6 hours. That is still too long, but I am adding this note for anyone who might have the same problem.

nusnlp / m2scorer, issue #8