Open keisks opened 6 years ago
The original M2 algorithm and implementation are not optimized for cases where the output is very different from the source sentence. We will try to release an optimized implementation soon. Meanwhile, if the goal is to validate an NMT system, you could replace the output sentence with the source sentence itself whenever the edit distance between the two is very high. The M2 implementation within Moses tries to do something similar by avoiding sentences that are extremely different from the source.
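For anyone who wants to try that workaround before an optimized scorer is available, here is a minimal sketch of the preprocessing step. It is not part of m2scorer or Moses. The function names, the whitespace tokenization, and the `max_ratio=0.5` threshold are all assumptions made for illustration, so tune the threshold on your own data.

```python
def token_edit_distance(a, b):
    """Token-level Levenshtein distance between two token lists."""
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        curr = [i]
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr.append(min(prev[j] + 1,          # delete tok_a
                            curr[j - 1] + 1,      # insert tok_b
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def fallback_to_source(source, hypothesis, max_ratio=0.5):
    """Return the source sentence when the hypothesis is too far from it.

    max_ratio is a hypothetical threshold: the largest edit distance
    allowed, expressed as a fraction of the source length in tokens.
    """
    src_tokens = source.split()
    hyp_tokens = hypothesis.split()
    distance = token_edit_distance(src_tokens, hyp_tokens)
    if distance > max_ratio * max(len(src_tokens), 1):
        return source  # scored as "no edits proposed" for this sentence
    return hypothesis
```

Run each (source, system output) pair through `fallback_to_source` and write the results to the file you pass to the scorer. Sentences that fall back to the source count as having no proposed edits, so this changes the reported scores and is only suitable for quick validation runs.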
Thank you for the suggestion and I look forward to the optimized version :)
m2scorer is really slow when I evaluate my GEC data.
It takes hours just to evaluate 2000 short (<300) sentences.
By the way, I am using the official version of 3.2.
Is there a solution to this? I cannot evaluate my GEC system, even though the test data is only 980 sentences with a maximum length of 433. It has taken more than 7 hours and is still running.
I ran the script on a PC with higher specifications and it finished in less than 6 hours. That is still too long, but I am adding this note for those who might have the same problem.
Hello,
Thank you for developing M2 scorer!
I recently ran into a problem when using the m2 script on poor neural MT outputs.
For example, when I evaluated the following poor NMT output (for sentence id 333), the m2 script took a very long time to compute. In my environment, it has taken more than 5 hours and is still running...
Is this expected behavior, and is there a way to work around it?
Thank you,