nusnlp / m2scorer

MaxMatch (M^2) Scorer - Evaluation program for grammatical error correction systems.
GNU General Public License v2.0

The edits for inserting multiple words are sometimes wrong #6

Closed (kavehtp closed 8 years ago)

kavehtp commented 8 years ago

For example:

S Thursday , is it not ?
A 0 1|||UNK|||It 's|||REQUIRED|||-NONE-|||0
A 3 5|||UNK|||n't it|||REQUIRED|||-NONE-|||0

Target:

It 's Thursday , is n't it ?
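To make the mismatch concrete, here is a minimal sketch that applies M2-style annotation lines to the source sentence. It assumes the usual reading of the format, where `A start end` is a token span in the source that gets replaced by the correction. The helper names `parse_m2_edit` and `apply_edits` are made up for this illustration and are not part of m2scorer.

```python
def parse_m2_edit(line):
    """Parse 'A start end|||type|||correction|||...' into (start, end, tokens)."""
    span, _edit_type, correction = line[2:].split("|||")[:3]
    start, end = map(int, span.split())
    return start, end, correction.split()


def apply_edits(source, edit_lines):
    """Replace source tokens [start:end] with the correction for every edit."""
    tokens = source.split()
    # Apply edits right to left so earlier offsets stay valid.
    for start, end, repl in sorted(map(parse_m2_edit, edit_lines), reverse=True):
        tokens[start:end] = repl
    return " ".join(tokens)


source = "Thursday , is it not ?"
second_edit = "A 3 5|||UNK|||n't it|||REQUIRED|||-NONE-|||0"

# Span 0 1 covers "Thursday", so the first edit replaces it.
reported = apply_edits(source, ["A 0 1|||UNK|||It 's|||REQUIRED|||-NONE-|||0", second_edit])
# Span 0 0 is empty, so the first edit is a pure insertion before "Thursday".
expected = apply_edits(source, ["A 0 0|||UNK|||It 's|||REQUIRED|||-NONE-|||0", second_edit])

print(reported)
print(expected)
```

Under that reading, only the `0 0` span reproduces the target, because `0 1` consumes the token "Thursday".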

tamhd commented 8 years ago

We also get an incorrect evaluation with the M2 scorer (please double-check).

Hypothesis:

It 's Thursday , is n't it ?

M2 file:

S Thursday , is it not ?
A 0 0|||Mec|||It 's|||REQUIRED|||-NONE-|||0
A 3 5|||Mec|||n't it|||REQUIRED|||-NONE-|||0

Based on my understanding, the cause is the initialization step of the Levenshtein matrix. The first row holds the edits needed to transform an empty sentence into the hypothesis, and it is built using the hypothesis index as the edit offsets. Since these edits all insert before the first source token, I think their start and end should always be 0.

My proposed solution is to change line 827 of levenshtein.py to:

edit = ("ins", 0, 0, '', second[j-1], 0) # always insert at the beginning

It appears to me that the modification fixes the problem, but I am not sure whether it introduces other errors.

Tam
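The reasoning in the comment above can be sketched roughly as follows. This is not the m2scorer code: the 5-tuple layout, the `first_row_edits` helper, and the `merge` rule (take the smallest start and the largest end) are all assumptions made for illustration. The sketch only shows how per-token insertions indexed by hypothesis position could combine into a `0 1` span, while insertions anchored at 0 combine into `0 0`.

```python
def first_row_edits(hyp_tokens, anchor_at_zero):
    """Edits turning an empty source prefix into each hypothesis prefix.

    Simplified tuples: (type, start, end, source_text, hyp_token).
    """
    edits = []
    for j in range(1, len(hyp_tokens) + 1):
        # Either pin the insertion to source offset 0 (the proposed fix)
        # or reuse the hypothesis index as the offset (assumed old behaviour).
        pos = 0 if anchor_at_zero else j - 1
        edits.append(("ins", pos, pos, "", hyp_tokens[j - 1]))
    return edits


def merge(edits):
    """Hypothetical merge of adjacent insertions into one multi-word edit."""
    start = min(e[1] for e in edits)
    end = max(e[2] for e in edits)
    return (start, end, " ".join(e[4] for e in edits))


prefix = ["It", "'s"]
hyp_indexed = merge(first_row_edits(prefix, anchor_at_zero=False))
src_anchored = merge(first_row_edits(prefix, anchor_at_zero=True))

print(hyp_indexed)   # (0, 1, "It 's")
print(src_anchored)  # (0, 0, "It 's")
```

If the scorer combines edits in a similar way, this would explain why only multi-word insertions at the start of a sentence are affected: a single inserted token gets offset 0 under either scheme.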

kavehtp commented 8 years ago

I tried it. The generated edits are correct with the change. Can you make a pull request?

tamhd commented 8 years ago

I tried it. The modification does not change the results for the 13 teams that participated in the CoNLL-2014 shared task (Table 7).

kavehtp commented 8 years ago

OK. Thanks.