nusnlp / m2scorer

MaxMatch (M^2) Scorer - Evaluation program for grammatical error correction systems.
GNU General Public License v2.0

How to generate "scorer's gold standard format"? #10

Closed magician-david closed 6 years ago

magician-david commented 6 years ago

I can generate the format with the script in scripts/ if there is only one annotator. But what should I do if there are two or more annotated texts?

For example:

Src: The cat sat at mat . The dog .

Gold1: The cat sat on the mat . The dogs .

Gold2: The cat sat on a mat . The dog .

How do I get a file like example/source_gold? Do I have to write a script?

shamilcm commented 6 years ago

Generate a separate M2 file for each annotator using the script, then combine them into a single file in the format of example/source_gold: for each sentence, list the annotation lines (those starting with "A") from every annotator, and set the annotator id at the end of each annotation line (e.g., 0 or 1). If an annotator has no annotation for a sentence, make sure you add a NOOP (no operation) annotation line for that annotator.

Example: if annotator 0 has no annotation for a sentence, add this line to that sentence's annotation lines:

A -1 -1|||noop||||||-NONE-|||-NONE-|||0

Unfortunately, there is no released script that does this. Also, note that edits generated automatically with the script are suboptimal compared to edits annotated manually by human annotators.
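
The merging steps described above could be sketched roughly as follows. This is not part of m2scorer, only a minimal illustration under several assumptions: each per-annotator M2 file contains the same sentences in the same order, blocks are separated by blank lines, the annotator id is the last `|||`-separated field of an "A" line, and the NOOP line has the form quoted above. The function names (`read_m2`, `set_annotator`, `merge_m2`) and the error-type labels in the demo are hypothetical.

```python
import re

# NOOP template copied from the example line above; "{}" receives the annotator id.
NOOP = "A -1 -1|||noop||||||-NONE-|||-NONE-|||{}"


def read_m2(text):
    """Parse M2 text into a list of (source_line, [annotation_lines]) blocks."""
    blocks = []
    for chunk in re.split(r"\n\s*\n", text.strip()):
        lines = [ln for ln in chunk.splitlines() if ln.strip()]
        if not lines:
            continue
        if not lines[0].startswith("S "):
            raise ValueError("block does not start with an 'S' line: %r" % lines[0])
        annotations = [ln for ln in lines[1:] if ln.startswith("A ")]
        blocks.append((lines[0], annotations))
    return blocks


def set_annotator(line, annotator_id):
    """Overwrite the last field of an annotation line with the annotator id."""
    fields = line.split("|||")
    fields[-1] = str(annotator_id)
    return "|||".join(fields)


def merge_m2(texts):
    """Merge one M2 text per annotator; ids are assigned in list order (0, 1, ...)."""
    per_annotator = [read_m2(t) for t in texts]
    if len({len(blocks) for blocks in per_annotator}) != 1:
        raise ValueError("annotators do not cover the same number of sentences")
    merged = []
    for sentence_blocks in zip(*per_annotator):
        source = sentence_blocks[0][0]
        if any(src != source for src, _ in sentence_blocks):
            raise ValueError("source sentences differ: %r" % source)
        lines = [source]
        for annotator_id, (_, annotations) in enumerate(sentence_blocks):
            if annotations:
                lines.extend(set_annotator(a, annotator_id) for a in annotations)
            else:
                # This annotator made no edit: add the NOOP line for them.
                lines.append(NOOP.format(annotator_id))
        merged.append("\n".join(lines))
    return "\n\n".join(merged) + "\n"


if __name__ == "__main__":
    # Toy input loosely based on the question; error types are made up.
    gold1 = (
        "S The cat sat at mat .\n"
        "A 3 4|||Prep|||on the|||REQUIRED|||-NONE-|||0\n\n"
        "S The dog .\n"
        "A 1 2|||Nn|||dogs|||REQUIRED|||-NONE-|||0\n"
    )
    gold2 = (
        "S The cat sat at mat .\n"
        "A 3 4|||Prep|||on a|||REQUIRED|||-NONE-|||0\n\n"
        "S The dog .\n"
    )
    print(merge_m2([gold1, gold2]))
```

To use it on real files, read each annotator's M2 file into a string, pass the strings to `merge_m2` in annotator order, and write the result out. Note that M2 files hold one sentence per block, so the two-sentence example from the question is split into two blocks here.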