nusnlp / m2scorer

MaxMatch (M^2) Scorer - Evaluation program for grammatical error correction systems.
GNU General Public License v2.0

How to generate "scorer's gold standard format" ? #10

Closed magician-david closed 6 years ago

magician-david commented 6 years ago

I can generate the format with the script in scripts/edit_creator.py if there is only one annotator. But what should I do if there are two or more annotated texts?

For example:

Src: The cat sat at mat . The dog .

Gold1: The cat sat on the mat . The dogs .

Gold2: The cat sat on a mat . The dog .

How do I get a file like example/source_gold? Do I have to write a script?

shamilcm commented 6 years ago

You need to generate a separate M2 file for each annotator using the edit_creator.py script and then combine the M2 files by setting the annotator id (e.g., 0 or 1) at the end of each annotation line (the lines starting with "A"), as in example/source_gold. If an annotator has no annotation for a sentence, make sure you add a NOOP (no operation) annotation line for that annotator.

Example: if annotator 0 has no annotation for a sentence, add this line to that sentence's annotation lines:

A -1 -1|||noop||||||-NONE-|||-NONE-|||0

Unfortunately, there are no released scripts that do this. Also, note that edits generated with edit_creator.py are suboptimal compared to edits annotated manually by human annotators.
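A rough sketch of what such a merge script could look like is below. It is not part of m2scorer. The function names (read_m2, set_annotator, merge_m2) are made up for illustration, and the sample edits, including the error-type labels, are assumptions about what edit_creator.py would produce for the example in the question. It also assumes every per-annotator file contains the same source sentences in the same order.

```python
# Hypothetical helper, not part of m2scorer: merge single-annotator M2 files.
NOOP = "A -1 -1|||noop||||||-NONE-|||-NONE-|||{}"  # NOOP line as quoted above


def read_m2(text):
    """Parse M2 text into a list of (sentence_line, [annotation_lines])."""
    blocks = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("S "):
            blocks.append((line, []))
        elif line.startswith("A ") and blocks:
            blocks[-1][1].append(line)
    return blocks


def set_annotator(line, annotator_id):
    """Replace the last |||-separated field with the given annotator id."""
    head, _, _old_id = line.rpartition("|||")
    return f"{head}|||{annotator_id}"


def merge_m2(texts):
    """Merge M2 texts; the i-th text is treated as annotator i."""
    parsed = [read_m2(t) for t in texts]
    n_sents = len(parsed[0])
    if any(len(p) != n_sents for p in parsed):
        raise ValueError("M2 files do not contain the same number of sentences")
    merged = []
    for i in range(n_sents):
        sentence = parsed[0][i][0]
        lines = [sentence]
        for annotator_id, blocks in enumerate(parsed):
            sent, annots = blocks[i]
            if sent != sentence:
                raise ValueError(f"Source sentence {i} differs between files")
            if annots:
                lines.extend(set_annotator(a, annotator_id) for a in annots)
            else:
                # No edits from this annotator: add the NOOP line.
                lines.append(NOOP.format(annotator_id))
        merged.append("\n".join(lines))
    return "\n\n".join(merged) + "\n"


# Illustrative inputs only; the error-type labels are assumptions.
GOLD0 = """S The cat sat at mat .
A 3 4|||Prep|||on|||REQUIRED|||-NONE-|||0
A 4 4|||ArtOrDet|||the|||REQUIRED|||-NONE-|||0

S The dog .
A 1 2|||Nn|||dogs|||REQUIRED|||-NONE-|||0
"""

GOLD1 = """S The cat sat at mat .
A 3 4|||Prep|||on|||REQUIRED|||-NONE-|||0
A 4 4|||ArtOrDet|||a|||REQUIRED|||-NONE-|||0

S The dog .
"""

if __name__ == "__main__":
    print(merge_m2([GOLD0, GOLD1]), end="")
```

With these inputs, the second annotator's lines for the first sentence end in 1, and the second sentence gets a NOOP line for annotator 1 because that annotator left "The dog ." unchanged. To use it on real files, read each per-annotator M2 file into a string and pass them in annotator order.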