rgcottrell / pytorch-human-performance-gec

A PyTorch implementation of "Reaching Human-level Performance in Automatic Grammatical Error Correction: An Empirical Study"
Apache License 2.0

GLEU score #8

Closed. NikhilCherian closed this issue 4 years ago.

NikhilCherian commented 4 years ago

@rgcottrell @cqlijingwei @tianfeichen

Thanks for the code and nice repo.

I have some doubts regarding generation with the following command:

```
python ../fairseq/generate.py ../test/jfleg \
    --path ../checkpoints/lang-8-fairseq-cnn/checkpoint_best.pt \
    --batch-size 128 \
    --raw-text \
    --source-lang en \
    --target-lang gec
```

I ran the above command to get the GLEU scores without the language model, as I could not find it, and got this:

```
Translated 747 sentences (15033 tokens) in 7.9s (94.45 sentences/s, 1900.78 tokens/s)
| Generate test with beam=5: BLEU4 = 65.05, 84.0/69.9/59.6/51.1 (BP=1.000, ratio=1.004, syslen=14286, reflen=14226)
```

This result stays the same whether or not I change the beam size, and BP is always 1.000. If we average them, it comes out to 65.03. Could you help me understand the GLEU score calculation?
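A note on the arithmetic: BLEU-4 is the geometric mean of the four n-gram precisions scaled by the brevity penalty, which is exactly where the 65.03 figure comes from. A quick check in plain Python, using the precisions from the output above:

```python
import math

# BLEU-4 = BP * geometric mean of the 1- to 4-gram precisions.
precisions = [84.0, 69.9, 59.6, 51.1]  # from the generate.py output above
bp = 1.000                             # brevity penalty (syslen >= reflen)
bleu4 = bp * math.exp(sum(math.log(p) for p in precisions) / len(precisions))
print(round(bleu4, 2))  # ~65.03; matches the reported 65.05 up to rounding
```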

tianfeichen commented 4 years ago

@NikhilCherian We used a third-party implementation to calculate GLEU. It can be found at https://github.com/rgcottrell/pytorch-human-performance-gec/blob/master/fairseq-scripts/gleu.py

Beam does not get used when calculating GLEU. It is used for pre-processing and training.
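For reference, here is a minimal sketch of how that gleu.py might be driven. It assumes a gec-ranking-style API (a GLEU class with load_sources, load_references, and a run_iterations generator, consistent with the later comments in this thread); the constructor argument, the exact signatures, and the hypotheses file name are assumptions, not verified against the file:

```python
# Hypothetical driver for the linked gleu.py. The method names follow this
# thread and the gec-ranking-style API; exact signatures and the
# hypotheses file name are assumptions.
from gleu import GLEU

gleu_calculator = GLEU(4)  # assumed: maximum n-gram order
gleu_calculator.load_sources('../test/test.en-gec.en')        # uncorrected source sentences
gleu_calculator.load_references(['../test/test.en-gec.gec'])  # corrected reference(s)

# run_iterations() yields scores rather than returning a list, so collect
# them before indexing (see the later comments in this thread).
gleu_scores = [g for g in gleu_calculator.run_iterations('hypotheses.gec')]  # hypothetical file
print(gleu_scores[0][0])
```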

NikhilCherian commented 4 years ago

Thanks for the clarification. I was able to get it working through interactive.py. That was helpful.

[screenshot of interactive.py output]

NikhilCherian commented 4 years ago

@rgcottrell @tianfeichen @cqlijingwei Sorry for barging in with a silly question.

I am having trouble computing the GLEU score on the JFLEG dataset for evaluation.

As mentioned in the generate.py script for JFLEG, I ran:

```
python ../fairseq/generate_glue.py ../test/jfleg/ \
    --path ../checkpoints/lang-8-fairseq-cnn/checkpoint_best.pt \
    --batch-size 128 \
    --beam 5 \
    --nbest 12 \
    --lang-model-data ../data-bin/wiki103 \
    --lang-model-path ../data-bin/wiki103/wiki103.pt \
    --raw-text \
    --ref ../test/test.en-gec.gec \
    --src ../test/test.en-gec.en \
    --hyp *
```

Here, we need --ref, --src, and --hyp. What should those arguments be?

[screenshot]

The --src argument is not accepted.

Any help is appreciated. I am trying to extend the work. Thanks

tianfeichen commented 4 years ago

Are you running one of the .bat files under fairseq-scripts, or writing your own script?

In our setup, --source-lang and --target-lang are used for the original English input and the grammar-corrected English output, respectively.

--ref and --src were never used. They are actually commented out in our version of fairseq-scripts/generate.py.

NikhilCherian commented 4 years ago

@rgcottrell @tianfeichen @cqlijingwei Thanks for the update. I had been running the wrong generate.py instead of the one from this repo.

But then I got these errors when trying to access `[g for g in gleu_scores]`. I ran a test on it and found that gleu_scores is a generator. It is taking a lot of time to debug.

[screenshot of the error]

Could you help me here? Thanks in advance.

tianfeichen commented 4 years ago

Are you trying to access the list using `[g for g in gleu_scores][0][0]`? You may need to debug the output from `gleu_calculator.run_iterations()`.
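To illustrate the pitfall: a generator supports only a single pass and no indexing, so it has to be materialized into a list first. A self-contained sketch with a stand-in generator (the real one is whatever run_iterations() returns; the dummy values are only there to make it runnable):

```python
# Stand-in for the generator returned by gleu_calculator.run_iterations().
def run_iterations():
    for i in range(3):
        yield (0.65 + i * 0.001,)  # dummy per-iteration GLEU tuples

gleu_scores = run_iterations()     # a generator object, not a list

scores = [g for g in gleu_scores]  # materialize it once into a list
print(scores[0][0])                # indexing now works: 0.65

print(list(gleu_scores))           # [] -- the generator is already exhausted
```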

NikhilCherian commented 4 years ago

Thanks. There was an error in the files; I solved it. Thanks a lot.