@NikhilCherian We used other people's implementation to calculate GLEU. It can be found at https://github.com/rgcottrell/pytorch-human-performance-gec/blob/master/fairseq-scripts/gleu.py
Beam does not get used when calculating GLEU. It is used for pre-processing and training.
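For reference, this is roughly how that gleu.py scorer is driven. It is only a sketch from my reading of the linked file: the method and argument names (load_sources, load_references, run_iterations, num_iterations, per_sent) and the file paths are assumptions to check against the script itself.

from gleu import GLEU   # the gleu.py module linked above, assumed importable

# Build the scorer: 4-gram GLEU, analogous to BLEU-4.
gleu_calculator = GLEU(4)

# Hypothetical paths -- substitute your own source/reference/hypothesis files.
gleu_calculator.load_sources('jfleg.src.en')
gleu_calculator.load_references(['jfleg.ref0.gec'])

# run_iterations() returns a generator of per-iteration score statistics.
gleu_scores = gleu_calculator.run_iterations(num_iterations=500,
                                             source='jfleg.src.en',
                                             hypothesis='system.out.gec',
                                             per_sent=False)

print([g for g in gleu_scores][0][0])   # first statistic of the first iteration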
Thanks for the clarification. I was able to get it working through interactive.py. That was helpful.
@rgcottrell @tianfeichen @cqlijingwei Sorry for barging in with a silly question.
I am having trouble with the GLEU score evaluation on the JFLEG dataset.
As mentioned in the generate.py script for JFLEG:

!python ../fairseq/generate_glue.py \
    ../test/jfleg/ \
    --path ../checkpoints/lang-8-fairseq-cnn/checkpoint_best.pt \
    --batch-size 128 \
    --beam 5 \
    --nbest 12 \
    --lang-model-data ../data-bin/wiki103 \
    --lang-model-path ../data-bin/wiki103/wiki103.pt \
    --raw-text \
    --ref ../test/test.en-gec.gec \
    --src ../test/test.en-gec.en \
    --hyp *
Here we need --ref, --src, and --hyp. What should those arguments be?
--src does not seem to be accepted.
Any help is appreciated. I am trying to extend this work. Thanks.
Are you running one of the .bat files under fairseq-scripts, or writing your own script?
In our setup, --source-lang and --target-lang are used for the original English input and the grammar-corrected English output, respectively. --ref and --src were never used; they are actually commented out in our version of fairseq-scripts/generate.py.
@rgcottrell @tianfeichen @cqlijingwei Thanks for the update. I was mistakenly running the wrong generate.py instead of the correct one.
But then I got errors because I could not access [g for g in gleu_scores]. I ran some tests on it and found that gleu_scores is a generator. It is taking a lot of time to debug.
Could you help me here? Thanks in advance.
Are you trying to access the list using [g for g in gleu_scores][0][0]? You may need to debug the output from gleu_calculator.run_iterations().
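If it helps, a minimal way to inspect what that generator yields (a sketch only; the exact structure of each yielded item is something to verify in gleu.py):

# gleu_scores is a generator, so materialize it once before indexing;
# iterating over it a second time would yield nothing.
results = list(gleu_scores)
print(results)                   # look at the raw structure of each item
first_stat = results[0][0]       # equivalent to [g for g in gleu_scores][0][0]
print(float(first_stat))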
Thanks. There was an error in the files; I solved it. Thanks a lot.
@rgcottrell @cqlijingwei @tianfeichen
Thanks for the code and nice repo.
I have some doubts regarding generation:

python ../fairseq/generate.py \
    ../test/jfleg \
    --path ../checkpoints/lang-8-fairseq-cnn/checkpoint_best.pt \
    --batch-size 128 \
    --raw-text \
    --source-lang en \
    --target-lang gec
I ran the above script to get the GLEU scores without the language model, as I could not find it, and got this:
Translated 747 sentences (15033 tokens) in 7.9s (94.45 sentences/s, 1900.78 tokens/s) | Generate test with beam=5: BLEU4 = 65.05, 84.0/69.9/59.6/51.1 (BP=1.000, ratio=1.004, syslen=14286, reflen=14226)
This result stays the same whether or not I change the beam size, and BP is always 1.000. If we average them, it comes out to be 65.03. Could you help me understand how the GLEU score is calculated?
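For what it is worth, the 65.03 I mention seems to match the brevity penalty times the geometric mean of the four n-gram precisions printed above, which is how standard BLEU4 is computed. A quick sanity check (the precisions are the rounded values from the log, so the result differs slightly from the reported 65.05):

import math

precisions = [0.840, 0.699, 0.596, 0.511]   # 1- to 4-gram precisions from the log above
bp = 1.000                                   # brevity penalty (ratio 1.004 > 1, so BP = 1)
bleu4 = bp * math.exp(sum(math.log(p) for p in precisions) / len(precisions))
print(round(bleu4 * 100, 2))                 # 65.03 with these rounded precisions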