ondrejklejch / MT-ComparEval

Tool for comparison and evaluation of machine translation.
Apache License 2.0

missing Paired Bootstrap Sampling plot #60

Closed nicolabertoldi closed 7 years ago

nicolabertoldi commented 8 years ago

I started using your MT-ComparEval toolkit and installed it on my machine.

Almost everything works as expected, except for the statistics based on bootstrap sampling. For "bleu-cis" the plots are correct, whereas for all the other metrics I get empty plots. By adding log output in "./app/templates/Tasks/compare.latte" I discovered that the variable "data.samples.data.length" is actually 0 (line 215); I suspect that something in the sampling went wrong.

Hence, I dug into the code and logs, and (I think) I understand that the sampling is done once, when each single task is loaded. I suspect that the statistics based on score differences (like the paired Bootstrap Sampling) rely on the sampling mentioned above; is that correct?

However, it seems that the computation of this score difference is done only for "bleu-cis", as I see in the watcher log: "Generating BLEU-cis samples for system1."

This is somewhat confirmed by the config "app/config/config.neon", where the only metric for which the flag compute_bootstrap is set to true is "bleu-cis", whereas for all the others this flag is set to false.

I tried to activate this flag for all metrics, but I did not see any change.
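For reference, the relevant fragment of config.neon might look roughly like the following. This is a hypothetical sketch based on the description above; the actual key names and nesting in MT-ComparEval's config may differ:

```neon
# Hypothetical sketch of app/config/config.neon (key names assumed):
metrics:
    bleu-cis:
        compute_bootstrap: true    # only metric with bootstrap sampling enabled by default
    bleu:
        compute_bootstrap: false   # flipping these to true is what was attempted above
    precision:
        compute_bootstrap: false
```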

One final note which may be helpful: compared to the demo at http://wmt.ufal.cz, the list of available metrics differs. In our version we have: brevity-penalty, bleu-cis, bleu, precision, recall, f1-measure (in this order, which is the same as in the config.neon file).

Nicola

ondrejklejch commented 8 years ago

Hi Nicola,

did you reimport your tasks? Unfortunately, MT-ComparEval can't compute new metrics once a task has been imported. So I suggest you run the following to reimport all tasks:

rm storage/database && sqlite3 storage/database < schema.sql
find data -name ".imported" | xargs rm
./bin/watcher.sh

Our demo at http://wmt.ufal.cz uses a different configuration so that the metric names match BLEU and BLEU-cased on http://matrix.statmt.org.

Ondrej

nicolabertoldi commented 8 years ago

To be sure, I created a new experiment and loaded two tasks, but the problem is still there.

ondrejklejch commented 8 years ago

Ok, I will look at it. Thank you.

ondrejklejch commented 8 years ago

Hi Nicola,

I found the problem. After changing the configuration file, the cache has to be removed with: rm -rf temp/cache/*.
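Putting the cache removal together with the reimport steps from earlier in the thread, a small helper script could automate the reset after a config change. This is a sketch, not part of MT-ComparEval; it assumes the default directory layout (temp/, data/) used above, and it demonstrates itself on a throwaway fixture rather than a real checkout:

```shell
#!/bin/sh
# Hypothetical helper: after editing app/config/config.neon, clear the
# template/config cache and the ".imported" markers so the watcher
# re-evaluates every task with the new metric settings.
reset_compareval() {
    root="$1"
    rm -rf "$root"/temp/cache/*                             # drop cached config/templates
    find "$root/data" -name ".imported" -exec rm -f {} +    # force reimport on next watcher run
}

# Demonstration on a throwaway fixture:
demo=$(mktemp -d)
mkdir -p "$demo/temp/cache" "$demo/data/experiment1/task1"
touch "$demo/temp/cache/stale" "$demo/data/experiment1/task1/.imported"
reset_compareval "$demo"
ls "$demo/temp/cache"                          # empty: cache cleared
find "$demo/data" -name ".imported"            # no output: markers removed
```

After running this against a real checkout, ./bin/watcher.sh would pick the tasks up again and recompute the metrics.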

This is a little bit annoying, and it could be fixed by deleting this line: https://github.com/choko/MT-ComparEval/blob/master/app/bootstrap.php#L7 (for some reason we need that line for our deployments).

I will discuss the deletion of this line with @martinpopel.

Thank you very much again.

Ondrej

nicolabertoldi commented 8 years ago

After the removal of the temp directory, everything works fine.

Thanks for your help.