ondrejklejch / MT-ComparEval

Tool for comparison and evaluation of machine translation.
Apache License 2.0

Same y-range of non-paired bootstrap graphs #26

Open martinpopel opened 9 years ago

martinpopel commented 9 years ago

See e.g. the Statistics tab of http://quest.ms.mff.cuni.cz:7280/tasks/228-229/compare

At first sight, the two bottom (non-paired) Bootstrap Resampling graphs of commercial1 and CU-Chimera look almost the same. However, their y-axes differ and the two confidence intervals do not overlap, because commercial1 is much worse than CU-Chimera.

Suggestion1

Keep the y-axis scale the same for both graphs.
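A minimal sketch of how a shared range could be computed, assuming each graph is drawn from a plain list of bootstrap BLEU scores. The function name `shared_y_range` and the `padding` parameter are hypothetical and not part of MT-ComparEval:

```python
def shared_y_range(*series, padding=0.05):
    """Return one (y_min, y_max) pair that covers every series.

    `padding` is a relative margin added on both sides so that the
    curves do not touch the plot border (an illustrative choice).
    """
    values = [v for s in series for v in s]
    if not values:
        raise ValueError("at least one non-empty series is required")
    lo, hi = min(values), max(values)
    margin = (hi - lo) * padding
    return lo - margin, hi + margin


# Made-up bootstrap scores for two systems, for illustration only.
system_a = [0.090, 0.095, 0.099]
system_b = [0.193, 0.200, 0.206]
y_min, y_max = shared_y_range(system_a, system_b)
```

Both graphs would then be drawn with the same `y_min` and `y_max`, so a difference between the systems shows up as a difference in vertical position.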

Suggestion2

Merge the two graphs into one. Don't fill the area under the curves; just make the yellow and blue curves thicker. Also change the title:

(Non-paired) Bootstrap Resampling: BLEU-cis
commercial1: 95% confidence interval [0.0904, 0.0993]
CU-Chimera: 95% confidence interval [0.1929, 0.2059]
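The merged title and the non-overlap observation could be produced roughly as follows. This is only a sketch: `merged_title` and `intervals_overlap` are hypothetical helpers, and the intervals are the ones quoted in this issue, passed in as already-computed values:

```python
def intervals_overlap(a, b):
    """True if the closed intervals a = (low, high) and b = (low, high) intersect."""
    return a[0] <= b[1] and b[0] <= a[1]


def merged_title(metric, intervals, confidence=95):
    """Build a multi-line title for the merged graph, one line per system."""
    lines = [f"(Non-paired) Bootstrap Resampling: {metric}"]
    for system, (low, high) in intervals.items():
        lines.append(
            f"{system}: {confidence}% confidence interval [{low:.4f}, {high:.4f}]"
        )
    return "\n".join(lines)


# Intervals as reported in this issue.
intervals = {
    "commercial1": (0.0904, 0.0993),
    "CU-Chimera": (0.1929, 0.2059),
}
title = merged_title("BLEU-cis", intervals)
overlap = intervals_overlap(intervals["commercial1"], intervals["CU-Chimera"])
```

For the two systems above, `overlap` is `False`, which is the fact that the separate y-axes currently hide.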

Suggestion3

After Suggestion2, there will be free space, which we can use for (non-paired) sentence-level BLEU curves. Again there will be two curves (blue and yellow) in one graph, and both will be non-decreasing, since the sentence-level BLEU scores will be sorted independently for each system. This suggestion is a feature request, not really related to this issue.
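The data for such curves could be prepared along these lines. This is a sketch only: `sorted_score_curve` is a hypothetical name and the scores below are made up for illustration:

```python
def sorted_score_curve(sentence_scores):
    """Return (rank, score) points with the scores sorted in ascending order.

    Each system is sorted on its own, so the same rank on two curves
    generally refers to different sentences.
    """
    return list(enumerate(sorted(sentence_scores)))


# Made-up sentence-level BLEU scores for two systems.
curve_a = sorted_score_curve([0.30, 0.05, 0.12, 0.00])
curve_b = sorted_score_curve([0.41, 0.22, 0.10, 0.35])
```

Because each list is sorted before plotting, both curves are non-decreasing by construction, and the better system's curve tends to lie higher on the shared y-axis.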