ondrejklejch / MT-ComparEval

Tool for comparison and evaluation of machine translation.
Apache License 2.0

Confirmed vs. improving n-grams statistics #23

Closed martinpopel closed 9 years ago

martinpopel commented 9 years ago

The Sentence tab correctly distinguishes between confirmed n-grams and improving n-grams. (Improving n-grams are confirmed n-grams occurring in only one of the two systems being compared.)

For an example, see http://quest.ms.mff.cuni.cz:7280/tasks/205-207/compare#confirmed (the "Confirmed n-grams" panel tab). This tab currently shows statistics about the top improving n-grams rather than the top confirmed n-grams. As a result, one n-gram can be listed in both the "systemA wins" and "systemB wins" tables (with different counts), which I find unintuitive. Also, the ten n-grams shown are almost always the most frequent n-grams overall, because a frequent n-gram will quite often improve systemA in one sentence and systemB in another just by chance, so the table is not very informative.

Suggestion1

Just rename the current tables to "Improving n-grams" and "Worsening n-grams".

Suggestion2

Keep the name "Confirmed n-grams" (and similarly "Unconfirmed n-grams"), but compute the top n-grams differently. For each n-gram, compute three numbers: confA = how many times the n-gram was seen as confirmed in systemA, confB = the same count for systemB, and confDiff = confA - confB. The "systemA wins" table will show the top ten n-grams by confDiff. The "systemB wins" table will show the bottom ten n-grams by confDiff (those with negative confDiff). See an example at http://ufal.mff.cuni.cz/~popel/compare_translations_sample.html (only the resulting confDiff is shown there). To keep the table uncluttered, I suggest showing only confDiff in each row and putting "confA - confB = confDiff" in the row's tooltip.
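A minimal sketch of the ranking proposed in Suggestion2, in Python rather than the project's own code. The function names `conf_diff_tables` and `tooltip`, the input format (one mapping from n-gram to confirmed count per system), and the alphabetical tie-breaking are assumptions made for illustration. Restricting each table to n-grams with a strictly positive or strictly negative confDiff is also an assumption; it guarantees that no n-gram appears in both tables.

```python
from collections import Counter

def conf_diff_tables(conf_a, conf_b, top=10):
    """Return (system_a_wins, system_b_wins), each a list of rows
    (ngram, confA, confB, confDiff) ranked by confDiff.

    conf_a / conf_b map an n-gram to the number of times it was seen
    as confirmed in systemA / systemB (hypothetical input format).
    """
    rows = []
    for ngram in set(conf_a) | set(conf_b):
        a = conf_a.get(ngram, 0)
        b = conf_b.get(ngram, 0)
        rows.append((ngram, a, b, a - b))

    # "systemA wins": largest positive confDiff first (ties: alphabetical).
    a_wins = sorted((r for r in rows if r[3] > 0),
                    key=lambda r: (-r[3], r[0]))[:top]
    # "systemB wins": most negative confDiff first (ties: alphabetical).
    b_wins = sorted((r for r in rows if r[3] < 0),
                    key=lambda r: (r[3], r[0]))[:top]
    return a_wins, b_wins

def tooltip(row):
    """Format the 'confA - confB = confDiff' tooltip text for one row."""
    _, a, b, diff = row
    return f"{a} - {b} = {diff}"

# Toy counts, invented for this example only.
conf_a = Counter({"the cat": 5, "of the": 7, "sat on": 2})
conf_b = Counter({"the cat": 2, "of the": 7, "on the mat": 4})
a_wins, b_wins = conf_diff_tables(conf_a, conf_b)
```

With these toy counts, "of the" is frequent in both systems but has confDiff = 0, so it appears in neither table, while "the cat" leads the "systemA wins" table with the tooltip "5 - 2 = 3".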

Suggestion3

Keep all four tabs (Improving, Worsening, Confirmed, Unconfirmed).

Suggestion4

Keep just two tabs (Confirmed, Unconfirmed), but also include "imprA - imprB = imprDiff" in a new column or tooltip (and similarly "worsA - worsB = worsDiff"). Note that the current implementation shows imprA in the "systemA wins" table and imprB in the "systemB wins" table. Ideally, both confDiff and imprDiff would be shown in two columns, and clicking a column header would sort the table by that measure.

martinpopel commented 9 years ago

Suggestion2 was done in https://github.com/choko/MT-ComparEval/commit/8e48b0b6 (When creating this Issue, I didn't realize that imprDiff = confDiff, so Suggestions 3 and 4 are irrelevant.)
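The identity imprDiff = confDiff can be checked on a toy example. The sketch below assumes that, within one sentence, the improving count of an n-gram for systemA is its confirmed count in systemA minus its confirmed count in systemB, clipped at zero (and symmetrically for systemB). This is one reading of "confirmed n-grams occurring in only one of the two systems" and may differ from what MT-ComparEval actually computes. The function name `diffs` and the data are hypothetical.

```python
from collections import Counter

def diffs(sentences):
    """sentences: list of (confirmed_a, confirmed_b) Counter pairs,
    one pair per sentence. Returns (conf_diff, impr_diff) as dicts."""
    conf_diff, impr_diff = {}, {}
    for ca, cb in sentences:
        # Counter subtraction keeps only strictly positive counts,
        # i.e. it clips the per-sentence difference at zero.
        impr_a = ca - cb
        impr_b = cb - ca
        for ng in set(ca) | set(cb):
            conf_diff[ng] = conf_diff.get(ng, 0) + ca[ng] - cb[ng]
            impr_diff[ng] = impr_diff.get(ng, 0) + impr_a[ng] - impr_b[ng]
    return conf_diff, impr_diff

# Toy data, invented for this example only.
sentences = [
    (Counter({"of the": 1, "the cat": 1}), Counter({"the cat": 1})),
    (Counter({"the cat": 1}), Counter({"of the": 1, "the cat": 2})),
]
conf_diff, impr_diff = diffs(sentences)
```

Under this assumption the two measures agree because, in every sentence, max(a - b, 0) - max(b - a, 0) = a - b. In the toy data, "of the" improves systemA in the first sentence and systemB in the second, so it has imprA = 1 and imprB = 1 and would be listed in both of the old tables, while its confDiff (and imprDiff) is 0.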