
MARMOT - the open source framework for feature extraction and machine learning, designed to estimate the quality of Machine Translation output

Issues and problems with the F1 metrics for evaluating word-level quality estimation #25

Open · chrishokamp opened this issue 9 years ago

chrishokamp commented 9 years ago

Put thoughts on metrics into this issue:

After discussion on 18.2.15:

chrishokamp commented 9 years ago

A metric similar to the BLEU score could make sense -- measuring the overlap between spans in the hypothesis and the reference. The key idea is that a span is not discarded if it is only a partial match, but its score is penalized.
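
A rough sketch of what such a span-overlap score could look like, assuming spans are half-open (start, end) token index pairs; the precision/recall combination and the function names here are illustrative assumptions, not an agreed-upon metric:

```python
def overlap(a, b):
    """Number of token positions shared by two half-open spans (start, end)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def span_overlap_f1(hyp_spans, ref_spans):
    """BLEU-like span score: partial matches are kept but penalized
    in proportion to how much of each span actually overlaps."""
    if not hyp_spans or not ref_spans:
        return 0.0
    # precision: how much of each hypothesis span is covered by its best reference span
    prec = sum(max(overlap(h, r) for r in ref_spans) / (h[1] - h[0])
               for h in hyp_spans) / len(hyp_spans)
    # recall: how much of each reference span is covered by its best hypothesis span
    rec = sum(max(overlap(r, h) for h in hyp_spans) / (r[1] - r[0])
              for r in ref_spans) / len(ref_spans)
    if prec + rec == 0:
        return 0.0
    return 2 * prec * rec / (prec + rec)


# example: one exact match and one partial match
hyp = [(0, 3), (5, 9)]
ref = [(0, 3), (6, 8)]
print(span_overlap_f1(hyp, ref))  # ~0.86: the partial match is penalized, not dropped
```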

varvara-l commented 9 years ago
chrishokamp commented 9 years ago

[attached image: 20150218_200511]

chrishokamp commented 9 years ago

Quick rule of thumb: if your per-class F1 scores sum to 1 or less, something is probably going wrong and you may not be learning anything.
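
To illustrate (a minimal sketch with made-up OK/BAD labels and scikit-learn's f1_score, not MARMOT's own evaluation code): a degenerate tagger that predicts only the majority class gets 0 F1 on the minority class and less than 1 on the majority class, so the per-class scores sum to less than 1, while a tagger that actually separates the two classes pushes the sum above 1.

```python
import numpy as np
from sklearn.metrics import f1_score

# hypothetical gold labels for a word-level QE task: 1 = OK, 0 = BAD
y_true = np.array([1] * 90 + [0] * 10)

# degenerate tagger: every word is labelled OK
y_pred_majority = np.ones_like(y_true)
per_class = f1_score(y_true, y_pred_majority, average=None,
                     labels=[0, 1], zero_division=0)
print(per_class, per_class.sum())   # [0.0, ~0.95] -> sum < 1

# tagger that separates the classes (5 OK words mislabelled as BAD)
y_pred_better = y_true.copy()
y_pred_better[:5] = 0
per_class = f1_score(y_true, y_pred_better, average=None,
                     labels=[0, 1], zero_division=0)
print(per_class, per_class.sum())   # [~0.8, ~0.97] -> sum > 1
```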