Closed by mt-empty 2 years ago
SacreBLEU does not filter out Syriac characters, but the score is 0 by definition if there is no matching 4-gram. In your example, the whole test set consists of a single sentence with only three words, so it is impossible to get a non-zero BLEU with any translation (prediction). BLEU was designed as a corpus-level metric, expecting hundreds or thousands of sentences (of usual lengths, i.e. longer than 4 tokens) in the test set. If you need a sentence-level metric, try e.g. chrF (also implemented in sacrebleu and also wrapped in Hugging Face). If you really need sentence-level BLEU, you must configure `smooth_value` (and `smooth_method`), but note that some smoothing methods still result in a zero score when there is no matching 4-gram.
Thank you, I did not read the whole paper.
> The maximum n-gram length is virtually always set to four, and since BLEU is corpus level, it is rare that there are any zero counts.
I'm building my own English-to-Syriac translation model using Hugging Face libraries and a SentencePiece tokenizer, and I'd like to use sacrebleu as my evaluation metric.
So I tried this:
And it results in:
I'm guessing that, just like BLEU, sacrebleu is language-independent, but I think it's filtering out Syriac characters, hence why I'm getting a score of 0. I couldn't find a way to disable the filtering. Also, regarding the `--language-pair` option: Syriac doesn't have an ISO 639-1 code, only an ISO 639-2 code. Does that mean ISO 639-2 languages aren't supported at all? Any solutions or alternatives?
Thank you