Is this the same script with the Moses's multi-bleu.perl? I've seen that there are some modifications to the original version. I've been investigating that why my baseline model's (Google NIC with VGG-E) BLEU-2-3-4 performance is really low but what I've found is we are not using the same evaluation scripts. I know that this task is different than machine translation task, though. So, my questions are,
What's the intention behind the BLEU evaluation script modification?
Is all captioning people evaluate their models with this approach?
Hi,
Is this the same script with the Moses's multi-bleu.perl? I've seen that there are some modifications to the original version. I've been investigating that why my baseline model's (Google NIC with VGG-E) BLEU-2-3-4 performance is really low but what I've found is we are not using the same evaluation scripts. I know that this task is different than machine translation task, though. So, my questions are,
Thanks in advance.