marian-nmt / marian

Fast Neural Machine Translation in C++
https://marian-nmt.github.io

Rescoring? #80

Closed phikoehn closed 7 years ago

phikoehn commented 7 years ago

Does Amun support re-scoring?

What would be needed is the ability to provide an output along with the input to the decoder and have it report the model score for the given translations.

emjotde commented 7 years ago

That would be much easier in Marian. The forward training step is basically rescoring; it just reports the total cost instead of the per-sentence cost. It would not require much work to do this: I think the validator class (CrossEntropyValidator, to be more precise) is a good starting point. Essentially, put that into a separate binary and report the cost per sentence instead of per batch.
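To illustrate the difference between a per-batch and a per-sentence cost, here is a minimal Python sketch. It is not Marian code: the function names `sentence_costs` and `batch_cost` are hypothetical, and the per-token probabilities are made-up values standing in for what a model's forward pass would assign to each reference token.

```python
import math

def sentence_costs(token_probs_batch):
    """Return one cross-entropy cost per sentence.

    token_probs_batch: list of sentences, each a list of the probabilities
    the model assigned to the reference tokens (toy values, an assumption).
    """
    return [-sum(math.log(p) for p in probs) for probs in token_probs_batch]

def batch_cost(token_probs_batch):
    """Return the total cost over the batch, as a training step would report."""
    return sum(sentence_costs(token_probs_batch))

# Two toy sentences: one with two tokens, one with a single token.
batch = [[0.5, 0.25], [0.125]]
per_sentence = sentence_costs(batch)  # what a rescorer would print, one per line
total = batch_cost(batch)             # what the training step reports
```

A rescorer in this sense only has to skip the final summation over sentences and print the per-sentence values.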

@tomekd , @snukky : any takers?

tomekd commented 7 years ago

I have the rescorer mostly done in my mosesplugin branch of amuNMT. I only need to merge it with the current master and add a main() function.

Roman could do it in Marian-train.

emjotde commented 7 years ago

@phikoehn What format do you have in mind? N-best list? Simple pair of source and target?

phikoehn commented 7 years ago

My current need was to do the r2l rescoring - but it makes sense to keep complexity out of the toolkit and only have a general facility of providing source and target and requesting a score.
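As a rough sketch of how such a general scoring facility could be used for r2l rescoring outside the toolkit: the snippet below assumes the right-to-left model is scored on targets with reversed token order, and that both scores are log-probabilities (higher is better) combined by a simple interpolation. The function names, the weight, and the scores are all hypothetical; none of this comes from Marian or Amun.

```python
def reverse_target(sentence):
    """Reverse token order, as input for a right-to-left model (assumption)."""
    return " ".join(reversed(sentence.split()))

def rerank(hypotheses, weight=0.5):
    """Pick the best hypothesis by interpolating two model scores.

    hypotheses: list of (text, l2r_score, r2l_score) tuples, where both
    scores are log-probabilities (higher is better).
    """
    return max(hypotheses, key=lambda h: (1 - weight) * h[1] + weight * h[2])

# Toy example with made-up scores: the r2l model disagrees with the l2r model.
candidates = [("x", -1.0, -5.0), ("y", -2.0, -2.0)]
best = rerank(candidates)
```

Keeping the toolkit limited to "source and target in, score out" leaves the reversal and the interpolation to a small script like this one.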

snukky commented 7 years ago

The rescorer is available in the current master of marian-train. It takes source and target corpora as input and prints sentence scores.

emjotde commented 7 years ago

Thanks @snukky. In the end it should be able to handle Moses-style n-best lists too; s2s and amun both generate this type of n-best list.
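For reference, a Moses-style n-best line has four fields separated by `|||`: sentence id, hypothesis, feature scores, and total score. The sketch below parses one such line and appends an extra feature score. The helper names and the feature name `R2L` are hypothetical, and this is not how the Marian rescorer is implemented.

```python
def parse_nbest_line(line):
    """Split a Moses-style n-best line into (id, hypothesis, features, total)."""
    fields = [f.strip() for f in line.split("|||")]
    sent_id, hypothesis, features, total = fields[:4]
    return int(sent_id), hypothesis, features, float(total)

def add_feature(line, name, score):
    """Return the line with an extra 'name= score' entry in the feature field.

    The total score is left unchanged; recomputing it is up to the caller.
    """
    sent_id, hypothesis, features, total = parse_nbest_line(line)
    features = f"{features} {name}= {score}"
    return f"{sent_id} ||| {hypothesis} ||| {features} ||| {total}"

# Toy line with made-up scores.
line = "0 ||| das ist gut ||| F0= -1.5 ||| -1.5"
rescored = add_feature(line, "R2L", -2.0)
```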

emjotde commented 7 years ago

@phikoehn Can we close this?

phikoehn commented 7 years ago

Yes, this works for me now. I even get some gains (+.4 de-en, +.8 en-de).

emjotde commented 7 years ago

Great, closing this then. The n-best list can be another issue.