Closed: phikoehn closed this issue 7 years ago
That would be much easier in Marian. The forward training step is basically rescoring; it just reports total cost instead of per-sentence cost. It would not require much work: I think the validator class (CrossEntropyValidator, to be more precise) is a good starting point. Essentially, put that into a separate binary and report cost per sentence instead of per batch.
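The distinction between per-batch and per-sentence cost can be sketched as follows. This is a toy illustration, not Marian code; the function names and the log-probability inputs are hypothetical, standing in for what the model's forward pass would produce:

```python
import math

def sentence_score(token_logprobs):
    """Score of one sentence: sum of the per-token log-probabilities
    of the reference target tokens under the model."""
    return sum(token_logprobs)

def per_batch_cost(batch):
    """What a CrossEntropyValidator-style loop reports:
    one aggregate cost for the whole batch."""
    return -sum(sentence_score(s) for s in batch)

def per_sentence_costs(batch):
    """What the proposed rescorer would report instead:
    one cost per sentence."""
    return [-sentence_score(s) for s in batch]

# Toy log-probabilities for a batch of two target sentences.
batch = [
    [math.log(0.5), math.log(0.25)],  # sentence 1
    [math.log(0.1)],                  # sentence 2
]
print(per_batch_cost(batch))      # single total cost
print(per_sentence_costs(batch))  # one cost per sentence
```

Both numbers come from the same forward pass; the rescorer only changes how the costs are aggregated and reported.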
@tomekd , @snukky : any takers?
I have the rescorer mostly done in my mosesplugin branch of amuNMT. I only need to merge with the current master and add a main() function.
Roman could do it in marian-train.
@phikoehn What format do you have in mind? N-best list? Simple pair of source and target?
My current need was r2l (right-to-left) rescoring, but it makes sense to keep complexity out of the toolkit and only offer a general facility: provide a source and a target, and request a score.
The rescorer is available in the current master of marian-train. It takes source and target corpora as input and prints sentence scores.
Thanks @snukky. In the end it should also handle Moses-style n-best lists; both s2s and amun generate n-best lists in that format.
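For reference, a Moses-style n-best list entry has the form `id ||| hypothesis ||| feature scores ||| total score`, and an n-best rescorer would typically append its model score as an extra feature. A minimal sketch of that manipulation (the helper names and the `NMT=` feature label are illustrative, not part of any toolkit):

```python
def parse_nbest_line(line):
    """Split one Moses-style n-best entry into its four fields:
    sent_id ||| hypothesis ||| feature scores ||| total score"""
    sent_id, hyp, feats, score = [f.strip() for f in line.split("|||")]
    return int(sent_id), hyp, feats, float(score)

def add_rescorer_feature(line, name, value):
    """Append a new feature score (e.g. an NMT model score)
    to the feature field of an n-best entry."""
    sent_id, hyp, feats, score = parse_nbest_line(line)
    feats = f"{feats} {name}= {value}"
    return f"{sent_id} ||| {hyp} ||| {feats} ||| {score}"

entry = "0 ||| das ist ein test ||| LM= -4.2 TM= -1.1 ||| -5.3"
print(add_rescorer_feature(entry, "NMT", -3.7))
# → 0 ||| das ist ein test ||| LM= -4.2 TM= -1.1 NMT= -3.7 ||| -5.3
```

After adding the new feature, one would normally rerun tuning (e.g. MERT) so the combined total score reflects the NMT feature's weight.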
@phikoehn Can we close this?
Yes, this works for me now. I even get some gains (+0.4 de-en, +0.8 en-de).
Great, closing this then. The n-best list can be another issue.
Does Amun support re-scoring?
What would be needed is the ability to provide an output along with the input to the decoder and have it report the model score for the given translations.