Closed by kho 11 years ago
Thanks for the heads up about this. I'm looking into it.
Found the problem. The code was not adding "derivation successors" to the vertex's priority queue for derivations that rederived an already-seen string. Derivations that yield a duplicate string should not be returned, but their successors must still be added to the queue; otherwise part of the search space can be missed, as was happening here.
- Extracting all k-best lists with bug | wc -l: 1948
- Extracting all k-best lists and sort -u | wc -l: 5242
- Extracting all k-best lists with fix | wc -l: 5242
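To illustrate the failure mode described above, here is a minimal Python sketch of unique-string k-best extraction over a single two-tail hyperedge. It is not cdec's actual code: the function `unique_kbest`, the `expand_duplicates` flag, and the toy candidate lists are all hypothetical, chosen so that one derivation is reachable only through derivations that rederive an already-seen string.

```python
import heapq

def unique_kbest(left, right, k, expand_duplicates=True):
    """Toy unique k-best over one hyperedge with two tails (hypothetical sketch).

    left, right: lists of (score, string), sorted best-first.
    expand_duplicates=False reproduces the bug: successors of a derivation
    that rederives a known string are never queued.
    """
    def item(i, j):
        # heapq is a min-heap, so store the negated score.
        return (-(left[i][0] + right[j][0]), i, j)

    queue = [item(0, 0)]
    queued = {(0, 0)}        # derivations already pushed, to avoid re-pushing
    seen_strings = set()     # strings already returned
    results = []
    while queue and len(results) < k:
        neg_score, i, j = heapq.heappop(queue)
        text = left[i][1] + " " + right[j][1]
        is_new = text not in seen_strings
        if is_new:
            seen_strings.add(text)
            results.append((-neg_score, text))
        # The fix: queue successors even when the string is a duplicate.
        if is_new or expand_duplicates:
            for ni, nj in ((i + 1, j), (i, j + 1)):
                if ni < len(left) and nj < len(right) and (ni, nj) not in queued:
                    queued.add((ni, nj))
                    heapq.heappush(queue, item(ni, nj))
    return results

# Toy data (assumed for illustration): different derivations share a string.
left = [(0.0, "a"), (-1.5, "a b")]
right = [(0.0, "b c"), (-1.0, "c"), (-2.5, "b b c")]

buggy = unique_kbest(left, right, k=10, expand_duplicates=False)
fixed = unique_kbest(left, right, k=10, expand_duplicates=True)
print(len(buggy), len(fixed))
```

In this toy input, "a b b b c" can only be reached from two derivations that both yield duplicate strings, so the buggy variant never queues it, while the fixed variant returns it.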
Steps to reproduce the problem:
Expected output:
Actual output:
What's different
The 10th-best translation from cdec has a score of -37.9181, but there are two translations with higher scores that do not show up, namely,
As the forced-decoding output shows (seg 1 and 2), both are reachable in the LM-pruned forest.