Open cckozie opened 4 years ago
To elaborate on my comment in https://github.com/unfoldingWord/translationCore/issues/6237#issuecomment-552699174: if we use the first "did" in the example above, then we would also have to discard the first "they", since wordMAP only supports n-grams with contiguous tokens.
e.g. the phrase they(1) did(1)
is invalid because their appearance in the target text is discontinuous:
...servants did(1) what Yahweh commanded But they(1) did(2) it...
However, just using did(1)
would be valid because a single token cannot be discontinuous.
This is all forced by the current design of wordMAP, which only allows n-grams of contiguous tokens. Supporting discontinuous tokens would open up more possibilities, but would also be more complex to implement.
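To make the contiguity rule above concrete, here is a minimal Python sketch. It is not wordMAP's actual code (wordMAP is a separate library with its own API); the function names `occurrence_positions` and `is_contiguous` are hypothetical, and the target text is only the visible fragment from the example, split on whitespace.

```python
def occurrence_positions(tokens):
    """Map (token, occurrence) to its index in tokens; occurrences are 1-based."""
    counts = {}
    positions = {}
    for index, token in enumerate(tokens):
        counts[token] = counts.get(token, 0) + 1
        positions[(token, counts[token])] = index
    return positions


def is_contiguous(ngram, tokens):
    """Return True if the (token, occurrence) pairs sit side by side, in order."""
    positions = occurrence_positions(tokens)
    try:
        indices = [positions[item] for item in ngram]
    except KeyError:
        # The requested occurrence does not exist in the target text.
        return False
    return all(b - a == 1 for a, b in zip(indices, indices[1:]))


# Assumption: only the fragment quoted in the example, tokenized on whitespace.
target = "servants did what Yahweh commanded But they did it".split()

print(is_contiguous([("they", 1), ("did", 1)], target))  # they(1) did(1)
print(is_contiguous([("did", 1)], target))               # did(1) alone
print(is_contiguous([("they", 1), ("did", 2)], target))  # they(1) did(2)
```

Under these assumptions, they(1) did(1) is rejected because did(1) appears before they(1) with other tokens in between, while did(1) alone and they(1) did(2) both pass the check.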
2.1.0 (d45bc64) 6237.zip This is working by design, but it does not conform to the requirement in #6237 (see that issue for details on the design). This example was observed with only the two attached projects in tC.