The implementation of `read_wmt_reference_data` assumes that terms appear in the same order in the source and the target, which is often not the case (not even for EN-FR).
This leads to terms being attributed to the wrong counterpart, as in this example:
The original implementation just uses the `ids` from the `enumerate` call, which are positional indices. Instead, the correct source term needs to be extracted using the `id` field of the `term` tag, as implemented here.
In this branch I extract the correct `src_term` by term id.
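
To illustrate the difference, here is a minimal sketch of id-based matching. It is not the code in this branch: the tag layout (`<term id="...">...</term>`), the regex, and the helper names `terms_by_id` and `pair_terms` are assumptions made for the example, and the real WMT markup may carry additional attributes.

```python
import re

# Assumed markup: <term id="3">surface form</term>. Other attributes are tolerated.
TERM_RE = re.compile(r'<term\b[^>]*\bid="([^"]+)"[^>]*>(.*?)</term>', re.DOTALL)


def terms_by_id(line):
    """Map each term id found in a tagged line to its surface form."""
    return {term_id: text for term_id, text in TERM_RE.findall(line)}


def pair_terms(src_line, tgt_line):
    """Pair each target term with the source term that shares its id."""
    src_terms = terms_by_id(src_line)
    pairs = []
    for term_id, tgt_term in TERM_RE.findall(tgt_line):
        src_term = src_terms.get(term_id)
        if src_term is not None:  # skip target terms with no source counterpart
            pairs.append((src_term, tgt_term))
    return pairs


# Hypothetical EN-FR pair where the adjective and noun swap places.
src = 'The <term id="1">red</term> <term id="2">car</term> stopped.'
tgt = 'La <term id="2">voiture</term> <term id="1">rouge</term> est là.'

pairs = pair_terms(src, tgt)

# Position-based pairing, shown only for comparison with the id-based result.
positional = list(zip(
    [text for _, text in TERM_RE.findall(src)],
    [text for _, text in TERM_RE.findall(tgt)],
))
```

With these inputs, matching by id pairs "car" with "voiture" and "red" with "rouge", while pairing by position would attach "red" to "voiture".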