mahfuzibnalam / terminology_evaluation

MIT License

bug fix terminology order difference between src and tar #7

Open FredericOdermatt opened 2 years ago

FredericOdermatt commented 2 years ago

The implementation of read_wmt_reference_data assumes that the terms appear in the same order in the source and the target, which is often not the case (not even for EN-FR).

This leads to source terms being paired with the wrong target terms, as in this example:

[Screenshot: bug_data_loader]

This happens because the original implementation pairs terms using the indices from the enumerate call. Instead, the correct source term needs to be extracted using the id field of the term tag.

In this branch I extract the correct src_term by term id.
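To illustrate the difference, here is a minimal sketch contrasting positional pairing with id-based pairing. It is not the code from read_wmt_reference_data: the helper names (extract_terms, pair_by_position, pair_by_id) are hypothetical, and the tag format `<term id="N">...</term>` is a simplified assumption about how terms are marked up in the data.

```python
import re

# Assumed, simplified tag format for illustration: <term id="N">surface form</term>
TERM_RE = re.compile(r'<term id="(\d+)">(.*?)</term>')


def extract_terms(tagged: str) -> list[tuple[str, str]]:
    """Return (id, term) pairs in the order they occur in the sentence."""
    return TERM_RE.findall(tagged)


def pair_by_position(src: str, tgt: str) -> list[tuple[str, str]]:
    # Buggy approach (hypothetical): the i-th target term is paired with the
    # i-th source term, which is only correct if both sides share the same order.
    src_terms = [term for _, term in extract_terms(src)]
    return [(src_terms[i], term) for i, (_, term) in enumerate(extract_terms(tgt))]


def pair_by_id(src: str, tgt: str) -> list[tuple[str, str]]:
    # Fixed approach (hypothetical): each target term is paired with the
    # source term that carries the same id, regardless of word order.
    src_by_id = dict(extract_terms(src))
    return [(src_by_id[term_id], term) for term_id, term in extract_terms(tgt)]


# EN-FR example where the term order differs between source and target.
src = 'The <term id="1">COVID-19</term> <term id="2">vaccine</term> is effective.'
tgt = 'Le <term id="2">vaccin</term> contre la <term id="1">COVID-19</term> est efficace.'

print(pair_by_position(src, tgt))  # [('COVID-19', 'vaccin'), ('vaccine', 'COVID-19')]
print(pair_by_id(src, tgt))        # [('vaccine', 'vaccin'), ('COVID-19', 'COVID-19')]
```

With positional pairing, "vaccin" is attributed to "COVID-19"; looking the source term up by id restores the intended pairs.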