qe-team / marmot

MARMOT - the open source framework for feature extraction and machine learning, designed to estimate the quality of Machine Translation output
ISC License

use string representations instead of lists for alignment features #24

Closed by chrishokamp 9 years ago

chrishokamp commented 9 years ago

We can join the tokens of a multi-token alignment with whitespace to form a single string representation of that alignment. This lets us avoid the multi-label binarizer and preserves token order information.
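A minimal sketch of the idea, not the code that ended up in marmot. The function name `alignment_to_string`, the `__unaligned__` placeholder, and the example tokens are all hypothetical and chosen only for illustration.

```python
def alignment_to_string(aligned_tokens):
    """Join the tokens aligned to one word into a single string feature."""
    # Hypothetical choice: an unaligned word (empty list) maps to a placeholder.
    if not aligned_tokens:
        return "__unaligned__"
    # Whitespace join keeps the tokens in their original order.
    return " ".join(aligned_tokens)


# Hypothetical alignment features for three words.
alignments = [["das"], ["kleine", "Haus"], []]
features = [alignment_to_string(a) for a in alignments]
print(features)
```

Each alignment becomes one categorical value, so an ordinary one-hot encoding would apply, and `["kleine", "Haus"]` stays distinct from `["Haus", "kleine"]`.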

varvara-l commented 9 years ago

We have to use the multi-label binarizer anyway, for cases where one word is aligned to two or more words.
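For comparison, a hand-rolled sketch of what multi-label binarization does with such alignments, assuming the binarizer meant here behaves like scikit-learn's `MultiLabelBinarizer`. The helper `multi_hot` and the example tokens are hypothetical and are not part of marmot.

```python
def multi_hot(label_sets):
    """Encode each collection of labels as a 0/1 vector over the sorted vocabulary."""
    vocab = sorted({label for labels in label_sets for label in labels})
    index = {label: i for i, label in enumerate(vocab)}
    rows = []
    for labels in label_sets:
        row = [0] * len(vocab)
        for label in labels:
            row[index[label]] = 1  # mark every token present in this alignment
        rows.append(row)
    return vocab, rows


# Hypothetical alignments; the last two differ only in token order.
alignments = [["das"], ["kleine", "Haus"], ["Haus", "kleine"]]
vocab, rows = multi_hot(alignments)
print(vocab)
print(rows)
```

The last two rows come out identical, so this encoding handles one-to-many alignments but does not record token order.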

varvara-l commented 9 years ago

Fixed in https://github.com/qe-team/marmot/commit/4b88fc1e0349990481682694fda12f49a2db1bc9