WordQERepresentationGenerator should take variable number of files

qe-team / marmot

MARMOT - the open source framework for feature extraction and machine learning, designed to estimate the quality of Machine Translation output

ISC License


WordQERepresentationGenerator should take variable number of files #31

Open varvara-l opened 9 years ago

varvara-l commented 9 years ago

We might already have some additional representations of the data, not just target, source and tags.

In that case the generator should allow parsing them as well.

Conversely, some representations may be missing (e.g. source), and the generator should handle that as well. I think only target and tags should be compulsory.

chrishokamp commented 9 years ago

I don't understand this, because the WMT 15 format consists of 3 files - 'source', 'target', and 'tags'. So this generator is specifically for that representation. Can you clarify what you mean?

On Fri, Feb 20, 2015 at 3:24 PM, varvara-l notifications@github.com wrote:

Assigned #31 https://github.com/qe-team/marmot/issues/31 to @chrishokamp https://github.com/chrishokamp.


varvara-l commented 9 years ago

If we have saved a file with POS tags from previous experiments, we may want to feed it to this representation generator without tagging the data again. So the init should accept such a file as an additional representation, together with the name of that representation.
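A rough sketch of what this could look like, not the actual MARMOT code: the class name `MultiFileRepresentationGenerator`, the keyword-argument interface, and the `generate` method are all assumptions made for illustration. Each keyword names a representation and points to its file; only `target` and `tags` are required, so `source` can be left out and extras such as a saved POS file can be added.

```python
class MultiFileRepresentationGenerator:
    """Hypothetical generator: reads any number of named, line-aligned files."""

    # Assumption taken from the discussion: only these two are compulsory.
    REQUIRED = ('target', 'tags')

    def __init__(self, **files):
        # files maps a representation name to a file path,
        # e.g. target='target.txt', tags='tags.txt', target_pos='pos.txt'
        missing = [name for name in self.REQUIRED if name not in files]
        if missing:
            raise ValueError(
                'missing compulsory representations: ' + ', '.join(missing))
        self.files = files

    def generate(self):
        # Read every file into a list of whitespace-tokenized lines.
        data = {}
        for name, path in self.files.items():
            with open(path, encoding='utf-8') as f:
                data[name] = [line.split() for line in f]

        # All representations must describe the same sentences.
        lengths = {name: len(rows) for name, rows in data.items()}
        if len(set(lengths.values())) > 1:
            raise ValueError(
                'representations differ in number of lines: %r' % lengths)
        return data
```

With this interface, a call like `MultiFileRepresentationGenerator(target='target.txt', tags='tags.txt', target_pos='pos.txt').generate()` (file names are placeholders) would return a dict with the keys `target`, `tags` and `target_pos`, and no `source` entry.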