qe-team / marmot

MARMOT - the open source framework for feature extraction and machine learning, designed to estimate the quality of Machine Translation output
ISC License

add method to map feature extractors over lists of contexts (not just dicts) #7

Closed chrishokamp closed 9 years ago

chrishokamp commented 9 years ago

contexts_to_features currently only works with dicts, but a common use case is to map a list of feature extractors over a list of contexts.

We should add a method called map_feature_extractors to do this, and move some of the code from contexts_to_features there.
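A rough sketch of what map_feature_extractors could look like, to make the proposal concrete. This is not the actual marmot code: the signature, the get_features(context) method on extractors, the two toy extractor classes, and the context keys ('token', 'index') are all assumptions made for illustration.

```python
# Hypothetical sketch, not the marmot implementation.
# Assumes each extractor exposes get_features(context) -> list of values.


class TokenLengthExtractor:
    """Toy extractor (hypothetical): length of the token string."""

    def get_features(self, context):
        return [len(context['token'])]


class TokenIndexExtractor:
    """Toy extractor (hypothetical): position of the token in its sentence."""

    def get_features(self, context):
        return [context['index']]


def map_feature_extractors(contexts, extractors):
    """Apply every extractor to every context in a flat list.

    Returns one feature vector per context, built by concatenating
    the outputs of the extractors in the order they were given.
    """
    features = []
    for context in contexts:
        row = []
        for extractor in extractors:
            row.extend(extractor.get_features(context))
        features.append(row)
    return features


contexts = [
    {'token': 'the', 'index': 0},
    {'token': 'house', 'index': 1},
]
features = map_feature_extractors(
    contexts, [TokenLengthExtractor(), TokenIndexExtractor()]
)
```

With something like this in place, contexts_to_features could probably keep the dict handling and call the new function once per token key.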

varvara-l commented 9 years ago

You mean that contexts can be in a format other than {token1: [ ], token2: [ ], ...}?

I thought about that. Actually all our experiment utils work only with contexts organised in dicts. We should be able to store them in lists for sequence labelling.

chrishokamp commented 9 years ago

Yes, indexing by token doesn't make sense for a lot of tasks, so we should extend the functionality in experiment_utils to support lists of contexts. For sequence labeling I think we need to support lists of lists of contexts, because:

(1) each sequence is a list of contexts
(2) a dataset is a list of sequences

The first step would be to create a parser that produces a list of lists object: sequences = [[token_context_0, token_context_1, ...], [token_context_0, token_context_1, ...]]
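A minimal sketch of such a parser, assuming the input is one whitespace-tokenized sentence per line. The function name parse_sequences and the context keys ('token', 'index', 'target') are hypothetical and only illustrate the list-of-lists shape; they are not taken from the marmot code.

```python
# Hypothetical sketch, not the marmot parser.
# Assumes one whitespace-tokenized sentence per input line.


def parse_sequences(lines):
    """Turn tokenized sentences into a list of lists of token contexts.

    Each inner list is one sequence (sentence); each element of it is
    a dict describing one token in that sentence.
    """
    sequences = []
    for line in lines:
        tokens = line.split()
        if not tokens:
            # Skip empty lines instead of emitting empty sequences.
            continue
        sequence = [
            {'token': token, 'index': index, 'target': tokens}
            for index, token in enumerate(tokens)
        ]
        sequences.append(sequence)
    return sequences


sequences = parse_sequences(['the house is small', '', 'a cat'])
```

Here sequences[0] is the first sentence and sequences[0][1] is the context for its second token, so a per-context mapping function can be run over each inner list in turn.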

varvara-l commented 9 years ago

Done in the "simple" branch.