add method to map feature extractors over lists of contexts (not just dicts)

chrishokamp commented 9 years ago

contexts_to_features currently only works with dicts, but a common use case is to map a list of feature extractors over a list of contexts.

We should add a method called map_feature_extractors to do this, and move some of the code from contexts_to_features there.
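A rough sketch of what the split could look like. This is only an illustration, not the actual marmot code: it assumes each feature extractor is a plain callable that takes one context and returns a list of feature values, and that a context is a dict with keys such as 'token' and 'index'. Both are assumptions made for the example.

```python
def map_feature_extractors(contexts, feature_extractors):
    """Apply every extractor to every context in a list.

    Returns one flat feature list per context, in the same order as the input.
    Assumption: an extractor is a callable taking a context and returning a list.
    """
    mapped = []
    for context in contexts:
        features = []
        for extractor in feature_extractors:
            features.extend(extractor(context))
        mapped.append(features)
    return mapped


def contexts_to_features(token_contexts, feature_extractors):
    """Dict version: {token: [context, ...]} -> {token: [features, ...]}.

    Delegates the per-list work to map_feature_extractors.
    """
    return {
        token: map_feature_extractors(contexts, feature_extractors)
        for token, contexts in token_contexts.items()
    }


# Hypothetical extractors, used only to exercise the functions above.
def token_length(context):
    return [len(context['token'])]


def token_position(context):
    return [context['index']]


contexts = [{'token': 'cat', 'index': 1}, {'token': 'sat', 'index': 2}]
list_features = map_feature_extractors(contexts, [token_length, token_position])
dict_features = contexts_to_features({'cat': contexts[:1]}, [token_length, token_position])
```

With this split, the dict-based function becomes a thin wrapper, and list-based callers can use map_feature_extractors directly.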

varvara-l commented 9 years ago

You mean that contexts can be in a format other than {token1: [ ], token2: [ ], ...}?

I thought about that. Actually all our experiment utils work only with contexts organised in dicts. We should be able to store them in lists for sequence labelling.

chrishokamp commented 9 years ago

Yes, indexing by tokens doesn't make sense for a lot of tasks -- we should extend the functionality in experiment_utils to support lists of contexts. For sequence labeling I think we need to support lists of lists of contexts because:

(1) each sequence is a list of contexts, and (2) a dataset is a list of sequences.

Another option is to add a field to the context object indicating its sequence_id and its index in the sequence -- i.e. { sequence_index: 1.3 } if it's the third token in sequence 1. Then we would do post-processing to group the sequences together. This feels like it could get really messy and hackish though, so I prefer to support the list of lists approach.
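For comparison, here is roughly what the post-processing for the flat alternative might involve. This is a sketch under assumptions: the sequence id and the position are stored as two separate hypothetical keys, 'sequence_id' and 'index', rather than the combined 1.3 notation above.

```python
from collections import defaultdict


def group_by_sequence(contexts):
    """Rebuild a list of sequences from a flat list of contexts.

    Assumption: every context carries 'sequence_id' and 'index' keys.
    """
    grouped = defaultdict(list)
    for context in contexts:
        grouped[context['sequence_id']].append(context)
    # Order the sequences by id and the contexts inside each one by position.
    return [
        sorted(grouped[seq_id], key=lambda c: c['index'])
        for seq_id in sorted(grouped)
    ]


flat = [
    {'token': 'b', 'sequence_id': 1, 'index': 1},
    {'token': 'c', 'sequence_id': 2, 'index': 0},
    {'token': 'a', 'sequence_id': 1, 'index': 0},
]
regrouped = group_by_sequence(flat)
```

Every consumer of the flat list would need this regrouping step (and the extra bookkeeping fields) before it could treat the data as sequences, which is the overhead the list of lists approach avoids.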

The first step would be to create a parser that produces a list of lists object: sequences = [[token_context_0, token_context_1, ...], [token_context_0, token_context_1, ...]]
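A minimal sketch of such a parser, to make the target shape concrete. The input format (one whitespace-tokenized sentence per string, with optional per-token tags) and the context keys 'token', 'index', 'target' and 'tag' are assumptions for this example, not a description of the existing marmot parsers.

```python
def parse_sequences(sentences, tags=None):
    """Turn tokenized sentences into a list of lists of token contexts.

    sentences: list of strings, tokens separated by whitespace (assumed format).
    tags: optional list of tag lists, parallel to the tokens of each sentence.
    """
    sequences = []
    for seq_idx, sentence in enumerate(sentences):
        tokens = sentence.split()
        sequence = []
        for tok_idx, token in enumerate(tokens):
            context = {'token': token, 'index': tok_idx, 'target': tokens}
            if tags is not None:
                context['tag'] = tags[seq_idx][tok_idx]
            sequence.append(context)
        sequences.append(sequence)
    return sequences


sequences = parse_sequences(
    ['the cat sat', 'hello world'],
    tags=[['OK', 'BAD', 'OK'], ['OK', 'OK']],
)
```

Each inner list keeps its tokens in order, so the dataset can be passed to a sequence labeler one sequence at a time without any regrouping.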


varvara-l commented 9 years ago

Done in the "simple" branch.

qe-team / marmot

add method to map feature extractors over lists of contexts (not just dicts) #7