Closed chrishokamp closed 9 years ago
You mean that contexts can be in a format other than {token1: [ ], token2: [ ], ...}?
I thought about that. Actually all our experiment utils work only with contexts organised in dicts. We should be able to store them in lists for sequence labelling.
Yes, the indexing by tokens doesn't make sense for a lot of tasks. We should extend the functionality in experiment_utils to support lists of contexts. For sequence labeling I think we need to support lists of lists of contexts, because:
(1) each sequence is a list of contexts
(2) a dataset is a list of sequences
The first step would be to create a parser that produces a list-of-lists object:
sequences = [[token_context_0, token_context_1, ...], [token_context_0, token_context_1, ...]]
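A rough sketch of what such a parser could look like. The function name `parse_sequences`, the whitespace-tokenized input, and the context keys (`token`, `index`, `tag`) are all assumptions made for illustration, not the existing marmot parser API:

```python
def parse_sequences(token_lines, tag_lines):
    """Build a dataset as a list of sequences.

    Each sequence is a list of per-token contexts. The context layout
    (a dict with 'token', 'index', 'tag') is hypothetical.
    """
    sequences = []
    for tokens_str, tags_str in zip(token_lines, tag_lines):
        tokens = tokens_str.split()
        tags = tags_str.split()
        if len(tokens) != len(tags):
            raise ValueError("token/tag length mismatch in line: %r" % tokens_str)
        # one context per token, kept in sentence order
        sequence = [
            {"token": tok, "index": i, "tag": tag}
            for i, (tok, tag) in enumerate(zip(tokens, tags))
        ]
        sequences.append(sequence)
    return sequences


sequences = parse_sequences(
    ["the cat sat", "hello world"],
    ["OK OK BAD", "OK OK"],
)
```

Keeping the contexts in sentence order means the position of a token in the sequence is implicit in the list, so no token-keyed dict is needed.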
Done in "simple" branch
contexts_to_features currently only works with dicts, but a common use case is to map a list of feature extractors over a list of contexts.
We should add a method called map_feature_extractors to do this, and move some of the code from contexts_to_features there.
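One possible shape for this, as a sketch only. The signature of map_feature_extractors, the idea that an extractor is a plain callable returning a list of features, and the wrapper name `contexts_to_features_sketch` are assumptions for illustration, not the actual experiment_utils code:

```python
def map_feature_extractors(context, extractors):
    """Apply every extractor to one context and concatenate the results.

    Assumption: each extractor is a callable taking a context and
    returning a list of feature values.
    """
    features = []
    for extract in extractors:
        features.extend(extract(context))
    return features


def contexts_to_features_sketch(contexts, extractors):
    """Hypothetical wrapper handling dicts, flat lists, and lists of lists."""
    if isinstance(contexts, dict):
        # {token: [context, ...]} -> {token: [features, ...]}
        return {
            token: [map_feature_extractors(c, extractors) for c in ctxs]
            for token, ctxs in contexts.items()
        }
    if contexts and isinstance(contexts[0], list):
        # list of sequences, each a list of contexts
        return [
            [map_feature_extractors(c, extractors) for c in seq]
            for seq in contexts
        ]
    # flat list of contexts
    return [map_feature_extractors(c, extractors) for c in contexts]


# toy extractors, purely illustrative
def token_length(context):
    return [len(context["token"])]


def is_first(context):
    return [int(context["index"] == 0)]


extractors = [token_length, is_first]
flat = [{"token": "cat", "index": 0}, {"token": "sits", "index": 1}]
flat_features = contexts_to_features_sketch(flat, extractors)
```

With the per-context work isolated in map_feature_extractors, the dict, list, and list-of-lists cases differ only in how they iterate.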