qe-team / marmot

MARMOT - the open source framework for feature extraction and machine learning, designed to estimate the quality of Machine Translation output
ISC License

add lexical features #23

Open chrishokamp opened 9 years ago

chrishokamp commented 9 years ago

We don't have a feature extractor for the word itself, or for stemmed representations, suffixes, prefixes, etc.

These should be easy to implement.
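A rough sketch of what such lexical features could look like for a single token. This is only an illustration, not existing MARMOT code: the function name `lexical_features`, the `affix_len` parameter, and the crude suffix-stripping "stemmer" are all assumptions made for the example, and a real implementation would probably use a proper stemmer.

```python
def lexical_features(token, affix_len=3):
    """Return simple lexical features for one token.

    Hypothetical helper, not part of MARMOT: the name, the affix_len
    parameter, and the crude stemmer below are illustrative assumptions.
    """
    lowered = token.lower()

    # Prefix and suffix: the first and last affix_len characters
    # (the whole token if it is shorter than affix_len).
    prefix = lowered[:affix_len]
    suffix = lowered[-affix_len:]

    # Crude stem: strip one common English ending, but only if at
    # least three characters remain. A stand-in for a real stemmer.
    stem = lowered
    for ending in ("ing", "ed", "es", "s"):
        if lowered.endswith(ending) and len(lowered) - len(ending) >= 3:
            stem = lowered[: -len(ending)]
            break

    return {"token": token, "stem": stem, "prefix": prefix, "suffix": suffix}
```

Calling `lexical_features("Translations")` would give the surface form plus a lowercased stem, prefix, and suffix, which could then be emitted as categorical features.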

varvara-l commented 9 years ago

By the way, we have an n-gram extractor: https://github.com/qe-team/marmot/blob/master/marmot/util/ngram_window_extractor.py. Should I convert it to a feature extractor? It can be used to extract the token itself if you set window_size to 0.

chrishokamp commented 9 years ago

I think we should just wrap it in a feature extractor rather than convert it, since we already use it as a utility for other word-level feature extractors.
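Something like the following is what I mean by wrapping, as a minimal sketch. Note that `ngram_window` here is a stand-in written for this example, not the actual function in `ngram_window_extractor.py` (its real signature may differ), and the `WindowFeatureExtractor` class, its `get_features` method, and the `_PAD_` symbol are likewise hypothetical.

```python
def ngram_window(tokens, index, window_size, pad="_PAD_"):
    """Stand-in window utility; NOT MARMOT's actual function.

    Returns the token at `index` with `window_size` tokens of context
    on each side, padding positions that fall outside the sentence.
    """
    left = index - window_size
    right = index + window_size
    return [
        tokens[i] if 0 <= i < len(tokens) else pad
        for i in range(left, right + 1)
    ]


class WindowFeatureExtractor:
    """Hypothetical thin wrapper that exposes the utility as a feature extractor."""

    def __init__(self, window_size=0):
        # window_size=0 means the feature is just the token itself.
        self.window_size = window_size

    def get_features(self, tokens, index):
        # Delegate to the shared utility so other extractors can keep using it.
        return ngram_window(tokens, index, self.window_size)
```

With `window_size=0` the wrapper returns a one-element list holding the token itself, and the utility stays untouched for the extractors that already depend on it.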
