hipster-philology / pandora

A Tagger-Lemmatizer for Natural Languages
MIT License

Not lowercasing lemmata? #98

Open Jean-Baptiste-Camps opened 6 years ago

Jean-Baptiste-Camps commented 6 years ago

I've observed in several contexts (including my own corpus) that case can be significant in lemmata, usually to distinguish proper nouns: for instance, 'Sarrasin' for the name of the people or of M. Sarrasin, as opposed to 'sarrasin' as a kind of corn. Could we avoid lowercasing lemmata during processing? This might create a problem with the generate approach, but not necessarily with the label approach.

PonteIneptique commented 6 years ago

It could be a parameter. I would definitely go with that: a parameter that is True by default.
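To make the suggestion concrete, here is a minimal sketch of what such a switch could look like. The function `normalize_lemmas` and the parameter name `lowercase_lemmas` are hypothetical and are not taken from pandora's code. The sketch assumes the parameter controls lowercasing and that True (lowercase) is the default.

```python
def normalize_lemmas(lemmas, lowercase_lemmas=True):
    """Return the lemmas, lowercased only when lowercase_lemmas is True.

    Hypothetical helper for illustration; not part of pandora's API.
    """
    if lowercase_lemmas:
        return [lemma.lower() for lemma in lemmas]
    return list(lemmas)


lemmas = ["Sarrasin", "sarrasin", "Paris"]

# Assumed default: lowercasing merges 'Sarrasin' and 'sarrasin' into one label.
default_labels = sorted(set(normalize_lemmas(lemmas)))

# With the switch off, the proper noun stays distinct from the common noun.
cased_labels = sorted(set(normalize_lemmas(lemmas, lowercase_lemmas=False)))

print(default_labels)
print(cased_labels)
```

With the switch off, the set of lemma labels grows, since 'Sarrasin' and 'sarrasin' become separate classes.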