wartaal / HanTa

The Hanover Tagger - A simple approach to lemmatization and POS-tagging of German morphology based on heuristics and hidden markov models
GNU Lesser General Public License v3.0

Abbreviations ABC -> Abc #8

Open H4rryK4ne opened 1 year ago

H4rryK4ne commented 1 year ago

I have a text with a lot of abbreviations (VT, KW or KW47, MWG, EKV, ...), and most (if not all) of them are tagged as NE and lemmatized in capitalized form (Vt, Kw or Kw47, Mwg, Ekv, ...) rather than keeping their all-caps spelling.

wartaal commented 1 year ago

Yes, I know. This is not easy to solve unless I write 100 heuristic rules, since there are many possible ways capitalization can be used. If you know what kind of text you are dealing with, some simple post-processing would be the best option, I think.
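
A minimal sketch of such a post-processing step, under assumptions not stated in the thread: the tagger output is taken to be a list of `(word, lemma, pos)` tuples, and an abbreviation is taken to be any token made of two or more uppercase letters optionally followed by digits. The helper name `restore_abbreviations` and the pattern are hypothetical and would need adjusting to the text at hand.

```python
import re

# Assumption: an abbreviation is 2+ uppercase letters, optionally followed
# by digits (e.g. "MWG", "KW47"). Adjust the pattern to your own corpus.
ABBREVIATION = re.compile(r"[A-ZÄÖÜ]{2,}\d*")


def restore_abbreviations(tagged):
    """Use the surface form as the lemma for tokens that look like abbreviations.

    `tagged` is assumed to be a list of (word, lemma, pos) tuples;
    all other tokens are passed through unchanged.
    """
    fixed = []
    for word, lemma, pos in tagged:
        if ABBREVIATION.fullmatch(word):
            lemma = word  # keep "KW47" instead of "Kw47"
        fixed.append((word, lemma, pos))
    return fixed


# Hand-written example input mirroring the behaviour described above.
tagged = [("KW47", "Kw47", "NE"), ("MWG", "Mwg", "NE"), ("Haus", "Haus", "NN")]
print(restore_abbreviations(tagged))
```

Note that this pattern also matches ordinary words written entirely in capitals (for example in headlines), so a whitelist of known abbreviations may be the safer choice for some texts.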