mideind / Tokenizer

A tokenizer for Icelandic text

Support for unicode vulgar fractions (e.g. ⅔) #5

Closed sveinbjornt closed 5 years ago

sveinbjornt commented 5 years ago

The tokenizer now recognises Unicode vulgar fractions as number tokens, both standalone (e.g. '⅔') and as part of a longer number (e.g. '2½'). Also added some degree abbreviations to Abbrev.conf.
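To illustrate the two cases described above, here is a minimal standalone sketch using only the Python standard library. It is not the code from this pull request and does not use the Tokenizer package's API; the names `VULGAR_FRACTIONS`, `NUMBER_WITH_FRACTION` and `fraction_value` are hypothetical and exist only for this example.

```python
import re
import unicodedata

# Collect every character whose Unicode name marks it as a vulgar fraction,
# e.g. U+00BD (½) and U+2154 (⅔). The scanned range is an assumption chosen
# to cover the Latin-1 Supplement and Number Forms blocks.
VULGAR_FRACTIONS = "".join(
    chr(cp)
    for cp in range(0x00BC, 0x218A)
    if "VULGAR FRACTION" in unicodedata.name(chr(cp), "")
)

# Optional run of ASCII digits followed by exactly one vulgar fraction,
# so both '⅔' and '2½' match as a whole.
NUMBER_WITH_FRACTION = re.compile(r"([0-9]+)?([" + VULGAR_FRACTIONS + r"])")


def fraction_value(text):
    """Return the numeric value of text such as '⅔' or '2½', or None if it does not match."""
    match = NUMBER_WITH_FRACTION.fullmatch(text)
    if match is None:
        return None
    whole = int(match.group(1)) if match.group(1) else 0
    # unicodedata.numeric gives the fraction's value as a float, e.g. 0.5 for '½'.
    return whole + unicodedata.numeric(match.group(2))


print(fraction_value("2½"))
print(fraction_value("⅔"))
```

A real tokenizer would apply this kind of matching inside its number-parsing step instead of on isolated strings, but the idea is the same: an optional integer part followed by a single fraction character yields one number token.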