mideind / Tokenizer

A tokenizer for Icelandic text
Other
27 stars 6 forks source link

Added token type for numbers with trailing letters #3

Closed sveinbjornt closed 6 years ago

vthorsteinsson commented 6 years ago

...and it would be useful to add tests for this, i.e. in test/test_tokenizer.py, checking the corner cases: "Laugavegur 100", "Laugavegur 100a", "Laugavegur 200þ", "Laugavegur 200milljónir", etc.

sveinbjornt commented 6 years ago

OK, made some fixes and worked around the units of measurement issue we discussed.

vthorsteinsson commented 6 years ago

Brilliant! Thanks for this. Expect to merge tomorrow after final review.