Closed sultur closed 2 years ago
Is this good to merge, @vthorsteinsson ?
There are still some commented-out currency names (bresk pund, indónesískar rúpíur) in the tests, so I'm not sure whether this is fully tested and vetted yet. @sultur
I wasn't quite happy with this implementation, as it overwrites the default behaviour of the BÍN tokenizer (and caused a bunch of tests to fail). I am working on another one which isn't as drastic, instead having an optional flag to turn off the functionality of combining written numbers (and also fixes the kró bug). We can probably just close this pull request without merging it and I'll try to create the new pull request asap (which should be smaller).
Currently there is an issue with terminals of the form
{nationality} {currency}
(e.g. "breskra punda", "indónesískra rúpía") where their number/case is incorrectly identified. (Seetest_parse.py::test_amounts
).We may also want to add large numbers ("kvaðrilljarður" and upwards) to our BÍN vocabulary.