Open freyfogle opened 7 months ago
Eircodes were just starting to roll out when the parser was initially trained, and there were very few examples available since most people were still using the old system. For a future version I've thought about adding UK/Irish/Canadian/any other similar postcodes directly to the tokenizer, since they follow regular patterns that can't be confused with other token types. The model could then treat each one as a single token and handle it with a handful of type features instead of one feature for every normalized postcode-word. That would save space as well: those postcodes don't require geographic context, so they could be removed from the postcode index, which is stored efficiently as a trie but still clocks in at about 500MB. It would muck with the weights and require a parser retraining, though, which is not planned for the very near future, although there's some rearchitecting going on in the background.
This style of postcode only partially benefits from the classic NLP features that are used, such as word shapes/digit masks, because those would normalize to something like ["pDD", "ktDD"]. With enough training data that can work even without observing every possible postcode, but the data would need to capture every letter pattern that remains once the digits are masked (for the UK/Canada there were also training examples built off of a somewhat exhaustive list that then gets normalized to word/digit shapes).
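To make the digit-mask point concrete, here is a minimal sketch of that kind of normalization. The `digit_mask` helper is hypothetical and only mimics the idea described above (lowercase the token, replace each digit with `D`); it is not libpostal's actual feature code.

```python
import re


def digit_mask(token: str) -> str:
    """Lowercase a token and replace every digit with 'D' (illustrative only)."""
    return re.sub(r"\d", "D", token.lower())


# The two halves of an Eircode keep their letters, so the letter part
# still has to be seen in training data for the feature to be useful.
shapes = [digit_mask(t) for t in "P51 KT93".split()]
print(shapes)
```

Because the letters survive masking, two Eircodes with different routing-key letters produce different shapes, which is why the training data would have to cover every letter pattern.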
One workaround is just to extract/remove with regex before parsing since they do follow regular patterns.
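A rough sketch of that workaround in Python follows. The pattern assumes the commonly documented Eircode shape: a three-character routing key (one letter plus two digits, or the special case `D6W`), an optional space, then four characters drawn from digits and a restricted set of letters. The exact character set used here is my assumption and should be checked against the official Eircode specification; `EIRCODE_RE` and `extract_eircode` are hypothetical names, not part of libpostal.

```python
import re

# Assumed Eircode shape: routing key + optional space + 4-char unique identifier.
# The letter set (A, C-F, H, K, N, P, R, T, V-Y) is an assumption to verify.
EIRCODE_RE = re.compile(
    r"\b(?:[AC-FHKNPRTV-Y]\d{2}|D6W)\s?[0-9AC-FHKNPRTV-Y]{4}\b",
    re.IGNORECASE,
)


def extract_eircode(address: str):
    """Return (eircode_or_None, address_without_eircode)."""
    match = EIRCODE_RE.search(address)
    if match is None:
        return None, address
    # Normalize the matched code: drop inner whitespace, uppercase.
    eircode = re.sub(r"\s+", "", match.group(0)).upper()
    remainder = address[:match.start()] + address[match.end():]
    # Collapse the doubled comma left behind where the code was removed.
    remainder = re.sub(r"\s*,\s*(?:,\s*)+", ", ", remainder)
    remainder = re.sub(r"\s{2,}", " ", remainder).strip(" ,")
    return eircode, remainder


code, rest = extract_eircode("Riverside House, Doneraile, P51 KT93, Ireland")
print(code)
print(rest)
```

The remaining string can then be passed to the parser as usual, and the extracted code attached to the result as the postcode. A pattern like this can still produce false positives on unusual input, so it is worth testing against real address data first.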
Yes, we arrived at exactly the workaround you describe. I just wanted to make sure you are aware that libpostal does not deal well with Eircodes.
Feel free to close the issue if you like.
Hi!
I was checking out libpostal, and saw something that could be improved.
My country is
Ireland
Here's how I'm using libpostal
Parsing addresses
Here's what I did
Tried to parse Irish addresses that include Eircodes (the relatively new Irish postcode format).
Example:
Riverside House, Doneraile, P51 KT93, Ireland
Here's what I got
Here's what I was expecting
For parsing issues, please answer "yes" or "no" to all that apply.
Here's what I think could be improved
Eircodes are relatively new and only now coming into common use, especially for deliveries. They are not yet widely found in OpenStreetMap. Still, the format is easy to identify and the parser should be able to recognize them.