openvenues / libpostal

A C library for parsing/normalizing street addresses around the world. Powered by statistical NLP and open geo data.
MIT License

Scottish flats and addresses #350

Open · dtraskas opened this issue 6 years ago

dtraskas commented 6 years ago

Hi,

I tried your awesome library for parsing UK addresses and the accuracy has been good so far. However, there are still problems with Scottish addresses, especially flats. I noticed that you don't recommend training the models on our own data, and instead suggest contributing datasets so that you can potentially incorporate them yourselves. Correct me if I'm wrong, but I'm keen to help improve the parser with more UK-based data.

albarrentine commented 6 years ago

So expressions like "Flat 12" are usually not part of building-level datasets like OSM. However, since it was important for the parser to handle sub-building information, for 1.0 we randomly generated a variety of these expressions per language and per country (including some Scotland-specific patterns like "TR" for Top Right) and appended them to the base addresses. That said, there are probably still a number of real-world patterns missing from our data.
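To make the augmentation idea above concrete, here is a minimal sketch of attaching a randomly generated sub-building expression to an already-labeled base address. It is not libpostal's actual training-data code: the template table, the region keys `gb` and `gb-sct`, the helper names `generate_unit` and `augment`, and the Scottish position codes other than "TR" are all hypothetical. Labeling every generated token `unit` mirrors libpostal's output component name, but the exact labeling and placement rules used in 1.0 are assumptions here.

```python
import random
from typing import Dict, List, Tuple

# A labeled training token: (surface string, component label).
Token = Tuple[str, str]

# Hypothetical per-region templates for sub-building expressions.
# "Flat {n}" and "TR" (Top Right) come from the discussion above;
# "TL", "BR" and "BL" are illustrative assumptions.
UNIT_TEMPLATES: Dict[str, List[str]] = {
    "gb": ["Flat {n}"],
    "gb-sct": ["Flat {n}", "TR", "TL", "BR", "BL"],
}


def generate_unit(rng: random.Random, region: str) -> str:
    """Pick a random template for the region and fill in a flat number.

    Templates without a {n} placeholder (e.g. "TR") are returned unchanged.
    """
    template = rng.choice(UNIT_TEMPLATES[region])
    return template.format(n=rng.randint(1, 20))


def augment(base: List[Token], rng: random.Random, region: str) -> List[Token]:
    """Return a new labeled address with a random unit expression attached.

    Every token of the generated expression is labeled "unit". The unit is
    placed before the street address, as is usual in UK formatting; this
    placement is an assumption of the sketch. The input list is not modified.
    """
    unit_tokens = [(tok, "unit") for tok in generate_unit(rng, region).split()]
    return unit_tokens + list(base)
```

Usage: with a seeded generator, `augment([("10", "house_number"), ("main", "road"), ("street", "road")], random.Random(0), "gb-sct")` yields the base tokens preceded by either a `Flat N` pair or a single position code, all labeled `unit`. Supporting a new real-world pattern then amounts to adding a template to the relevant region.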

If there's a specific pattern the parser isn't handling correctly, we can just generate that pattern. Can you provide a few examples of addresses that aren't parsing correctly?