openvenues / libpostal

A C library for parsing/normalizing street addresses around the world. Powered by statistical NLP and open geo data.
MIT License

The purpose of geodb and geonames #660

Open TarasovKiller opened 2 months ago

TarasovKiller commented 2 months ago

Hi there!

Can you help me understand the purpose of the folders ./libpostal/data/geodb and ./libpostal/data/geonames? They are empty, and I found no clues as to what should be stored in them.

albarrentine commented 1 month ago

It's a defunct module, and I don't think it even creates any data directories post-1.0, so those might be left over from an older install. The idea was to do geo-disambiguation/place normalization in-library, but in practice it's better to query a database and/or do a spelling-correction step first. If I revive it at some point, it would be focused on generating keys to look up in a database. Geographic context is already used implicitly in the parser through the geo/postcode phrase features in the model, which incorporate GeoNames as one of several data sources.
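
To make the "generate keys, then look them up in a database" idea concrete, here is a minimal sketch in Python. It does not use libpostal at all: `make_keys`, the `ABBREVIATIONS` table, and the `places` schema are hypothetical stand-ins invented for illustration, and in a real setup the keys would more likely come from a proper normalization step (for example libpostal's address expansion) and the table from a gazetteer such as GeoNames.

```python
import sqlite3

# Hypothetical abbreviation table for illustration only; a real system
# would rely on a far larger dictionary or a normalization library.
ABBREVIATIONS = {"st": "saint", "ft": "fort", "mt": "mount"}


def make_keys(place: str) -> list[str]:
    """Return candidate lookup keys for a place name (illustrative only)."""
    # Lowercase, drop periods, treat commas as separators, split on whitespace.
    tokens = place.lower().replace(".", "").replace(",", " ").split()
    plain = " ".join(tokens)
    expanded = " ".join(ABBREVIATIONS.get(t, t) for t in tokens)
    # Keep insertion order while dropping duplicate keys.
    return list(dict.fromkeys([plain, expanded]))


def build_gazetteer() -> sqlite3.Connection:
    """Create a tiny in-memory table standing in for a real place database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE places (key TEXT PRIMARY KEY, name TEXT, country TEXT)"
    )
    conn.executemany(
        "INSERT INTO places VALUES (?, ?, ?)",
        [
            ("saint louis", "St. Louis", "US"),
            ("fort worth", "Fort Worth", "US"),
        ],
    )
    return conn


def lookup(conn: sqlite3.Connection, place: str):
    """Try each candidate key in turn; return the first matching row or None."""
    for key in make_keys(place):
        row = conn.execute(
            "SELECT name, country FROM places WHERE key = ?", (key,)
        ).fetchone()
        if row is not None:
            return row
    return None


if __name__ == "__main__":
    conn = build_gazetteer()
    print(lookup(conn, "St. Louis"))
    print(lookup(conn, "Ft. Worth"))
    print(lookup(conn, "Nowhere"))
```

The point of the design is that disambiguation lives in the database, which can be updated independently, while the library side only has to produce stable, normalized keys.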