openvenues / libpostal

A C library for parsing/normalizing street addresses around the world. Powered by statistical NLP and open geo data.
MIT License

How to change or add new labels to parsed address #628

Open THENHKHAN opened 1 year ago

THENHKHAN commented 1 year ago

Hi!

I was checking out libpostal and saw something that could be improved. Has anyone tried to add more labels, or to rename the existing labels of a parsed address?

Thanks in advance

albarrentine commented 7 months ago

Renaming an existing label can be done as a post-processing step in any programming language with a mapping dictionary. To change "house_number" to "building_number" in Python:

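from postal.parser import parse_address  # pypostal bindings

parsed = parse_address('A-16 Sector-63 Noida, Uttar Pradesh, 201303 India')
# pypostal returns (value, label) pairs; labels missing from the mapping pass through unchanged.
mappings = {'house_number': 'building_number'}
parsed = [(value, mappings.get(label, label)) for value, label in parsed]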
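
The same idea can be wrapped in a small reusable helper. This is only a sketch: `rename_labels` is a hypothetical name, not part of libpostal or pypostal, and it assumes the parser output is a list of (value, label) pairs. The sample list is hand-written for illustration and is not real parser output, so the snippet runs without libpostal installed.

```python
def rename_labels(parsed, mappings):
    """Return a new list of (value, label) pairs with labels renamed.

    Labels that do not appear in `mappings` are kept as they are.
    """
    return [(value, mappings.get(label, label)) for value, label in parsed]


# Hand-written sample in the assumed (value, label) shape; NOT actual libpostal output.
sample = [("a-16", "house_number"), ("noida", "city"), ("201303", "postcode")]

renamed = rename_labels(sample, {"house_number": "building_number"})
for value, label in renamed:
    print(f"{label}: {value}")
```

Because the helper builds a new list, the original parse result stays untouched and both label sets remain available if needed.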

As for adding new labels or training examples, the entire training data set is freely available. The README has some information on how to download it from the Internet Archive, and there are several issues in the history where people have trained country-specific models. Retraining the parser on the global dataset is a fairly large undertaking, but you can do it if you want.