kiselev-dv / gazetteer

OSM ElasticSearch geocoder and addresses exporter
http://osm.me

support parsing postcode from fullText address with libpostal #54

Open cordovapolymer opened 7 years ago

cordovapolymer commented 7 years ago

Hi. Since OpenStreetMap address data is not normalized across the database, I'm using libpostal's parse_address to assign common labels. Could you modify ADDR_LONG_TEXT so that it starts with the ISO country code, or move the postcode to the end of the string, so that libpostal recognizes the postcode?

Here's an example of how libpostal handles the address without the country code, with the country code, and with the postcode at the end of the address:

>>> from postal.parser import parse_address
>>> parse_address('SG19 3EP, 19-21, Church End, Gamlingay CP (S Cambs), South Cambridgeshire, Cambridgeshire')
[(u'sg19 3ep', u'suburb'), (u'19-21', u'house_number'), (u'church end', u'road'), (u'gamlingay cp s cambs', u'city'), (u'south cambridgeshire cambridgeshire', u'state_district')]

>>> parse_address('UK SG19 3EP, 19-21, Church End, Gamlingay CP (S Cambs), South Cambridgeshire, Cambridgeshire')
[(u'uk', u'country'), (u'sg19 3ep', u'postcode'), (u'19-21', u'house_number'), (u'church end', u'road'), (u'gamlingay cp s cambs', u'city'), (u'south cambridgeshire cambridgeshire', u'state_district')]

>>> parse_address('19-21, Church End, Gamlingay CP (S Cambs), South Cambridgeshire, Cambridgeshire, SG19 3EP')
[(u'19-21', u'house_number'), (u'church end', u'road'), (u'gamlingay cp s cambs', u'city'), (u'south cambridgeshire cambridgeshire', u'state_district'), (u'sg19 3ep', u'postcode')]

cordovapolymer commented 7 years ago

@kiselev-dv, I tried modifying https://github.com/kiselev-dv/gazetteer/blob/6b2bd1c44b26389a468099867d6eacfb26a3bbcd/Gazetteer/src/main/java/me/osm/gazetter/out/AddrRowValueExctractorImpl.java#L175 myself, but it didn't change the position of the postcode. Can you advise what I should change to move the postcode to the end of the address?
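
Until the exporter itself changes, one possible workaround is to reorder the string on the client side before handing it to libpostal. The sketch below is only an illustration: `move_postcode_to_end` and `UK_POSTCODE` are hypothetical names, not part of gazetteer or libpostal, and it assumes that the components are comma-separated and that the postcode, when present, is the first component and looks like a UK postcode.

```python
import re

# Loose pattern for UK-style postcodes such as "SG19 3EP".
# Illustrative assumption only, not a full postcode validator.
UK_POSTCODE = re.compile(r"^[A-Z]{1,2}\d[A-Z\d]?\s*\d[A-Z]{2}$", re.IGNORECASE)


def move_postcode_to_end(addr_text, pattern=UK_POSTCODE):
    """Return addr_text with a leading postcode component moved to the end.

    Hypothetical helper: assumes comma-separated components with the
    postcode (if any) in first position. Other strings are returned
    with their component order unchanged.
    """
    parts = [p.strip() for p in addr_text.split(",") if p.strip()]
    if len(parts) > 1 and pattern.match(parts[0]):
        # Move the first component (the postcode) to the end.
        parts.append(parts.pop(0))
    return ", ".join(parts)


reordered = move_postcode_to_end(
    "SG19 3EP, 19-21, Church End, Gamlingay CP (S Cambs), "
    "South Cambridgeshire, Cambridgeshire"
)
print(reordered)
```

The reordered string has the same shape as the third parse_address call in the example above, so it can be passed to parse_address directly. For other countries the pattern would need to be replaced.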

kiselev-dv commented 7 years ago

Yes, but ping me closer to the weekend; I have some more urgent stuff on my plate.

In reply to cordovapolymer's comment of 2017-05-10 19:13 GMT-03:00.


-- Thank you for your time. Best regards. Dmitry.