Closed CamdenParker closed 1 week ago
Have you tried this with the Senzing provided data model?
On Tue, Nov 12, 2024 at 15:27 Camden Parker wrote:
Hi!
I was checking out libpostal, and saw something that could be improved.
My country is
USA
Here's how I'm using libpostal
I am using it in support of entity resolution amongst a myriad of environmental, health and safety data sources.
Here's what I did:
./src/address_parser
Loading models...
Welcome to libpostal's address parser.
Type in any address to parse and print the result.
Special commands: .exit to quit the program
13775 CLARK RD, ROSE MOUNT, MN
Here's what I got:
The parser seemingly preferred to create a street name with a comma in it rather than a city with two words?
Result:
{ "house_number": "13775", "road": "clark rd rose", "city": "mount", "state": "mn" }
Here's what I was expecting:
{ "house_number": "13775", "road": "clark rd", "city": "rose mount", "state": "mn" }
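To make the mismatch explicit, the two parses above can be compared label by label. This is only a minimal sketch in plain Python: it does not call libpostal, the helper name diff_parse is hypothetical, and the two dictionaries are copied from the result and the expectation quoted above.

```python
def diff_parse(actual, expected):
    """Return {label: (actual_value, expected_value)} for labels that differ."""
    labels = sorted(set(actual) | set(expected))
    return {
        label: (actual.get(label), expected.get(label))
        for label in labels
        if actual.get(label) != expected.get(label)
    }


# Copied from the thread: what the parser returned and what was expected.
actual = {"house_number": "13775", "road": "clark rd rose", "city": "mount", "state": "mn"}
expected = {"house_number": "13775", "road": "clark rd", "city": "rose mount", "state": "mn"}

for label, (got, want) in diff_parse(actual, expected).items():
    print(f"{label}: got {got!r}, expected {want!r}")
```

Only the road and city labels differ; the house number and state are parsed as expected.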
For parsing issues, please answer "yes" or "no" to all that apply.
- Does the input address exist in OpenStreetMap https://openstreetmap.org? No
- Do all the toponyms exist in OSM (city, state, region names, etc.)?
- If the address uses a rare/uncommon format, does changing the order of the fields yield the correct result?
- If the address does not contain city, region, etc., does adding those fields to the input improve the result? No
13775 CLARK RD, ROSE MOUNT, MN, USA
Result:
{ "house_number": "13775", "road": "clark rd rose", "city": "mount", "state": "mn", "country": "usa" }
13775 CLARK RD, ROSE MOUNT, MN 55068, USA
Result:
{ "house_number": "13775", "road": "clark rd rose", "city": "mount", "state": "mn", "postcode": "55068", "country": "usa" }
13775 CLARK RD, ROSE MOUNT, MN 55068
Result:
{ "house_number": "13775", "road": "clark rd rose", "city": "mount", "state": "mn", "postcode": "55068" }
- If the address contains apartment/floor/sub-building information or uncommon formatting, does removing that help? Is there any minimum form of the address that gets the right parse?
13775 CLARK RD, ROSEMOUNT, MN
Result:
{ "house_number": "13775", "road": "clark rd", "city": "rosemount", "state": "mn" }
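The answers above can be summarized by grouping the inputs according to the road/city split the parser returned. Again this is a sketch that only tabulates results already reported in this thread; it does not run libpostal, and the name group_by_split is hypothetical.

```python
# (input, road, city) triples copied from the results reported in this thread.
REPORTED = [
    ("13775 CLARK RD, ROSE MOUNT, MN", "clark rd rose", "mount"),
    ("13775 CLARK RD, ROSE MOUNT, MN, USA", "clark rd rose", "mount"),
    ("13775 CLARK RD, ROSE MOUNT, MN 55068, USA", "clark rd rose", "mount"),
    ("13775 CLARK RD, ROSE MOUNT, MN 55068", "clark rd rose", "mount"),
    ("13775 CLARK RD, ROSEMOUNT, MN", "clark rd", "rosemount"),
]


def group_by_split(reported):
    """Group input strings by the (road, city) pair the parser returned."""
    groups = {}
    for address, road, city in reported:
        groups.setdefault((road, city), []).append(address)
    return groups


groups = group_by_split(REPORTED)
for (road, city), addresses in groups.items():
    print(f"road={road!r} city={city!r}: {len(addresses)} input(s)")
```

Adding the country or postcode leaves the split unchanged in all four "ROSE MOUNT" variants; only writing the city as one word changes the outcome.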
Here's what I think could be improved:
Thinking maybe there were some edge cases in the training data where a street name came after a comma? Idrk
Worked like a charm. Thank you sir
Great to hear. If you are interested in any help with ER, let me know :)