openvenues / libpostal

A C library for parsing/normalizing street addresses around the world. Powered by statistical NLP and open geo data.
MIT License

Address Search #492

Open scriptsure opened 4 years ago

scriptsure commented 4 years ago

I just had a question about the use case below:


My country is

United States

Here's how I'm using libpostal

Sorry to ask this question here, but I suspect it will take someone all of 2 seconds to give me an answer.

I have a list of 70,000 addresses stored in a MySQL database that I need to be able to search. We are ONLY searching the United States. I want to use naturaljs/nodejs/libpostal to perform searches, probably using string distance to decide whether an address is a close enough match to show the user. Some of the use cases are where users misspelled something or used an abbreviation (e.g. Ave vs Avenue, St. vs Saint, Altanta vs Atlanta). We are using straight SQL to search these now and it is wildly inadequate. A pharmacy name could be spelled differently than the user thought, or hyphenated, or written as two words instead of one. So users are getting NO results, which causes many usability issues and support calls.
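To make the string-distance idea concrete, here is a minimal sketch of token normalization followed by Levenshtein distance. It is only an illustration under stated assumptions: the `ABBREVIATIONS` table and the `normalize` and `levenshtein` helpers are hypothetical names written for this example, not part of libpostal or natural, and the table is deliberately tiny.

```typescript
// Hypothetical abbreviation table for illustration only. "st" is left out on
// purpose because it is ambiguous ("street" vs "saint").
const ABBREVIATIONS = new Map<string, string>([
  ["ave", "avenue"],
  ["rd", "road"],
  ["blvd", "boulevard"],
]);

// Lowercase, replace punctuation with spaces, and expand known abbreviations
// token by token.
function normalize(text: string): string {
  return text
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, " ")
    .split(/\s+/)
    .filter((token) => token.length > 0)
    .map((token) => ABBREVIATIONS.get(token) ?? token)
    .join(" ");
}

// Dynamic-programming Levenshtein distance: the minimum number of
// single-character insertions, deletions, and substitutions.
function levenshtein(a: string, b: string): number {
  let prev: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr.push(Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost));
    }
    prev = curr;
  }
  return prev[b.length];
}
```

With these helpers, `normalize("123 Main Ave.")` yields `"123 main avenue"`, and `levenshtein("altanta", "atlanta")` is 2, so a small distance threshold would still match the misspelled city.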

Additionally, once an address is found I need to return its identification code so that it can be used in the UI. Each address has a special identification code that is not known to this community and is used in the healthcare space.

How do I load all 70,000 into memory so that natural can search all of them?

  1. Would I first normalize the data into full address strings? So instead of individual fields in the database, I would have a full address string in a single field.
  2. On boot of nodejs I could load all 70,000 normalized addresses into Redis.
  3. As the user searches in a Google-style lookup, I would split the address using libpostal.
  4. This is where I get a little confused about what to do. Since the data would possibly be in Redis and not indexed in any way, each autocomplete search would have to scan all strings. Is this a bad approach? It would probably only show 25 hits and then stop. I guess my big issue is how to apply NLP after libpostal does its work, in a best-practice manner.
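As a point of comparison for step 4, here is a rough sketch of one alternative to scanning every string: a character-trigram inverted index held in process memory, which keeps each record's identification code next to its text. Everything here is an assumption made for illustration: `PharmacyRecord`, `TrigramIndex`, and `trigrams` are hypothetical names, the sample data is invented, and the scoring (fraction of query trigrams found in a record) is just one simple choice, not something libpostal or natural prescribes.

```typescript
// Hypothetical record shape: `id` stands in for the identification code,
// `text` for the normalized "name + full address" string.
interface PharmacyRecord {
  id: string;
  text: string;
}

// Split a string into overlapping 3-character grams. Leading padding lets
// short prefixes typed during autocomplete still produce grams.
function trigrams(text: string): Set<string> {
  const padded = `  ${text.toLowerCase()} `;
  const grams = new Set<string>();
  for (let i = 0; i + 3 <= padded.length; i++) {
    grams.add(padded.slice(i, i + 3));
  }
  return grams;
}

class TrigramIndex {
  private records: PharmacyRecord[] = [];
  // Maps each trigram to the positions of the records that contain it.
  private postings = new Map<string, number[]>();

  add(record: PharmacyRecord): void {
    const position = this.records.length;
    this.records.push(record);
    trigrams(record.text).forEach((gram) => {
      const list = this.postings.get(gram);
      if (list) list.push(position);
      else this.postings.set(gram, [position]);
    });
  }

  // Return up to `limit` records, ranked by the fraction of query trigrams
  // they share. Only records sharing at least one trigram are considered.
  search(query: string, limit = 25): PharmacyRecord[] {
    const queryGrams = trigrams(query);
    const shared = new Map<number, number>();
    queryGrams.forEach((gram) => {
      (this.postings.get(gram) ?? []).forEach((position) => {
        shared.set(position, (shared.get(position) ?? 0) + 1);
      });
    });
    return Array.from(shared.entries())
      .map(([position, count]) => ({ position, score: count / queryGrams.size }))
      .sort((a, b) => b.score - a.score)
      .slice(0, limit)
      .map((hit) => this.records[hit.position]);
  }
}

// Invented sample data, loaded once at startup.
const index = new TrigramIndex();
index.add({ id: "P001", text: "cvs pharmacy 100 broadway new york ny" });
index.add({ id: "P002", text: "walgreens 200 peachtree street atlanta ga" });
index.add({ id: "P003", text: "cvs pharmacy 5 main street boston ma" });
```

In this sketch, `index.search("cvs new york ny")` ranks `P001` first, and the misspelled `index.search("altanta")` still finds `P002` because the two spellings share several trigrams. The short candidate list that comes back could then be re-ranked with a string distance, so the expensive comparison runs on a handful of records instead of all 70,000.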

Since this is healthcare, we are searching for addresses of pharmacies. I want the user to type in "cvs New York NY", have libpostal split it correctly, and then perform NLP against our predefined list of pharmacies produced by the industry. We do not control that list; it comes from an organization that allows pharmacies to register themselves.

Thanks in advance for any assistance. Love libpostal, and I'm really hoping I can put this all together.

scriptsure commented 4 years ago

@albarrentine do you have any thoughts on this?