rinigus / geocoder-nlp

Geocoder library based on libpostal normalization of libosmscout generated database
MIT License

clean database from common substrings #17

Closed rinigus closed 7 years ago

rinigus commented 7 years ago

As part of a solution that allows searching for substrings of location names (for example, if a street is named "John Smith", you could search for "Smith"), some very common substrings, such as "street", will probably end up in the database. Collect statistics on how substrings are distributed across the database records and automatically remove the ones that are too common.
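A minimal sketch of how such statistics could be gathered, assuming names are split on whitespace and a token counts as "common" when it appears in more than a chosen fraction of records. The function names `common_tokens` and `searchable_tokens` and the threshold value are hypothetical and purely illustrative; this is not the geocoder-nlp implementation, which works on its own database.

```python
from collections import Counter


def common_tokens(names, max_fraction=0.05):
    """Return tokens that occur in more than max_fraction of the names.

    Hypothetical helper: the threshold is an assumption, not a project value.
    """
    counts = Counter()
    for name in names:
        # Count each token once per record so long names do not dominate.
        counts.update(set(name.lower().split()))
    threshold = max_fraction * len(names)
    return {token for token, n in counts.items() if n > threshold}


def searchable_tokens(name, stopwords):
    """Return the tokens of a name that are worth keeping as search keys."""
    return [token for token in name.lower().split() if token not in stopwords]


# Toy data for illustration only.
names = ["John Smith Street", "Main Street", "Oak Street", "Smith Lane"]
stop = common_tokens(names, max_fraction=0.5)
kept = searchable_tokens("John Smith Street", stop)
```

With this toy data, "street" appears in three of four records and is dropped, while "smith" appears in exactly half and is kept. On a real database the cutoff would have to be tuned against the actual record distribution.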

rinigus commented 7 years ago

It looks like the impact in Estonia is rather small (below 1% of all records in the table?).