Closed steveha-ziprecruiter closed 7 years ago
In the parser-data branch of libpostal, there's been a fair amount of work done on creating simple place name queries, breaking up imperfect city name fields (e.g. when someone enters addr:city="Columbus OH", breaking off the "OH" from the city name), and ensuring that each place can have only one label that is used consistently. Those improvements will be in the next release.
There is an early version of the new model available at https://libpostal.s3.amazonaws.com/mapzen_sample/parser_full.tar.gz if you want to try it out. To use it, just unpack the contents of the tarball into $DATA_DIR/libpostal/address_parser, where $DATA_DIR is whatever you passed in during configure (the default is /usr/local/share, I believe). This doesn't require switching branches or anything; it's the same model as in master, trained on new data.
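For anyone scripting the install, here is a minimal sketch of the unpacking step in Python. It assumes the tarball has already been downloaded locally; the function name `install_parser_model` and its parameters are made up for illustration, and the `/usr/local/share` default is only the guess given above.

```python
import tarfile
from pathlib import Path


def install_parser_model(tarball: str, data_dir: str = "/usr/local/share") -> Path:
    """Unpack a parser model tarball into <data_dir>/libpostal/address_parser.

    data_dir should match whatever was passed to configure; the default
    here is an assumption taken from the comment above.
    """
    target = Path(data_dir) / "libpostal" / "address_parser"
    target.mkdir(parents=True, exist_ok=True)
    # Extract the archive contents directly into the address_parser directory.
    with tarfile.open(tarball, "r:gz") as archive:
        archive.extractall(target)
    return target
```

Something like `install_parser_model("parser_full.tar.gz", "/usr/local/share")` would then put the files in place; it may need elevated permissions depending on where the data directory lives.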
Hey @steveha-ziprecruiter, the libpostal 1.0 release is trained with all the city names in OSM and their parent admins (I've also made the training data public, feel free to download/grep through the place names training set if something's not working as you expect). The new parser should perform quite well on simple city queries. There might be one or two minor issues with certain multiword place names getting broken up. Will try to work that out as well for the next training batch.
Thank you! We have this new version in production now! ^_^
That's awesome! So for the place search box I presume?
Yes. When a job seeker wants to find a job, libpostal is part of the pipeline that converts what the job seeker typed into a location. So anyone who looks for a job on our web site is being helped by libpostal.
When my company parses location strings, the strings are usually city names. "San Francisco" might be a common example, or "Columbus, Ohio".
I've experimented with libpostal and found I get better results by prepending "1234 Main Street" to location strings. It doesn't seem to cause any harm in the rare cases where we get a location string that does include a street address (the parse just returns multiple address parts).

I request that the training data include city names like "Columbus, Ohio" or "Paris, France".
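The prepending workaround described above might look roughly like the sketch below. The helper `parse_location` is hypothetical, not part of libpostal. It takes the parser as an argument and assumes that parser returns a list of `(value, label)` tuples with lowercased values, which is the shape the Python binding's `postal.parser.parse_address` produces.

```python
from typing import Callable, List, Tuple

Component = Tuple[str, str]  # (value, label)

# Dummy street address used only to give the parser more context.
DUMMY_HOUSE_NUMBER = "1234"
DUMMY_ROAD = "main street"


def parse_location(
    text: str, parse_address: Callable[[str], List[Component]]
) -> List[Component]:
    """Prepend a dummy street address, parse, then drop the dummy parts."""
    padded = f"{DUMMY_HOUSE_NUMBER} {DUMMY_ROAD.title()} {text}"
    components = parse_address(padded)
    dummy = {(DUMMY_HOUSE_NUMBER, "house_number"), (DUMMY_ROAD, "road")}
    # Keep everything except the components that came from the dummy prefix.
    return [c for c in components if (c[0].lower(), c[1]) not in dummy]
```

With the Python binding installed, this could be called as `parse_location("Columbus, Ohio", parse_address)` after `from postal.parser import parse_address`. One caveat of this sketch: it would also drop a genuine "1234 Main Street" if someone typed exactly that.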