pelias / parser

natural language classification engine for geocoding
https://parser.demo.geocode.earth
MIT License

venue_classification: investigate change #151

Open missinglink opened 2 years ago

missinglink commented 2 years ago

Today we are merging https://github.com/pelias/api/pull/1565 which brings a bunch of pelias/parser changes into pelias/api.

As part of this process we did some wider acceptance test checks and diff'd them against the current baseline.

One change we identified affects the query below: at the partial completion "San Simeon Drive Desert Hot Spr", the incomplete token "spr" is now classified as a street.

San Simeon Drive Desert Hot Springs CA 92240 {"focus.point.lat":33.96112,"focus.point.lon":-116.50168}
-FFFFF0000000000000000000000000000000000FFF00
+FFFFF0000000000000000000000000F00000000FFF00
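To find which partial completions changed between two runs, the result strings can be compared position by position. This is a minimal sketch, assuming that character i of each result string describes the query truncated to its first i + 1 characters. The helper name changed_prefixes is hypothetical, and the sample data is a toy input, not the output of the acceptance tests.

```python
def changed_prefixes(query, before, after):
    """Return (prefix, old, new) for each position where the result strings differ.

    Assumption: character i of each result string corresponds to the query
    truncated to its first i + 1 characters.
    """
    changes = []
    for i, (old, new) in enumerate(zip(before, after)):
        if old != new:
            changes.append((query[: i + 1], old, new))
    return changes


# Toy data for illustration only (not taken from the acceptance test run).
query = "Hot Springs"
before = "FF000000000"
after = "FF0000F0000"

for prefix, old, new in changed_prefixes(query, before, after):
    print(f"{prefix!r}: {old} -> {new}")
```

Applied to the two lines of the diff above, the same comparison should point at the prefix where the classification changed.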

Running a git bisect shows that this change was introduced in https://github.com/pelias/parser/commit/a65218d347b682da291ee16b6f84ddf4aba4827a

A simple change to the en/street_types.txt file seems to resolve the issue, but it's unclear why this issue didn't exist previously.

diff --git a/resources/pelias/dictionaries/libpostal/en/street_types.txt b/resources/pelias/dictionaries/libpostal/en/street_types.txt
index 30ecf9d..9fbdcbe 100644
--- a/resources/pelias/dictionaries/libpostal/en/street_types.txt
+++ b/resources/pelias/dictionaries/libpostal/en/street_types.txt
@@ -14,3 +14,5 @@ beltway
 !broadway|bdwy|bway|bwy|brdway
 !esplanade|esp|espl
 market
+
+!spr
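The sketch below illustrates how an override file like this might take effect. It assumes that a leading "!" marks tokens to remove from the base dictionary, that "|" separates variants of one entry, and that lines without "!" add tokens. It is not the parser's actual loader, and apply_overrides is a hypothetical name.

```python
def apply_overrides(base_tokens, override_lines):
    """Apply override lines to a set of dictionary tokens.

    Assumptions (not confirmed against the pelias/parser source):
    - a leading '!' removes every variant listed on the line
    - '|' separates variants of the same entry
    - any other non-empty line adds its variants
    """
    tokens = set(base_tokens)
    for raw in override_lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines such as the one added in the diff
        remove = line.startswith("!")
        body = line[1:] if remove else line
        for variant in body.split("|"):
            variant = variant.strip()
            if not variant:
                continue
            if remove:
                tokens.discard(variant)
            else:
                tokens.add(variant)
    return tokens


# Hypothetical base dictionary containing the problematic abbreviation.
base = {"street", "spr", "esp"}
overrides = ["!esplanade|esp|espl", "market", "", "!spr"]
print(sorted(apply_overrides(base, overrides)))
```

Under these assumptions, "spr" is no longer in the street types dictionary after the override, so the partial token would not be matched as a street type.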