pelias / parser

natural language classification engine for geocoding
https://parser.demo.geocode.earth
MIT License

venue work: changes to how ampersands are parsed #120

Open missinglink opened 4 years ago

missinglink commented 4 years ago

This PR changes how ampersands are handled, with a preference for venues over intersections in some cases.

The work is motivated by the parser having poor support for things like "Bar & Grill" (and venue names containing ampersands in general).

the diff looks like more changes than there really are 😄

the main difference is in parser/AddressParser.js

this allows us to change the behaviour of "IntersectionClassifier" to allow this exception:

making this change means it's possible to see odd classifications, such as '& grill' being classified as a street; to resolve this I've added not: 'PunctuationClassification' in many places to differentiate it from an AlphaClassification.
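As a rough illustration of the `not` exclusion described above, here is a minimal sketch. It is not the pelias/parser implementation: the `matchesRule` helper, the token shape, and the any-of semantics assumed for `is` and `not` are all invented for the example, as is the assumption that '&' carries both an alpha and a punctuation classification.

```javascript
// Hypothetical, simplified model (NOT the pelias/parser API):
// a token is a string plus the names of the classifications attached to it.
const tokens = [
  { body: '&', classifications: ['AlphaClassification', 'PunctuationClassification'] },
  { body: 'grill', classifications: ['AlphaClassification'] }
]

// Assumed semantics: `is` matches when the token has ANY listed classification,
// `not` rejects the token when it has ANY listed classification.
function matchesRule (token, rule) {
  const has = (name) => token.classifications.includes(name)
  const included = (rule.is || []).some(has)
  const excluded = (rule.not || []).some(has)
  return included && !excluded
}

// Without the exclusion, '&' is accepted wherever an alpha token is expected,
// so '& grill' could be assembled into a street.
const loose = { is: ['AlphaClassification'] }

// With the exclusion, punctuation can no longer stand in for an alpha token.
const strict = { is: ['AlphaClassification'], not: ['PunctuationClassification'] }

const looseMatches = tokens.filter((t) => matchesRule(t, loose)).map((t) => t.body)
const strictMatches = tokens.filter((t) => matchesRule(t, strict)).map((t) => t.body)

console.log(looseMatches)  // [ '&', 'grill' ]
console.log(strictMatches) // [ 'grill' ]
```

Under these assumptions, only 'grill' remains eligible as a street token once the exclusion is in place, which is the effect the change is aiming for.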

still a DRAFT PR for now, needs more testing before opening up for merging.

missinglink commented 4 years ago

one of the issues with this method is that there will be 'jitter' for partially complete inputs, eg:

'foo & bar'

(0.80) ➜ [ { venue: 'foo & bar' } ]
(0.70) ➜ [ { street: 'foo' }, { street: 'bar' } ]

'foo & ba'

(0.68) ➜ [ { street: 'foo' }, { street: 'ba' } ]

... although this might not be an issue in pelias/api, depending on how it's converted to an ES query in https://github.com/pelias/api/pull/1487

[edit] https://github.com/pelias/api/pull/1487/commits/50c15db0d971cb46813711295e5973929950a940 shows it's not an issue 🎉
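The jitter described above can be reproduced with a small sketch that picks the highest-scoring solution for each input. The scores and solutions are copied from the example in this comment; the `topSolution` and `labels` helpers are hypothetical and are not part of pelias/parser or pelias/api.

```javascript
// Ranked solutions copied from the example above, keyed by input text.
const results = {
  'foo & bar': [
    { score: 0.80, solution: [{ venue: 'foo & bar' }] },
    { score: 0.70, solution: [{ street: 'foo' }, { street: 'bar' }] }
  ],
  'foo & ba': [
    { score: 0.68, solution: [{ street: 'foo' }, { street: 'ba' }] }
  ]
}

// Hypothetical helper: return the highest-scoring entry.
function topSolution (ranked) {
  return ranked.reduce((best, cur) => (cur.score > best.score ? cur : best))
}

// Hypothetical helper: list the labels used by a solution, e.g. 'street,street'.
function labels (entry) {
  return entry.solution.map((pair) => Object.keys(pair)[0]).join(',')
}

const partial = labels(topSolution(results['foo & ba']))
const complete = labels(topSolution(results['foo & bar']))

// Typing the final 'r' flips the top interpretation from intersection to venue.
console.log(partial)  // street,street
console.log(complete) // venue
```

A consumer that only looks at the top solution would see the interpretation change between keystrokes, which is why it matters how the downstream query uses the lower-ranked solutions.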