pelias / parser

natural language classification engine for geocoding
https://parser.demo.geocode.earth
MIT License
55 stars 28 forks source link

add new VenueClassification #111

Closed missinglink closed 4 years ago

missinglink commented 4 years ago

I was thinking about https://github.com/pelias/parser/issues/105 and came to the conclusion that the PlaceClassification (ie. cafe, park, shopping mall etc.) should be distinct from (Cafe Pelias, Pelias Park or Mall of Pelias).

So in order to make that distinction I've split the two concepts up into the existing PlaceClassification (previously a public classification and now private) and VenueClassification - a public classification which combines a PlaceClassification with one or more named tokens to indicate a specific cafe, park, mall etc.

I tried to keep it as close to a no-op as possible, thankfully the existing mask symbol for place was V so I've assigned that to VenueClassification now.

I've also included the confidence classification changes mentioned in https://github.com/pelias/parser/pull/110, so this may either partially or completely replace that PR, it felt wrong to do place changes in that PR anyway ;)

There were three failing test cases but in all three cases they were covering undesirable solution(s), so I took the liberty to change them, even if the new solution is also still imperfect.

resolves https://github.com/pelias/parser/issues/105

missinglink commented 4 years ago

Funnily enough I don't think this will require any changes to pelias/api because this is pretty much what it was expecting anyway 😆

https://github.com/pelias/api/blob/master/sanitizer/_text_pelias_parser.js#L123-L125

missinglink commented 4 years ago

Yeah the 'café new york' case is not great, although it wasn't really great before either 🤷 At some point we might want to do something like { category: 'café', locality: 'new york' }

I'm happy to merge this as-is, more generally I'm finding the place classifications hard to get right, we do an OK job of them but they're really better suited to be solved a proper geocoder which holds all the place names in an index.

missinglink commented 4 years ago

I'm going to merge this as a non-breaking version bump, the plan is that it doesn't have the ability to break existing integrations.