pelias / openstreetmap

Import pipeline for OSM into Pelias
MIT License

Venue popularity additions #531

Closed orangejulius closed 4 years ago

orangejulius commented 4 years ago

This PR builds on #493 with a couple additions I think might help our existing set of test cases. I'm sure we could think of more to add!

Closes pelias/pelias#171

Closes https://github.com/pelias/openstreetmap/pull/493

orangejulius commented 4 years ago

I've done some testing on this PR, and while we will have a lot more work to do to really take advantage of these changes, I believe it's more than good enough to merge. It makes a big difference in a good number of cases.

Some observations:

Improved: San Francisco Zoo

/v1/autocomplete?focus.point.lat=37.743618&focus.point.lon=-122.426117&text=zoo

Screenshot_2020-05-22_11-03-31

This query has given us trouble for a long time. Even with proximity boosts, there was little reason for the ranking to prefer a nearby zoo over places with a technically better text match.
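For anyone who wants to re-run this query against their own build, here's a minimal sketch that assembles the same autocomplete URL. The base URL (`http://localhost:4000`) and the helper name `autocomplete_url` are assumptions for illustration and not part of this PR; only the path and query parameters come from the request above.

```python
from urllib.parse import urlencode

# Assumption: a Pelias API reachable at this address; change it for your setup.
BASE_URL = "http://localhost:4000"


def autocomplete_url(text, focus_lat, focus_lon, base_url=BASE_URL):
    """Build a /v1/autocomplete URL with a focus point (hypothetical helper)."""
    params = {
        "focus.point.lat": focus_lat,
        "focus.point.lon": focus_lon,
        "text": text,
    }
    # urlencode percent-encodes values and joins them with '&'.
    return f"{base_url}/v1/autocomplete?{urlencode(params)}"


url = autocomplete_url("zoo", 37.743618, -122.426117)
print(url)
```

Running the same URL against a baseline build and a build with this PR is one way to compare the ordering of results side by side.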

Improved: Statue of Liberty

This one is truly a classic, and returns much better results now. Frustratingly, for the search endpoint, the ferry terminal on Liberty Island is still the first result. This is because only the ferry terminal is part of the Liberty Island neighbourhood, so it gets enough of a boost to outrank the high-popularity statue. We might be able to address this by changing the boost given to neighbourhood matches, or maybe even just removing that neighbourhood from admin lookup.

/v1/search?boundary.country=USA&text=statue of liberty

Screenshot_2020-05-22_11-07-43

The difference for autocomplete is quite good, however:

Screenshot_2020-05-22_11-07-21
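To make the search-endpoint ranking problem concrete, here's a toy sketch of how a neighbourhood boost can outweigh a popularity boost. Everything in it is an assumption for illustration: the additive scoring, the names `Candidate`, `score`, and `rank`, and all of the numbers are made up. It is not how Pelias or Elasticsearch actually computes scores; it only shows the shape of the trade-off.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    text_score: float          # made-up text relevance
    popularity_boost: float    # made-up boost from venue popularity
    neighbourhood_match: bool  # whether the record sits in the matched neighbourhood


def score(candidate, neighbourhood_boost):
    """Toy additive score; real scoring is more involved."""
    boost = neighbourhood_boost if candidate.neighbourhood_match else 0.0
    return candidate.text_score + candidate.popularity_boost + boost


def rank(candidates, neighbourhood_boost):
    """Return candidate names ordered from highest to lowest toy score."""
    ordered = sorted(
        candidates,
        key=lambda c: score(c, neighbourhood_boost),
        reverse=True,
    )
    return [c.name for c in ordered]


# Hypothetical values chosen only to reproduce the ordering described above.
candidates = [
    Candidate("Statue of Liberty", 10.0, 3.0, False),
    Candidate("Liberty Island ferry terminal", 10.0, 0.0, True),
]

with_large_boost = rank(candidates, neighbourhood_boost=5.0)
with_small_boost = rank(candidates, neighbourhood_boost=1.0)
print(with_large_boost)
print(with_small_boost)
```

In this toy model, lowering the neighbourhood boost below the popularity boost is enough to put the statue first, which is the intuition behind the first of the two options mentioned above.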

Surprise Improvement: Structured requests for venues

It looks like some long-failing structured geocoding requests are now passing because the venue being returned has just enough boosting to be ranked higher than an address for the same location:

/v1/search/structured?venue=police&address=1090 N Charlotte Street&locality=Lancaster&region=PA

Screenshot_2020-05-22_11-50-12

I'm actually a little unclear on how we want the venue parameter on structured search to behave, but it feels right that this query now returns a venue.
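For reference, here's a small sketch of how the structured request above could be assembled programmatically, since the address value contains spaces that need encoding. The base URL and the helper name `structured_search_url` are assumptions for illustration; the field names and values are taken from the query above.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Assumption: a Pelias API reachable at this address; change it for your setup.
BASE_URL = "http://localhost:4000"


def structured_search_url(base_url=BASE_URL, **fields):
    """Build a /v1/search/structured URL from keyword fields (hypothetical helper)."""
    # urlencode encodes spaces as '+', which is valid in a query string.
    return f"{base_url}/v1/search/structured?{urlencode(fields)}"


url = structured_search_url(
    venue="police",
    address="1090 N Charlotte Street",
    locality="Lancaster",
    region="PA",
)

# Decode the query string again to check that the values round-trip unchanged.
decoded = parse_qs(urlsplit(url).query)
print(url)
print(decoded["address"])
```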

Less improvement than expected (if any): POI test cases

While there are some improvements here, it looks like most POI result improvements will require scoring fixes (like https://github.com/pelias/pelias/issues/862) to solve.

Screenshot_2020-05-22_12-10-38

Testing data

Here are some complete acceptance test logs used for comparison:

dev-2020-05-07-10:17:40-2020-05-05-build-master-baseline-acceptance-tests.txt
dev-2020-05-07-10:17:40-2020-05-05-build-master-baseline-autocomplete-acceptance-tests.txt
dev-2020-05-20-14:50:19-2020-05-18-venue-popularity-additions-acceptance-tests.txt
dev-2020-05-20-14:50:19-2020-05-18-venue-popularity-additions-autocomplete-acceptance-tests.txt

Conclusion

While there are places where we'd prefer better results, this PR doesn't seem to break anything, and includes quite a few real improvements! It creates a solid foundation for adding popularity to venue records (meaning I consider it to solve a 5-year-old issue: https://github.com/pelias/pelias/issues/171). I'm sure we'll be tweaking this sort of thing forever, but I think we should merge this PR to start us out!