pelias / schema

elasticsearch schema files and tooling
MIT License
40 stars 76 forks source link

Change to make compound ampersand words easier to match #463

Closed blackmad closed 6 months ago

blackmad commented 4 years ago

I think this is the correct change to make it so "A&P Deli" can be matched by any of these queries: "A&P" "A & P" and "A and P"

I alias "and" to "und" because otherwise it seems like this change wouldn't work for german venues.

But then again I am still not great at schema changes. Working on building an index with this locally now. Unittests & manual testing seem to tell me this change works.

blackmad commented 4 years ago

oh right, integration tests.

orangejulius commented 4 years ago

Kicked off a build for this, will update with some results.

Personally, my biggest potential concern is that there will be lots of new short token matches, which could impact either query precision or response time. But we'll only know for sure after the build :)

orangejulius commented 4 years ago

Okay so, I reviewed our test results ran against this build the other day, and as far as I can tell it's just noise from having compared two builds run on slightly different days/data.

@blackmad do you have any good examples of pelias compare links that show cases where we aren't doing well with ampersands currently? would be great to try to find a test case in open data.

I'm going to do a little examining to see how many records this would affect (I suspect very few), and unless that investigation suggests that there might realistically be a performance impact to adding more smaller tokens to the index, this should be good to go.

blackmad commented 4 years ago

on dev, results in nyc: https://pelias.github.io/compare/#/v1/search?sources=oa%2Cosm&focus.point.lat=40.74&focus.point.lon=-74&text=h%26m

but when I switch it to "h and m" I lose all those results - this change should theoretically fix this? https://pelias.github.io/compare/#/v1/search?sources=oa%2Cosm&focus.point.lat=40.74&focus.point.lon=-74&text=h+and+m

orangejulius commented 4 years ago

Okay, I've set up some simple, exploratory acceptance tests in https://github.com/pelias/acceptance-tests/pull/534 for this.

It looks like this PR causes a mix of improvements and regressions (baseline on left, this branch on the right): Screenshot_2020-09-24_17-08-41

Improvements

Regressions

blackmad commented 4 years ago

follow-up: try adding regex pattern for ampersand at end to fix query autocomplete normalization