pelias / openstreetmap

Import pipeline for OSM into Pelias
MIT License

remove parenthesed portion of names #559

Open: missinglink opened this pull request 3 years ago

missinglink commented 3 years ago

This draft PR explores the idea of removing parenthesized portions of names. I'm not 100% sure this is a great idea; the test cases illustrate some positive and some potentially negative results.
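To make the idea concrete, here is a minimal Python sketch of stripping parenthesized portions from a name. It is not the code in this PR (pelias/openstreetmap is a Node.js project). The helper name `strip_parenthesized` is hypothetical, and so is the fallback of keeping the original name when nothing would remain.

```python
import re

# Matches one innermost parenthesized group plus any whitespace before it.
_PAREN_GROUP = re.compile(r"\s*\([^()]*\)")


def strip_parenthesized(name: str) -> str:
    """Remove parenthesized portions of a name (hypothetical helper).

    Innermost groups are removed repeatedly so nested parentheses are
    handled. Unbalanced parentheses are left untouched. If nothing would
    remain, the original name is returned (an assumed safeguard).
    """
    result = name
    while True:
        stripped = _PAREN_GROUP.sub("", result)
        if stripped == result:
            break
        result = stripped
    # Collapse runs of whitespace, including any left behind by the removal.
    result = " ".join(result.split())
    return result if result else name
```

For example, `strip_parenthesized("Foo (Bar) Baz")` returns `"Foo Baz"`. The potential downside is also visible here: whatever the parenthetical contained is gone from the stored name, including any text that might have helped distinguish one place from another.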

missinglink commented 3 years ago

This was motivated by the following results, in which a TV series shows up for the query 90210:

[Screenshot, 2021-08-11: search results for the query 90210]

missinglink commented 3 years ago

I also considered implementing something similar in pelias/schema, where we would store the original text verbatim but only index the tokens outside the parentheses. That approach is also not without its potential issues...
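The alternative can be modelled as keeping two things per name: the verbatim string for display and a token list for matching. In pelias/schema this would presumably be handled during Elasticsearch analysis. The Python below is only a hypothetical model of the idea and not schema code. `IndexedName`, `make_indexed_name`, `tokens_outside_parens` and the simple lowercase/whitespace tokenization are all assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass


def tokens_outside_parens(text: str) -> list[str]:
    """Lowercased whitespace tokens that appear outside any parentheses.

    A depth counter tracks nesting. Parentheses are replaced with spaces
    so "a(b)c" still splits into "a" and "c". An unclosed "(" hides the
    rest of the string (a simplifying assumption).
    """
    kept = []
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
            kept.append(" ")
        elif ch == ")" and depth > 0:
            depth -= 1
            kept.append(" ")
        elif depth == 0:
            kept.append(ch)
    return "".join(kept).lower().split()


@dataclass
class IndexedName:
    original: str       # stored verbatim and returned to the user
    tokens: list[str]   # the only tokens a query can match against


def make_indexed_name(text: str) -> IndexedName:
    return IndexedName(original=text, tokens=tokens_outside_parens(text))
```

With this model, `make_indexed_name("Foo (Bar) Baz")` keeps `"Foo (Bar) Baz"` for display and indexes only `["foo", "baz"]`, so a query for "bar" would not match. The difference from this PR is that the name itself is never rewritten at import time. Only the searchable tokens change.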

orangejulius commented 3 years ago

I like this! I'm sure it has a downside somewhere, but I think it's worth exploring. Definitely worth kicking off a build. The diff in the Vancouver extract actually looks very positive.

orangejulius commented 2 years ago

I came across this PR again today and figured we should test it out. Branch is rebased and a planet build is kicked off :)