pelias / openstreetmap

Import pipeline for OSM in to Pelias
MIT License
112 stars 72 forks source link

set popularity on some address records #549

Closed missinglink closed 3 years ago

missinglink commented 3 years ago

this is a fairly simple PR which sets a popularity value for address records which are 'popular' by some metric (usually they list contact details or have other metadata denoting that they are a venue).

prior to this PR only venue records would receive a popularity.

this work is motivated by this testcase which returns two valid addresses in Sydney but doesn't put the Sydney Opera House address first (inspecting the ES response shows the scores are currently identical).

Screenshot 2021-02-11 at 12 06 32

with this PR a very small scoring difference will occur which will tip the balance in the favour of the Opera House.

note: I have clamped the popularity for address records to a max of 1000, I didn't want to introduce a wide range of variability in the scoring for them, the rationale for this is discussed in bullet 4 here: https://github.com/pelias/openstreetmap/pull/493#issuecomment-503019975

missinglink commented 3 years ago

One thing to mention here is that this will likely also mean that OSM addresses (only those which are for a venue) will score higher than the equivalent OA addresses (which we don't know what they are).

I think this is correct 🤔

missinglink commented 3 years ago

I've added a second commit which changes the behaviour a bit to put it more in line with my original comment from 2019.

This massively reduces the amount of address records which receive a popularity score to only those which would score >=10000 as a venue. Applicable records are assigned a static score of 1000.

This only changes one testcase for the end-to-end tests which is a train station, I think that's the level of 'importance' we are looking for here, we don't want to boost scoring of every cafe and bookshop.

missinglink commented 3 years ago

rebased origin/master to bring in Github Actions config required for approval.

orangejulius commented 3 years ago

Oh yeah, good stuff.

1000 or zero sounds about right, and it sounds like this would have very limited impact overall. I imagine the most common situation would simply be reordering results, especially for partial queries like 30 w 26th st, without an admin area.

orangejulius commented 3 years ago

I ran this through our suite of full planet tests, and I can confirm it fixes the example query above (now in pelias/acceptance-tests#543), and doesn't introduce any other regressions. 🚢