pelias / placeholder

stand-alone coarse geocoder
https://placeholder.demo.geocode.earth
MIT License
313 stars 35 forks source link

rewrite match_subject_object.sql [again] #195

Closed missinglink closed 4 years ago

missinglink commented 4 years ago

after merging https://github.com/pelias/placeholder/pull/194 I realised that the query isn't exactly equal to what it was prior.

there is a section of the query lower down which checks that the tokens are from the same language, or from either the default language or english.

now I don't recall why we're doing it like that, but regardless the behaviour has changed, what's happening now is that we're getting a list of candidate rows from the lineage table and then joining to the tokens table.

as a result of these joins, all of the tokens for those documents are being brought in, so any of them can match the language rules, rather than just 'santa maria' and 'roma'.

this PR fixes this issue by adding the t1.token = $subject and t2.token = $object conditions.

when I added those back the performance tanked again, so bad I had to hit control+c after 30+ seconds.

IMO this is a bug in the SQLite query planner (or maybe it doesn't have enough info!?) which can be overcome by giving explicit instructions to the query planner via INDEXED BY tokens_cover_idx to tell it to use that index.

The result is this query is now logically equivalent to the original query but doesn't suffer the same performance issues. Testing showed no discernible difference in latency from what was merged yesterday.