there is a section of the query lower down which checks that the tokens are from the same language, or from either the default language or english.
now I don't recall why we're doing it like that, but regardless the behaviour has changed, what's happening now is that we're getting a list of candidate rows from the lineage table and then joining to the tokens table.
as a result of these joins, all of the tokens for those documents are being brought in, so any of them can match the language rules, rather than just 'santa maria' and 'roma'.
this PR fixes this issue by adding the t1.token = $subject and t2.token = $object conditions.
when I added those back the performance tanked again, so bad I had to hit control+c after 30+ seconds.
IMO this is a bug in the SQLite query planner (or maybe it doesn't have enough info!?) which can be overcome by giving explicit instructions to the query planner via INDEXED BY tokens_cover_idx to tell it to use that index.
The result is this query is now logically equivalent to the original query but doesn't suffer the same performance issues.
Testing showed no discernible difference in latency from what was merged yesterday.
after merging https://github.com/pelias/placeholder/pull/194 I realised that the query isn't exactly equal to what it was prior.
there is a section of the query lower down which checks that the tokens are from the same language, or from either the default language or english.
now I don't recall why we're doing it like that, but regardless the behaviour has changed, what's happening now is that we're getting a list of candidate rows from the
lineage
table and then joining to thetokens
table.as a result of these joins, all of the tokens for those documents are being brought in, so any of them can match the language rules, rather than just 'santa maria' and 'roma'.
this PR fixes this issue by adding the
t1.token = $subject
andt2.token = $object
conditions.when I added those back the performance tanked again, so bad I had to hit control+c after 30+ seconds.
IMO this is a bug in the SQLite query planner (or maybe it doesn't have enough info!?) which can be overcome by giving explicit instructions to the query planner via
INDEXED BY tokens_cover_idx
to tell it to use that index.The result is this query is now logically equivalent to the original query but doesn't suffer the same performance issues. Testing showed no discernible difference in latency from what was merged yesterday.