Closed sellnat77 closed 4 years ago
This ticket is heavily ivestigative, i put in my hunch for the ticket description but it very well could be something else, feel free to explore all avenues
So I did some investigation of alternatives to the IN query. Bullet points:
So starting with the basics, I did an IN query for a one-year period using the fully populated database (2015 through 2020). Starting with 1 NC, and then working my way up to 100. I ran each query seven times (in pure SQL, no ORM), and took the average of the 7 runs. These are the results:
As you can see, the query was reasonably fast up to 20 NCs, but became super-slow after that.
IN queries can be rewritten to use join-on-values syntax. For example, a basic IN:
SELECT requesttype, srnumber, latitude, longitude
FROM ingest_staging_table
WHERE nc IN (4, 5, 6)
can be rewritten like this:
SELECT requesttype, srnumber, latitude, longitude
FROM ingest_staging_table
INNER JOIN (
VALUES (4),(5),(6)) vals(v)
ON (nc = v)
I ran the same test as above except using JOIN queries. Unfortunately, this was also pretty slow when run against the full database.
So the JOIN alternative started getting slow with only 10 ncs. Comparing the two, you can see that the IN query is faster until you get to 30NCs. After that, JOIN is faster, but not by a lot.
The stackoverflow post above mentioned an anti-join query. Those queries are basically NOT IN queries expressed as JOINs. So an anti-join looks like this:
SELECT requesttype, srnumber, latitude, longitude
FROM ingest_staging_table
LEFT OUTER JOIN
(VALUES (4),(5),(6)) vals(v)
ON (nc = v)
WHERE v IS NULL
This query would return the rows where the nc
column is NOT in [4, 5, 6].
This type of query could be useful in theory. For example if the user submitted 95 NCs, we could run an anti-join using the remaining 5 NCs. Unfortunately, the anti-join ran slower than the IN and the JOIN:
All of these tests were run on my MacBook using pure SQL queries. The code is here.
Description
Using
in
in where clauses will cause latency issues with large tables, review this stackoverflow for legitimacy and implement or investigate other means of improvementAction Items
Resources
https://stackoverflow.com/a/17824797