lemon24 closed this issue 2 weeks ago.
Repeating for entries, filtering by the tweeted tag (there's 1 tag per entry on average).

Current query:
WITH
__entry_tags AS (
SELECT key FROM entry_tags
WHERE (id, feed) = (entries.id, entries.feed)
)
SELECT entries.feed, entries.id, feeds.title
FROM entries
JOIN feeds ON feeds.url = entries.feed
WHERE 'tweeted' IN __entry_tags;
QUERY PLAN
|--SCAN TABLE feeds
|--SEARCH TABLE entries USING INDEX entries_by_feed (feed=?)
`--CORRELATED LIST SUBQUERY 2
`--SEARCH TABLE entry_tags USING COVERING INDEX sqlite_autoindex_entry_tags_1 (id=? AND feed=?)
Run Time: real 0.087 user 0.062563 sys 0.024012
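To experiment with the current query shape locally, here is a minimal sketch using Python's built-in sqlite3 module. The schema, the sample rows, and the helper names (SCHEMA, make_db, current_query) are hypothetical. They only mirror the table, column, and index names visible in the query plan above, not reader's actual schema. The __entry_tags CTE is inlined as an explicit correlated subquery, which should behave the same way in SQLite.

```python
import sqlite3

# Hypothetical schema: mirrors only the names visible in the query plan above;
# reader's real tables have more columns.
SCHEMA = """
CREATE TABLE feeds (url TEXT PRIMARY KEY, title TEXT);
CREATE TABLE entries (id TEXT, feed TEXT, PRIMARY KEY (id, feed));
CREATE TABLE entry_tags (id TEXT, feed TEXT, key TEXT, PRIMARY KEY (id, feed, key));
"""


def make_db():
    """Build a tiny in-memory database with made-up sample rows."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    db.executemany("INSERT INTO feeds VALUES (?, ?)", [("f1", "Feed one"), ("f2", "Feed two")])
    db.executemany("INSERT INTO entries VALUES (?, ?)", [("e1", "f1"), ("e2", "f1"), ("e3", "f2")])
    db.executemany(
        "INSERT INTO entry_tags VALUES (?, ?, ?)",
        [("e1", "f1", "tweeted"), ("e2", "f1", "other"), ("e3", "f2", "tweeted")],
    )
    return db


# Same shape as the current query: for each entry, a correlated subquery
# lists that entry's tag keys, and the WHERE clause tests membership.
CURRENT = """
SELECT entries.feed, entries.id, feeds.title
FROM entries
JOIN feeds ON feeds.url = entries.feed
WHERE ? IN (
    SELECT key FROM entry_tags
    WHERE (id, feed) = (entries.id, entries.feed)
)
ORDER BY entries.feed, entries.id
"""


def current_query(db, tag):
    return db.execute(CURRENT, (tag,)).fetchall()


db = make_db()
print(current_query(db, "tweeted"))
for row in db.execute("EXPLAIN QUERY PLAN " + CURRENT, ("tweeted",)):
    print(row[-1])  # the plan should include a correlated subquery, as above
```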
After create index entry_tags_by_key on entry_tags(key):
WITH
__tag_entries AS (
SELECT id, feed
FROM entry_tags
WHERE key = 'tweeted'
)
SELECT entries.feed, entries.id, feeds.title
FROM entries
JOIN feeds ON feeds.url = entries.feed
WHERE (id, feed) in __tag_entries;
QUERY PLAN
|--SEARCH TABLE entries USING COVERING INDEX sqlite_autoindex_entries_1 (id=?)
|--LIST SUBQUERY 2
| `--SEARCH TABLE entry_tags USING INDEX entry_tags_by_key (key=?)
|--LIST SUBQUERY 2
| `--SEARCH TABLE entry_tags USING INDEX entry_tags_by_key (key=?)
`--SEARCH TABLE feeds USING INDEX sqlite_autoindex_feeds_1 (url=?)
Run Time: real 0.000 user 0.000412 sys 0.000169
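A sketch of the CTE + IN rewrite against the same kind of hypothetical schema (now with the entry_tags_by_key index), checking that it returns the same rows as the correlated form. The schema, the data, and the names (make_db, TAG_ENTRIES, CORRELATED, run) are assumptions for illustration. The row-value IN needs SQLite 3.15 or newer.

```python
import sqlite3

# Hypothetical schema mirroring the names in the query plans above,
# plus the index that makes the rewrite search-only.
SCHEMA = """
CREATE TABLE feeds (url TEXT PRIMARY KEY, title TEXT);
CREATE TABLE entries (id TEXT, feed TEXT, PRIMARY KEY (id, feed));
CREATE TABLE entry_tags (id TEXT, feed TEXT, key TEXT, PRIMARY KEY (id, feed, key));
CREATE INDEX entry_tags_by_key ON entry_tags(key);
"""


def make_db():
    """Build a tiny in-memory database with made-up sample rows."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    db.executemany("INSERT INTO feeds VALUES (?, ?)", [("f1", "Feed one"), ("f2", "Feed two")])
    db.executemany("INSERT INTO entries VALUES (?, ?)", [("e1", "f1"), ("e2", "f1"), ("e3", "f2")])
    db.executemany(
        "INSERT INTO entry_tags VALUES (?, ?, ?)",
        [("e1", "f1", "tweeted"), ("e2", "f1", "other"), ("e3", "f2", "tweeted")],
    )
    return db


# Rewrite: collect the (id, feed) pairs carrying the tag via the key index,
# then look the entries (and their feeds) up by primary key.
TAG_ENTRIES = """
WITH __tag_entries AS (
    SELECT id, feed FROM entry_tags WHERE key = ?
)
SELECT entries.feed, entries.id, feeds.title
FROM entries
JOIN feeds ON feeds.url = entries.feed
WHERE (entries.id, entries.feed) IN (SELECT id, feed FROM __tag_entries)
ORDER BY entries.feed, entries.id
"""

# Correlated form, kept only to check that the rewrite returns the same rows.
CORRELATED = """
SELECT entries.feed, entries.id, feeds.title
FROM entries
JOIN feeds ON feeds.url = entries.feed
WHERE ? IN (SELECT key FROM entry_tags WHERE (id, feed) = (entries.id, entries.feed))
ORDER BY entries.feed, entries.id
"""


def run(db, sql, tag):
    return db.execute(sql, (tag,)).fetchall()


db = make_db()
for tag in ("tweeted", "other", "missing"):
    assert run(db, TAG_ENTRIES, tag) == run(db, CORRELATED, tag)
print(run(db, TAG_ENTRIES, "tweeted"))
```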
...and:
SELECT DISTINCT entries.feed, entries.id, feeds.title
FROM entries
JOIN feeds ON feeds.url = entries.feed
JOIN entry_tags ON (entries.feed, entries.id) = (entry_tags.feed, entry_tags.id)
WHERE entry_tags.key = 'tweeted';
QUERY PLAN
|--SEARCH TABLE entry_tags USING INDEX entry_tags_by_key (key=?)
|--SEARCH TABLE entries USING COVERING INDEX sqlite_autoindex_entries_1 (id=? AND feed=?)
|--SEARCH TABLE feeds USING INDEX sqlite_autoindex_feeds_1 (url=?)
`--USE TEMP B-TREE FOR DISTINCT
Run Time: real 0.000 user 0.000352 sys 0.000158
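The DISTINCT in the JOIN form changes nothing for a single tag key, assuming (id, feed, key) is unique in entry_tags. It starts to matter once the filter matches several keys (the "one OR two" case), because an entry with both tags joins to two entry_tags rows. A small sketch with a hypothetical schema and made-up rows; join_query is an illustrative helper, not part of reader.

```python
import sqlite3

# Hypothetical schema mirroring the names in the query plans above.
SCHEMA = """
CREATE TABLE feeds (url TEXT PRIMARY KEY, title TEXT);
CREATE TABLE entries (id TEXT, feed TEXT, PRIMARY KEY (id, feed));
CREATE TABLE entry_tags (id TEXT, feed TEXT, key TEXT, PRIMARY KEY (id, feed, key));
CREATE INDEX entry_tags_by_key ON entry_tags(key);
"""


def make_db():
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    db.execute("INSERT INTO feeds VALUES ('f1', 'Feed one')")
    db.executemany("INSERT INTO entries VALUES (?, ?)", [("e1", "f1"), ("e2", "f1")])
    # Made-up data: e1 has both tags, e2 only has "two".
    db.executemany(
        "INSERT INTO entry_tags VALUES (?, ?, ?)",
        [("e1", "f1", "one"), ("e1", "f1", "two"), ("e2", "f1", "two")],
    )
    return db


def join_query(keys, distinct):
    """The JOIN form, generalized to "has any of these tag keys"."""
    placeholders = ", ".join("?" for _ in keys)
    select = "SELECT DISTINCT" if distinct else "SELECT"
    return f"""
    {select} entries.feed, entries.id, feeds.title
    FROM entries
    JOIN feeds ON feeds.url = entries.feed
    JOIN entry_tags ON (entries.feed, entries.id) = (entry_tags.feed, entry_tags.id)
    WHERE entry_tags.key IN ({placeholders})
    ORDER BY entries.feed, entries.id
    """


db = make_db()
keys = ("one", "two")
# Without DISTINCT, e1 appears once per matching entry_tags row (twice here).
print(db.execute(join_query(keys, distinct=False), keys).fetchall())
print(db.execute(join_query(keys, distinct=True), keys).fetchall())
```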
So, a bit of a complication here.
As I see it now, there are two main ways of writing this query:

1. WITH __feed_tags (the current query), then WHERE tag IN the CTE
2. WITH __tag_feeds, JOIN feed_tags (the rewritten queries)

#2 handles one OR two (tags=[['one', 'two']]) directly; for one AND two (tags=[['one'], ['two']]), you have to intersect (join) two CTEs.

Additionally, I don't think there's a way to write a query that lets the query planner decide between the two (especially not SQLite's).
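One way to get both cases out of approach #2 is to build one CTE per OR group (a single key IN (...) search each) and intersect the groups for AND. A minimal sketch, assuming a hypothetical feed_tags(feed, key) table with the feed_tags_by_key index, and tags given as a list of lists (inner lists ORed, outer list ANDed, as in the tags= examples above). build_tag_feeds_query and feeds_with_tags are made-up names, and negated tags are not handled.

```python
import sqlite3

# Hypothetical schema: one feed_tags row per (feed, key), indexed by key.
SCHEMA = """
CREATE TABLE feeds (url TEXT PRIMARY KEY, title TEXT);
CREATE TABLE feed_tags (feed TEXT, key TEXT, PRIMARY KEY (feed, key));
CREATE INDEX feed_tags_by_key ON feed_tags(key);
"""


def make_db():
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    db.executemany(
        "INSERT INTO feeds VALUES (?, ?)",
        [("f1", "Feed one"), ("f2", "Feed two"), ("f3", "Feed three")],
    )
    # Made-up data: f1 has both tags, f2 only "one", f3 only "two".
    db.executemany(
        "INSERT INTO feed_tags VALUES (?, ?)",
        [("f1", "one"), ("f1", "two"), ("f2", "one"), ("f3", "two")],
    )
    return db


def build_tag_feeds_query(tags):
    """Hypothetical helper: keys in an inner list are ORed, inner lists are ANDed."""
    if not tags or not all(tags):
        raise ValueError("expected a non-empty list of non-empty tag groups")
    ctes, params = [], []
    for i, group in enumerate(tags):
        placeholders = ", ".join("?" for _ in group)
        # OR inside a group: one search on feed_tags_by_key with key IN (...).
        ctes.append(
            f"__tag_feeds_{i} AS "
            f"(SELECT feed FROM feed_tags WHERE key IN ({placeholders}))"
        )
        params.extend(group)
    # AND across groups: intersect the per-group CTEs.
    intersection = " INTERSECT ".join(
        f"SELECT feed FROM __tag_feeds_{i}" for i in range(len(tags))
    )
    sql = (
        "WITH " + ",\n".join(ctes) + "\n"
        "SELECT feeds.url, feeds.title FROM feeds\n"
        f"WHERE feeds.url IN ({intersection})\n"
        "ORDER BY feeds.url"
    )
    return sql, params


def feeds_with_tags(db, tags):
    sql, params = build_tag_feeds_query(tags)
    return [url for url, _ in db.execute(sql, params)]


db = make_db()
print(feeds_with_tags(db, [["one", "two"]]))  # one OR two
print(feeds_with_tags(db, [["one"], ["two"]]))  # one AND two
```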
(I vaguely remember going through this thought process when implementing tags initially; I now wish I had bothered to write the conclusions down.)
So, I think it's useful to have #2 as an optimization. Also, we can recommend that storage implementations optimize at least the "has single tag X" filters.
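A sketch of what "#2 as an optimization" could look like: a hypothetical dispatcher (feeds_query) that uses the search-only CTE query for the single-tag case and falls back to the correlated form (#1) for everything else. The schema, the data, and all function names are assumptions, not reader's code.

```python
import sqlite3

# Hypothetical schema: one feed_tags row per (feed, key), indexed by key.
SCHEMA = """
CREATE TABLE feeds (url TEXT PRIMARY KEY, title TEXT);
CREATE TABLE feed_tags (feed TEXT, key TEXT, PRIMARY KEY (feed, key));
CREATE INDEX feed_tags_by_key ON feed_tags(key);
"""


def make_db():
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    db.executemany(
        "INSERT INTO feeds VALUES (?, ?)",
        [("f1", "Feed one"), ("f2", "Feed two"), ("f3", "Feed three")],
    )
    db.executemany(
        "INSERT INTO feed_tags VALUES (?, ?)",
        [("f1", "one"), ("f1", "two"), ("f2", "one"), ("f3", "two")],
    )
    return db


# Approach #1: a correlated membership test per tag key; any AND/OR
# combination composes trivially, but every feed row gets visited.
HAS_TAG = "? IN (SELECT key FROM feed_tags WHERE feed_tags.feed = feeds.url)"


def correlated_query(tags):
    if not all(tags):
        raise ValueError("tag groups must be non-empty")
    groups = ["(" + " OR ".join(HAS_TAG for _ in group) + ")" for group in tags]
    params = [key for group in tags for key in group]
    where = " AND ".join(groups) or "1"
    return f"SELECT url FROM feeds WHERE {where} ORDER BY url", params


# Approach #2, only for the "has single tag X" filter: search by key.
SINGLE_TAG = """
WITH __tag_feeds AS (SELECT feed FROM feed_tags WHERE key = ?)
SELECT url FROM feeds WHERE url IN (SELECT feed FROM __tag_feeds) ORDER BY url
"""


def feeds_query(tags):
    """Hypothetical dispatcher: use the search-only query when possible."""
    if len(tags) == 1 and len(tags[0]) == 1:
        return SINGLE_TAG, [tags[0][0]]
    return correlated_query(tags)


def feeds_with_tags(db, tags):
    sql, params = feeds_query(tags)
    return [url for (url,) in db.execute(sql, params)]


db = make_db()
print(feeds_with_tags(db, [["one"]]))  # fast path
print(feeds_with_tags(db, [["one"], ["two"]]))  # fallback to #1
```

Keeping #1 as the fallback means arbitrary AND/OR combinations still compose without extra CTE bookkeeping; only the common single-tag case takes the second code path.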
Follow-up to https://github.com/lemon24/reader/issues/358.
Currently, the query for get_feeds(tags=['one']) results in a scan.

With create index feed_tags_by_key on feed_tags(key), there are at least two ways to rewrite that query to only use searches.
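By analogy with the entry queries above, the two search-only rewrites for feeds might look like the following. This is a hedged sketch: the feed_tags(feed, key) schema, the sample rows, and the helper names are hypothetical. The exact EXPLAIN QUERY PLAN wording also varies between SQLite versions (newer ones print "SEARCH feed_tags" rather than "SEARCH TABLE feed_tags").

```python
import sqlite3

# Hypothetical schema: feeds keyed by url, one feed_tags row per (feed, key),
# plus the feed_tags_by_key index from the text.
SCHEMA = """
CREATE TABLE feeds (url TEXT PRIMARY KEY, title TEXT);
CREATE TABLE feed_tags (feed TEXT, key TEXT, PRIMARY KEY (feed, key));
CREATE INDEX feed_tags_by_key ON feed_tags(key);
"""


def make_db():
    """Build a tiny in-memory database with made-up sample rows."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    db.executemany(
        "INSERT INTO feeds VALUES (?, ?)",
        [("f1", "Feed one"), ("f2", "Feed two"), ("f3", "Feed three")],
    )
    db.executemany(
        "INSERT INTO feed_tags VALUES (?, ?)",
        [("f1", "one"), ("f1", "two"), ("f2", "one"), ("f3", "two")],
    )
    return db


# Rewrite 1: find the tagged feeds via the key index, then filter with IN.
TAG_FEEDS_IN = """
WITH __tag_feeds AS (SELECT feed FROM feed_tags WHERE key = ?)
SELECT feeds.url, feeds.title
FROM feeds
WHERE feeds.url IN (SELECT feed FROM __tag_feeds)
ORDER BY feeds.url
"""

# Rewrite 2: plain join; DISTINCT removes nothing for a single key, since
# (feed, key) is unique, but keeps the shape safe for several keys.
TAG_FEEDS_JOIN = """
SELECT DISTINCT feeds.url, feeds.title
FROM feeds
JOIN feed_tags ON feed_tags.feed = feeds.url
WHERE feed_tags.key = ?
ORDER BY feeds.url
"""


def plan(db, sql, params):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    return [row[-1] for row in db.execute("EXPLAIN QUERY PLAN " + sql, params)]


db = make_db()
for sql in (TAG_FEEDS_IN, TAG_FEEDS_JOIN):
    print(db.execute(sql, ("one",)).fetchall())
    print("\n".join(plan(db, sql, ("one",))))
```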
This is not an issue for global tags, since the tag key is the primary key.