lemon24 / reader

A Python feed reader library.
https://reader.readthedocs.io
BSD 3-Clause "New" or "Revised" License
434 stars 31 forks source link

Simpler entries_by_recent index #330

Closed lemon24 closed 7 months ago

lemon24 commented 7 months ago

Based on some preliminary tests, the entries_by_recent index can be reduced from 6 fields (everything in the order by) to just ON entries(recent_sort DESC) with minimal performance impact.

To do:

lemon24 commented 7 months ago

Turns out "with minimal performance impact" is debatable – get_entries(limit=100) is as slow with an index only on recent_sort as it is if there's no index (not the case with the more complicated index).

Given this testing script:

export BENCH_TIME_STAT='min'
sync && sudo purge
python scripts/bench.py time -r10 get_entries_all 
sync && sudo purge
python scripts/bench.py time -r20 get_entries_all_page 
sync && sudo purge
python scripts/bench.py time -r20 -q andgravity search_entries_recent_all
sync && sudo purge
python scripts/bench.py time -r10 -q query search_entries_recent_all_page

Baseline:

stat number repeat get_entries_all
 min      1     10           1.660
stat number repeat get_entries_all_page
 min      1     20                0.007
stat number repeat search_entries_recent_all(andgravity)
 min      1     20                                 0.090
stat number repeat search_entries_recent_all_page(query)
 min      1     10                                 0.430

drop index entries_by_recent:

stat number repeat get_entries_all
 min      1     10          14.006
stat number repeat get_entries_all_page
 min      1     20                0.159
stat number repeat search_entries_recent_all(andgravity)
 min      1     20                                 0.226
stat number repeat search_entries_recent_all_page(query)
 min      1     10                                 0.596

create index entries_by_recent on entries(recent_sort):

stat number repeat get_entries_all
 min      1     10           2.313
stat number repeat get_entries_all_page
 min      1     20                0.153
stat number repeat search_entries_recent_all(andgravity)
 min      1     20                                 0.214
stat number repeat search_entries_recent_all_page(query)
 min      1     10                                 0.578