359 index tags by key

lemon24 commented 2 weeks ago

Approach described in https://github.com/lemon24/reader/issues/359#issuecomment-2446102455.

Timings below, summary:

- slight improvement for feeds, expected because of the low number
- huge improvement for entries where only a few of the entries have that tag (i.e. the index clearly is working), but worse if almost all entries have the tag – I expect the new version to be better as long as only <10% of the entries have a tag
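To illustrate what "the index is working" means for a rare tag, here is a minimal sketch using the standard `sqlite3` module. The `entry_tags` table and the `entry_tags_by_key` index are hypothetical, simplified stand-ins, not reader's actual schema or queries; the sketch only shows how `EXPLAIN QUERY PLAN` reports an index lookup on the tag key.

```python
import sqlite3

db = sqlite3.connect(":memory:")

# Hypothetical, simplified schema (an assumption, not reader's real tables).
db.execute("CREATE TABLE entry_tags (entry_id INTEGER, key TEXT, value TEXT)")
db.execute("CREATE INDEX entry_tags_by_key ON entry_tags (key)")

# One very common tag and one very rare tag, mimicking the shape of the data.
rows = [(i, ".readtime", "{}") for i in range(1000)]
rows += [(i, ".comments", "{}") for i in range(2)]
db.executemany("INSERT INTO entry_tags VALUES (?, ?, ?)", rows)


def query_plan(key):
    """Return SQLite's plan for looking up entries by tag key."""
    plan = db.execute(
        "EXPLAIN QUERY PLAN SELECT entry_id FROM entry_tags WHERE key = ?",
        (key,),
    ).fetchall()
    # The last column of each plan row is the human-readable detail.
    return " ".join(row[3] for row in plan)


def count_tagged(key):
    """Count the rows carrying the given tag key."""
    return db.execute(
        "SELECT count(*) FROM entry_tags WHERE key = ?", (key,)
    ).fetchone()[0]


plan = query_plan(".comments")
print(plan)
print(count_tagged(".comments"), "rows tagged '.comments'")
```

With a rare key, the lookup touches only a handful of index entries. With a key that nearly every row has, the index saves little, and each match may still need a separate table lookup, which is one plausible (unverified) reason for the slowdown when almost all entries carry the tag.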

Before:

177 total feeds
1 feeds tagged '.update'
4 feeds tagged 'corp'
8 feeds tagged 'webcomic'
64 feeds tagged 'main'

get_feeds(tags=...)
['.update']               min 0.000598  avg 0.000666
['corp']                  min 0.000659  avg 0.000692
['webcomic']              min 0.000733  avg 0.000764
['main']                  min 0.001843  avg 0.001880
[True]                    min 0.003799  avg 0.003851
[['corp', 'webcomic']]    min 0.001007  avg 0.001047
[['-webcomic']]           min 0.003975  avg 0.004035
[['corp'], ['webcomic']]  min 0.000621  avg 0.000658

get_feed_counts(tags=...)
['.update']               min 0.000438  avg 0.000454
[True]                    min 0.000385  avg 0.000431
[['corp', 'webcomic']]    min 0.000634  avg 0.000667
[['corp'], ['webcomic']]  min 0.000496  avg 0.000524

19117 total entries
2 entries tagged '.comments'
18989 entries tagged '.readtime'

get_entry_counts(tags=...)
['.comments']             min 0.145794  avg 0.147272
['.readtime']             min 0.172323  avg 0.174604

After:

177 total feeds
1 feeds tagged '.update'
4 feeds tagged 'corp'
8 feeds tagged 'webcomic'
64 feeds tagged 'main'

get_feeds(tags=...)
['.update']               min 0.000421  avg 0.000481
['corp']                  min 0.000491  avg 0.000527
['webcomic']              min 0.000570  avg 0.000599
['main']                  min 0.001692  avg 0.001735
[True]                    min 0.003749  avg 0.003809
[['corp', 'webcomic']]    min 0.000662  avg 0.000690
[['-webcomic']]           min 0.003915  avg 0.003982
[['corp'], ['webcomic']]  min 0.000638  avg 0.000668

get_feed_counts(tags=...)
['.update']               min 0.000251  avg 0.000258
[True]                    min 0.000435  avg 0.000470
[['corp', 'webcomic']]    min 0.000284  avg 0.000292
[['corp'], ['webcomic']]  min 0.000515  avg 0.000549

19117 total entries
2 entries tagged '.comments'
18989 entries tagged '.readtime'

get_entry_counts(tags=...)
['.comments']             min 0.000929  avg 0.000966
['.readtime']             min 0.415308  avg 0.421528
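As a rough cross-check of the summary, the `get_entry_counts()` averages above can be compared directly. This is only arithmetic on the numbers already listed; the variable and function names are illustrative.

```python
# Average timings in seconds, copied from the get_entry_counts() results above.
before = {".comments": 0.147272, ".readtime": 0.174604}
after = {".comments": 0.000966, ".readtime": 0.421528}

# Number of entries carrying each tag, out of the total entry count.
tagged = {".comments": 2, ".readtime": 18989}
total_entries = 19117


def summarize(tag):
    """Return (share of entries with the tag, before/after time ratio)."""
    share = tagged[tag] / total_entries
    ratio = before[tag] / after[tag]
    return share, ratio


for tag in before:
    share, ratio = summarize(tag)
    print(f"{tag:12} {share:8.2%} of entries  before/after = {ratio:.2f}x")
```

A ratio above 1 means the new version is faster. By these numbers, `'.comments'` (2 of 19117 entries) comes out roughly 150 times faster, while `'.readtime'` (almost all entries) is about 2.4 times slower.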

Timings generated with:

```python
import timeit
from reader import make_reader

reader = make_reader('db.sqlite')

url = 'https://death.andgravity.com/_feed/index.xml'
tags = ".update corp webcomic main".split()


def time(stmt, label='', repeat=100):
    times = timeit.repeat(stmt, repeat=repeat, number=1, globals=globals())
    print(
        f"{label or stmt:24}  min {min(times):.6f}  avg {sum(times)/len(times):.6f}"
    )


print(reader.get_feed_counts().total, 'total feeds')
for tag in tags:
    print(reader.get_feed_counts(tags=[tag]).total, f'feeds tagged {tag!r}')
print()


def time_get_feeds(tags):
    time(f"for _ in reader.get_feeds(tags={tags!r}): ...", f"{tags}")


print("get_feeds(tags=...)")
for tag in tags:
    time_get_feeds([tag])
time_get_feeds([True])
time_get_feeds([['corp', 'webcomic']])
time_get_feeds([['-webcomic']])
time_get_feeds([['corp'], ['webcomic']])
print()


def time_get_feed_counts(tags):
    time(f"reader.get_feed_counts(tags={tags!r})", f"{tags}")


print("get_feed_counts(tags=...)")
time_get_feed_counts(['.update'])
time_get_feed_counts([True])
time_get_feed_counts([['corp', 'webcomic']])
time_get_feed_counts([['corp'], ['webcomic']])
print()

entry_tags = ".comments .readtime".split()

print(reader.get_entry_counts().total, 'total entries')
for tag in entry_tags:
    print(reader.get_entry_counts(tags=[tag]).total, f'entries tagged {tag!r}')
print()


def time_get_entry_counts(tags):
    time(f"reader.get_entry_counts(tags={tags!r})", f"{tags}", 10)


print("get_entry_counts(tags=...)")
for tag in entry_tags:
    time_get_entry_counts([tag])
print()

reader.close()
```

codecov[bot] commented 2 weeks ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 95.16%. Comparing base (97a0c98) to head (7c57706). Report is 5 commits behind head on master.

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##           master     #361      +/-   ##
==========================================
+ Coverage   95.14%   95.16%   +0.01%
==========================================
  Files          96       96
  Lines       12147    12193      +46
  Branches      825      837      +12
==========================================
+ Hits        11557    11603      +46
  Misses        516      516
  Partials       74       74
```

lemon24 commented 2 weeks ago

@nobrowser, this exists now (feel free to leave feedback if you want to); I will merge it and make a release in the next days.

nobrowser commented 2 weeks ago

> @nobrowser, this exists now (feel free to leave feedback if you want to); I will merge it and make a release in the next days.

Thank you. I'm taking a step back and considering whether I should use Python for this at all, but it looks like this change would indeed be helpful.

lemon24 / reader

359 index tags by key #361