lemon24 / reader

A Python feed reader library.
https://reader.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Different feed update frequencies #332

Closed (lemon24 closed this issue 1 month ago)

lemon24 commented 4 months ago

#307 needs a way of changing the feed update frequency. Just like https://github.com/lemon24/reader/issues/96#issuecomment-1236304134, this should be a configurable global/per-feed strategy.

Use cases:

A unified way of updating feeds would be nice as well, e.g. "just call this every minute". Currently, I'm running the following in a cron:

lemon24 commented 3 months ago

Logic is easy enough (turning update_after etc. to/from seconds not shown):

import random
from dataclasses import dataclass

@dataclass
class Feed:
    url: str
    update_period: int
    update_after: int = 0
    # last_updated is only set when the feed is actually updated
    # (not when it's not modified, not when there was an exception)
    # https://github.com/lemon24/reader/blob/3.12/src/reader/_update.py#L276
    # https://github.com/lemon24/reader/blob/3.12/src/reader/_update.py#L445
    last_retrieved: int = 0

def get_feeds_for_update(feeds, now):
    return [f for f in feeds if f.update_after <= now]

def next_period(feed, now, jitter_ratio=0):
    jitter = random.random() * jitter_ratio
    current_period_no = now // feed.update_period
    return (current_period_no + 1 + jitter) * feed.update_period

def update_feeds(feeds, now, get_update_after=next_period):
    to_update = get_feeds_for_update(feeds, now)
    for feed in to_update:
        feed.last_retrieved = now
        feed.update_after = get_update_after(feed, now)
    return to_update

def set_update_period(feed, update_period):
    feed.update_period = update_period
    feed.update_after = next_period(feed, feed.last_retrieved)
Tests:

```python
from collections import Counter
from functools import partial

import pytest


@pytest.mark.parametrize('old_after, new_after, now', [
    (0, 10, 0),
    (0, 10, 1),
    (0, 10, 9.999),
    (0, 20, 10),
    (5, 10, 5),
    (10, 20, 10),
    (105, 110, 109),
    (105, 120, 110),
    (105, 200, 199.999),
    (105, 210, 200),
])
def test_update(old_after, new_after, now):
    feeds = [Feed('one', 10, old_after)]
    assert len(update_feeds(feeds, now)) == 1
    assert feeds == [Feed('one', 10, new_after, now)]


@pytest.mark.parametrize('old_after, now', [
    (5, 4),
    (10, 9.999),
    (20, 19),
])
def test_no_update(old_after, now):
    feeds = [Feed('one', 10, old_after)]
    assert len(update_feeds(feeds, now)) == 0
    assert feeds == [Feed('one', 10, old_after)]


@pytest.mark.parametrize('get_update_after', [
    next_period,
    # jitter ratio less than 10-1, to account for time step
    partial(next_period, jitter_ratio=.9),
])
def test_sweep(get_update_after):
    feeds = [Feed('one', 10), Feed('two', 20), Feed('three', 100)]
    counts = Counter()
    for now in range(100):
        for feed in update_feeds(feeds, now, get_update_after):
            counts[feed.url] += 1
    assert counts == {'one': 10, 'two': 5, 'three': 1}


def test_set_period_up():
    feeds = [Feed('one', 10)]
    update_feeds(feeds, 5)
    set_update_period(feeds[0], 20)
    # no update needed, already updated in this period
    assert len(update_feeds(feeds, 15)) == 0


def test_set_period_down():
    feeds = [Feed('one', 20)]
    update_feeds(feeds, 5)
    set_update_period(feeds[0], 10)
    # update needed, if taking new period into account
    assert len(update_feeds(feeds, 15)) == 1
```
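To make the arithmetic in next_period() concrete, here is a small worked example. It repeats the Feed and next_period() definitions from above so it runs on its own; the feed URL and the sample values of now are made up for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class Feed:
    url: str
    update_period: int
    update_after: int = 0


def next_period(feed, now, jitter_ratio=0):
    # Same logic as above: round up to the start of the next period,
    # then add up to jitter_ratio of a period on top.
    jitter = random.random() * jitter_ratio
    current_period_no = now // feed.update_period
    return (current_period_no + 1 + jitter) * feed.update_period


feed = Feed('example', 10)  # illustrative feed with a 10-second period

# Without jitter, the result is always the next multiple of the period,
# regardless of where in the current period "now" falls.
no_jitter = [next_period(feed, now) for now in (0, 9, 10, 23)]

# With jitter_ratio=0.25, the result lands in the first quarter
# of the next period, here somewhere in [30, 32.5).
jittered = [next_period(feed, 23, jitter_ratio=0.25) for _ in range(1000)]
```

Because update times snap to multiples of the period, feeds with the same period all become due at the same moment; the jitter spreads them out over a fraction of the period.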

On to the API!

Update: We can get rid of last_retrieved and rely on the current time in set_update_period(); all tests pass with minimal changes.
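A minimal sketch of what that variant might look like, assuming set_update_period() simply grows a now parameter (the exact signature is a guess, not taken from the final implementation):

```python
from dataclasses import dataclass


@dataclass
class Feed:
    url: str
    update_period: int
    update_after: int = 0
    # no last_retrieved field in this variant


def next_period(feed, now):
    # Jitter omitted to keep the sketch short.
    return (now // feed.update_period + 1) * feed.update_period


def update_feeds(feeds, now):
    to_update = [f for f in feeds if f.update_after <= now]
    for feed in to_update:
        feed.update_after = next_period(feed, now)
    return to_update


def set_update_period(feed, update_period, now):
    # Assumption: the caller passes the current time explicitly.
    feed.update_period = update_period
    feed.update_after = next_period(feed, now)


# Mirrors test_set_period_down, with the period changed at now=5.
feeds = [Feed('one', 20)]
update_feeds(feeds, 5)
set_update_period(feeds[0], 10, now=5)
due_at_15 = update_feeds(feeds, 15)
```

One difference worth noting: with this variant the new update_after is computed from the time of the configuration change, not from the last retrieval, so lowering the period long after the last update does not make the feed due immediately.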

lemon24 commented 3 months ago

API

Add a scheduled: bool | None = None argument to update_feeds() that filters the feeds to update:

In reader 4.0 (#291), we can make scheduled default to True (or just change the default behavior).
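As a rough illustration of how such a filter could behave, here is a standalone sketch. It does not use reader's actual API; select_feeds() is a hypothetical helper, and the handling of None (treated like False, i.e. update everything) is an assumption about the pre-4.0 default.

```python
from dataclasses import dataclass


@dataclass
class Feed:
    url: str
    update_after: int = 0


def select_feeds(feeds, now, scheduled=None):
    """Hypothetical helper: pick the feeds an update run should touch."""
    if scheduled:
        # Only feeds whose update_after has passed.
        return [f for f in feeds if f.update_after <= now]
    # Assumption: False and None both mean "ignore the schedule".
    return list(feeds)


feeds = [Feed('due', update_after=10), Feed('later', update_after=100)]
only_due = select_feeds(feeds, now=50, scheduled=True)
everything = select_feeds(feeds, now=50)
```

Keeping None distinct from False in the signature leaves room to flip the default later without changing what an explicit False means.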


To configure the update interval, we can use a .reader.update tag:

Using tags is good because it allows configuring stuff without additional UI.
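To illustrate the global/per-feed idea, the sketch below resolves the effective configuration from plain dicts standing in for tag storage. The tag value format is not given above, so the 'interval' key, its default, and the get_update_config() helper are all hypothetical.

```python
UPDATE_TAG = '.reader.update'

# Hypothetical built-in default; the real key names and units may differ.
DEFAULT_UPDATE_CONFIG = {'interval': 60}


def get_update_config(feed_tags, global_tags):
    """Merge default, global, and per-feed config (later ones win)."""
    config = dict(DEFAULT_UPDATE_CONFIG)
    for tags in (global_tags, feed_tags):
        value = tags.get(UPDATE_TAG)
        # Ignore missing or malformed values instead of failing the update.
        if isinstance(value, dict):
            config.update(value)
    return config


global_tags = {UPDATE_TAG: {'interval': 120}}
feed_tags = {UPDATE_TAG: {'interval': 10}}

per_feed = get_update_config(feed_tags, global_tags)
fallback = get_update_config({}, global_tags)
```

Silently ignoring malformed values is one possible policy; surfacing them to the user is another.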

Possible cons (WIP):


In the (low level) storage API:

lemon24 commented 2 months ago

To do (minimal):

Later:

lemon24 commented 2 months ago

update_after and last_retrieved should go on FeedUpdateIntent, which in turn should hold a union of FeedData-with-extra-stuff, an exception, or None. However, I'm having trouble finding a good name for FeedData-with-extra-stuff.

For reference, here's all the feed-related data classes and how they're used:

        .-- FeedForUpdate ---.
        v                    |
     parser                  |
        |                    | 
    ParsedFeed            storage -. 
  (has FeedData)             ^     |
        v                    |     |
     updater                 |     |
        |                    |     |
        |- FeedUpdateIntent -'    Feed
        | (has ??? (has FeedData)  |
        |   or ExceptionInfo       |
        |   or None)               |
        |                          v
        '---- UpdateResult -----> user
            (has UpdatedFeed)
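The shape being discussed could be sketched roughly as follows. All classes here are simplified stand-ins, not reader's real definitions; FeedToUpdate is only a placeholder for the still-unnamed FeedData-with-extra-stuff, its http_etag field is an invented example of "extra stuff", and reading None as "not modified" is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class FeedData:
    url: str
    title: str | None = None


@dataclass(frozen=True)
class ExceptionInfo:
    type_name: str
    value_str: str


@dataclass(frozen=True)
class FeedToUpdate:
    """Placeholder name for FeedData-with-extra-stuff."""
    feed: FeedData
    http_etag: str | None = None  # invented example field


@dataclass(frozen=True)
class FeedUpdateIntent:
    url: str
    last_retrieved: int
    update_after: int
    # Exactly one of: new data, an error, or nothing to store.
    value: FeedToUpdate | ExceptionInfo | None = None


def describe(intent):
    """Show how a consumer (e.g. storage) might branch on the union."""
    value = intent.value
    if value is None:
        return 'not modified'  # assumed meaning of None
    if isinstance(value, ExceptionInfo):
        return f'error: {value.type_name}'
    return f'updated: {value.feed.url}'


ok = FeedUpdateIntent('one', 5, 10, FeedToUpdate(FeedData('one', 'One')))
failed = FeedUpdateIntent('two', 5, 10, ExceptionInfo('ParseError', 'bad feed'))
unchanged = FeedUpdateIntent('three', 5, 10)
```

With this layout, update_after and last_retrieved are set on every intent, whether or not the retrieval produced new data.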