lemon24 / reader

A Python feed reader library.
https://reader.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Read time plugin #275

Closed (lemon24 closed 2 years ago)

lemon24 commented 2 years ago

Follow-up to #269.

Currently, the entry read time is computed every time /entries is loaded. It would be much better to compute it only once, when the entry is updated; now that we have entry tags, we actually have somewhere to put this metadata. A plugin is ideal for this.
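To make the idea concrete, here is a minimal sketch of "compute once on update, read the stored value on page load". Everything in it is an assumption made for illustration: the `tags` dict stands in for entry tags, `on_entry_updated` and `render_entry` are hypothetical names, the 250 words-per-minute speed is a guess, and the `{'seconds': ...}` value shape is not necessarily what the plugin stores.

```python
import math
import re

# Assumption: an average reading speed; the real plugin may use another value.
WORDS_PER_MINUTE = 250

# Hypothetical stand-in for entry tags: entry id -> {tag key: value}.
tags = {}


def estimate_read_seconds(text, wpm=WORDS_PER_MINUTE):
    """Return a rough read time in seconds for plain text."""
    words = len(re.findall(r"\w+", text))
    return math.ceil(words / wpm * 60)


def on_entry_updated(entry_id, text):
    """Do the (comparatively expensive) computation once, at update time."""
    value = {"seconds": estimate_read_seconds(text)}
    tags.setdefault(entry_id, {})[".readtime"] = value


def render_entry(entry_id):
    """At page load, only look up the stored value; nothing is recomputed."""
    return tags.get(entry_id, {}).get(".readtime")


on_entry_updated("entry-1", "word " * 500)
print(render_entry("entry-1"))
```

An entry that was never passed to `on_entry_updated` simply has no `.readtime` value here, which is the gap the backfill variants in the next comment try to close.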

Thoughts:

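lemon24 commented 2 years ago

# runs once per new or updated entry;
# doesn't cover entries that already exist (hence the backfill variants below)
def after_entry_update_hook(entry):
    set_tag(entry, '.readtime', readtime(entry))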

# inefficient;
# doesn't do anything for feeds with updates disabled
def after_feed_update_hook(feed):
    for entry in get_entries(feed=feed):
        if has_tag(entry, '.readtime'):
            continue
        set_tag(entry, '.readtime', readtime(entry))

# very efficient, but permanently sets a feed tag;
# doesn't do anything for feeds with updates disabled
def after_feed_update_hook(feed):
    if has_tag(feed, '.readtime.backfill-done'):
        return
    for entry in get_entries(feed=feed):
        if has_tag(entry, '.readtime'):
            continue
        set_tag(entry, '.readtime', readtime(entry))
    set_tag(feed, '.readtime.backfill-done')

# more efficient, needs filter-by-entry-tags feature;
# doesn't do anything for feeds with updates disabled
def after_feed_update_hook(feed):
    for entry in get_entries(feed=feed, tags=['-.readtime']):
        set_tag(entry, '.readtime', readtime(entry))

# efficient, permanently sets just a global tag;
# convoluted; first feed does a lot of extra work, once
def after_feed_update_hook(feed):
    if not has_tag((), '.readtime.backfill-done'):
        for f in get_feeds():
            if f.updates_enabled:
                set_tag(f, '.readtime.backfill-needed')
            else:
                # feeds with disabled updates might never be updated,
                # so backfill their entries right away
                for entry in get_entries(feed=f):
                    if has_tag(entry, '.readtime'):
                        continue
                    set_tag(entry, '.readtime', readtime(entry))
        set_tag((), '.readtime.backfill-done')
    if not has_tag(feed, '.readtime.backfill-needed'):
        return
    for entry in get_entries(feed=feed):
        if has_tag(entry, '.readtime'):
            continue
        set_tag(entry, '.readtime', readtime(entry))
    delete_tag(feed, '.readtime.backfill-needed')

lemon24 commented 2 years ago

Here's a way of backfilling that only requires an extra global tag, and hooks before and after update_feeds():

def before_feeds_update_hook():
    if has_tag((), '.readtime.backfill-done'):
        return
    for feed in get_feeds():
        set_tag(feed, '.readtime.backfill-needed')

def after_feed_update_hook(feed):
    if not has_tag(feed, '.readtime.backfill-needed'):
        return
    for entry in get_entries(feed=feed):
        if has_tag(entry, '.readtime'):
            continue
        set_tag(entry, '.readtime', readtime(entry))
    delete_tag(feed, '.readtime.backfill-needed')

def after_feeds_update_hook():
    if has_tag((), '.readtime.backfill-done'):
        return
    for feed in get_feeds(updates_enabled=False):
        for entry in get_entries(feed=feed):
            if has_tag(entry, '.readtime'):
                continue
            set_tag(entry, '.readtime', readtime(entry))
    set_tag((), '.readtime.backfill-done')
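
To check that the three hooks above do the backfill exactly once, here is a small self-contained simulation. It is only a sketch: `FakeReader`, `backfill`, and `update_feeds` are hypothetical in-memory stand-ins written for this example, not the reader library's API, and `readtime` returns a placeholder value while recording which entries it was called for.

```python
DONE = '.readtime.backfill-done'
NEEDED = '.readtime.backfill-needed'


class FakeReader:
    """In-memory stand-in for illustration; NOT the reader library's API."""

    def __init__(self, feeds, entries):
        self.feeds = feeds      # feed url -> updates_enabled
        self.entries = entries  # feed url -> list of (feed url, entry id)
        self.tags = {}          # resource -> {key: value}; () is global

    def has_tag(self, resource, key):
        return key in self.tags.get(resource, {})

    def set_tag(self, resource, key, value=None):
        self.tags.setdefault(resource, {})[key] = value

    def delete_tag(self, resource, key):
        self.tags.get(resource, {}).pop(key, None)


calls = []  # entries for which the read time was actually computed


def readtime(entry):
    calls.append(entry)
    return {'seconds': 0}  # placeholder value, not a real estimate


def backfill(r, feed):
    for entry in r.entries[feed]:
        if not r.has_tag(entry, '.readtime'):
            r.set_tag(entry, '.readtime', readtime(entry))


def before_feeds_update_hook(r):
    if r.has_tag((), DONE):
        return
    for feed in r.feeds:
        r.set_tag(feed, NEEDED)


def after_feed_update_hook(r, feed):
    if not r.has_tag(feed, NEEDED):
        return
    backfill(r, feed)
    r.delete_tag(feed, NEEDED)


def after_feeds_update_hook(r):
    if r.has_tag((), DONE):
        return
    for feed, enabled in r.feeds.items():
        if not enabled:
            backfill(r, feed)
    r.set_tag((), DONE)


def update_feeds(r):
    """Simulate one update run: only feeds with updates enabled are updated."""
    before_feeds_update_hook(r)
    for feed, enabled in r.feeds.items():
        if enabled:
            after_feed_update_hook(r, feed)
    after_feeds_update_hook(r)


r = FakeReader(
    feeds={'a': True, 'b': False},
    entries={'a': [('a', '1'), ('a', '2')], 'b': [('b', '1')]},
)
update_feeds(r)
update_feeds(r)  # the second run finds nothing left to backfill
print(len(calls))
```

One thing the simulation makes visible: a feed with updates disabled keeps its `.readtime.backfill-needed` tag, because `after_feed_update_hook` never runs for it. If the feed is enabled later, the hook runs once, finds every entry already tagged, and removes the tag.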

lemon24 commented 2 years ago

To do:

lemon24 commented 2 years ago

Some stats for the web app after using the readtime tag (60b0459):

path            before   after
/entry          0.049    0.009
/?limit=64      0.25     0.19
/?limit=512     1.5      1.0
/?limit=2048    5.7      4.0
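
For a sense of scale, the relative improvement can be worked out from the figures above. This is plain arithmetic on the numbers quoted in this comment; the `speedup` helper is just a name chosen for the example, and the result is a ratio, so it holds whatever unit the measurements are in.

```python
# (before, after) pairs copied from the stats in this comment.
timings = {
    "/entry": (0.049, 0.009),
    "/?limit=64": (0.25, 0.19),
    "/?limit=512": (1.5, 1.0),
    "/?limit=2048": (5.7, 4.0),
}


def speedup(before, after):
    """How many times faster the 'after' measurement is."""
    return before / after


for path, (before, after) in timings.items():
    saved = 1 - after / before
    print(f"{path}: {speedup(before, after):.1f}x faster, {saved:.0%} less time")
```

The single-entry page improves by roughly 5.4x, while the list pages improve by between about 1.3x and 1.5x.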

Hopefully, this wins back most of the performance that was lost when readtime was first added in 2.6. We could likely gain more by making get_entries() also return the set of tags for each entry, instead of making an additional get_tag() call per entry.
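
The difference between per-entry tag lookups and returning tags together with the entries can be sketched with plain `sqlite3`. The schema, table names, and both functions below are hypothetical and exist only for this example; they are not reader's actual storage layout or API.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# Hypothetical schema, for illustration only.
db.executescript("""
    CREATE TABLE entries (id TEXT PRIMARY KEY, title TEXT);
    CREATE TABLE entry_tags (entry_id TEXT, key TEXT, value TEXT);
    INSERT INTO entries VALUES ('1', 'one'), ('2', 'two');
    INSERT INTO entry_tags VALUES ('1', '.readtime', '{"seconds": 60}');
""")


def entries_then_tags(db):
    """One query for the entries, then one extra query per entry."""
    result = []
    rows = db.execute("SELECT id, title FROM entries ORDER BY id").fetchall()
    for entry_id, title in rows:
        tags = dict(db.execute(
            "SELECT key, value FROM entry_tags WHERE entry_id = ?",
            (entry_id,),
        ))
        result.append((entry_id, title, tags))
    return result


def entries_with_tags(db):
    """A single query; LEFT JOIN keeps entries that have no tags."""
    grouped = {}  # entry id -> (title, {tag key: value})
    rows = db.execute("""
        SELECT e.id, e.title, t.key, t.value
        FROM entries e
        LEFT JOIN entry_tags t ON t.entry_id = e.id
        ORDER BY e.id
    """)
    for entry_id, title, key, value in rows:
        _, tags = grouped.setdefault(entry_id, (title, {}))
        if key is not None:  # NULL key means the entry has no tags
            tags[key] = value
    return [(eid, title, tags) for eid, (title, tags) in grouped.items()]


# Both approaches return the same data; only the number of queries differs.
print(entries_then_tags(db) == entries_with_tags(db))
```

With N entries on a page, the first function issues N + 1 queries and the second issues one, which is why the saving would be expected to grow with the `limit` value.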

As a bonus, the read time is now also shown for search.