lemon24 / reader

A Python feed reader library.
https://reader.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Non-ParseError exceptions during update_feeds() prevent updates for the remaining feeds #218

Closed lemon24 closed 1 year ago

lemon24 commented 3 years ago

From https://github.com/lemon24/reader/issues/204#issuecomment-780553373:

Currently, storage or plugin-raised exceptions prevent updates for the following feeds, but that's not necessarily by design.

We should look at this when we expose the plugin API (#80).

lemon24 commented 1 year ago

...like here: https://github.com/lemon24/reader/issues/310#issuecomment-1593699333 (the Twitter plugin).

lemon24 commented 1 year ago

The main goal here is to allow updates to continue after a feed has failed.

Things to take into consideration:

lemon24 commented 1 year ago

Some thoughts on update hook exception handling (also see the "How granular ..." question above):

Somewhat related: We continue with at least the granularity of one feed mostly because update_feeds_iter() is a thin wrapper over for feed in get_feeds(): update_feed(feed) (but with exception handling; we should document this).
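To make the "thin wrapper with exception handling" concrete, here is a minimal sketch, assuming only ParseError is handled per feed. The `get_feeds` and `update_feed` callables and the stand-in `ParseError` class are hypothetical simplifications, not reader's actual internals.

```python
class ParseError(Exception):
    """Stand-in for reader's ParseError (hypothetical, simplified)."""


def update_feeds_iter(get_feeds, update_feed):
    """Yield (feed, result) pairs; result is a value or the ParseError raised."""
    for feed in get_feeds():
        try:
            result = update_feed(feed)
        except ParseError as e:
            # Expected failure: report it and move on to the next feed.
            result = e
        # Any other exception propagates out of the generator,
        # so the remaining feeds never get updated.
        yield feed, result
```

With this shape, a storage or plugin error raised inside `update_feed()` ends the iteration early, which is the behavior this issue is about.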

lemon24 commented 1 year ago

OK, here's how update hooks get called (pseudocode; might want to document it when we're sure the order stays):

for hook in before_feeds_update_hooks:
    hook(reader)

for feed in get_feeds():
    try:
        parsed_feed = parse(feed)
    except ParseError as e:
        parsed_feed = e

    intents = make_intents(feed, parsed_feed)

    for hook in before_feed_update_hooks:
        hook(reader, feed.url)

    store(intents)

    for entry in intents.entries:
        for hook in after_entry_update_hooks:
            hook(reader, entry.entry, entry.status)

    for hook in after_feed_update_hooks:
        hook(reader, feed.url)

    if not parsed_feed or isinstance(parsed_feed, Exception):
        yield feed.url, parsed_feed
    else:
        yield feed.url, make_updated_feed(intents)

for hook in after_feeds_update_hooks:
    hook(reader)

Following is a proposal that takes into account the notes from previous comments.

Goals:

Hooks wrap unexpected exceptions as follows (new exceptions detailed below):

We add the following new exceptions:

ReaderError (old)
 +--- UpdateError
       +--- UpdateHookError
       |     +--- SingleUpdateHookError
       |     +--- UpdateHookErrorGroup [ExceptionGroup]
       +--- ParseError (old)

UpdateHookError will not be raised directly; it exists and has the shorter name because most people won't care about the individual errors (beyond logging them) and will just do except UpdateHookError; interested people can do except* SingleUpdateHookError to look at individual errors. SingleUpdateHookError cannot just be UpdateHookError because it will have attributes about the hook and stage that UpdateHookErrorGroup cannot have.
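A rough sketch of how the proposed hierarchy could be used by callers. The class bodies, the `stage` and `hook` attribute names, and the `run_hooks()` and `collect_failures()` helpers are assumptions made for illustration. The except* form needs Python 3.11, so the sketch sticks to a plain except UpdateHookError to stay runnable on 3.10 as well.

```python
import sys


class ReaderError(Exception): pass
class UpdateError(ReaderError): pass
class UpdateHookError(UpdateError): pass


class SingleUpdateHookError(UpdateHookError):
    def __init__(self, stage, hook):
        super().__init__(f"{stage}: hook {hook!r} failed")
        self.stage = stage  # hypothetical attribute names
        self.hook = hook


if sys.version_info >= (3, 11):
    class UpdateHookErrorGroup(ExceptionGroup, UpdateHookError):
        pass
else:
    class UpdateHookErrorGroup(UpdateHookError):
        # Minimal fallback exposing only .exceptions.
        def __init__(self, message, exceptions):
            super().__init__(message, tuple(exceptions))

        @property
        def exceptions(self):
            return self.args[1]


def run_hooks(stage, hooks, *args):
    """Run every hook; wrap unexpected errors and raise them together."""
    errors = []
    for hook in hooks:
        try:
            hook(*args)
        except Exception as e:
            error = SingleUpdateHookError(stage, hook)
            error.__cause__ = e
            errors.append(error)
    if errors:
        raise UpdateHookErrorGroup(f"{stage} hooks failed", errors)


def collect_failures(hooks, url):
    """Most callers: catch the base class, log the individual errors."""
    try:
        run_hooks("after_feed_update", hooks, url)
    except UpdateHookError as e:
        singles = getattr(e, "exceptions", (e,))
        return [(s.stage, type(s.__cause__).__name__) for s in singles]
    return []
```

Catching the base class works for both the single and the group case, which is the reason given above for keeping UpdateHookError as the short, common name.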

For Python 3.10, which doesn't have ExceptionGroup, we use a shim like this one:

```python
import traceback

class _ExceptionGroup(Exception):
    """ExceptionGroup shim for Python 3.10.

    Caveat: The tracebacks always show in str(exc).

    Avoids dependency on https://pypi.org/project/exceptiongroup/

    """

    def __init__(self, msg, excs):
        super().__init__(msg, tuple(excs))

    @property
    def message(self):
        return self.args[0]

    @property
    def exceptions(self):
        return self.args[1]

    def _format_lines(self):
        count = len(self.exceptions)
        s = 's' if count != 1 else ''
        yield f"{self.message} ({count} sub-exception{s})\n"
        for i, exc in enumerate(self.exceptions, 1):
            yield f"+{f' {i} '.center(36, '-')}\n"
            for line in traceback.format_exception(exc):
                for line in line.rstrip().splitlines():
                    yield f"| {line}\n"
        yield f"+{'-' * 36}\n"

    def __str__(self):
        return ''.join(self._format_lines()).rstrip()

try:
    ExceptionGroup
except NameError:
    ExceptionGroup = _ExceptionGroup

# example
try:
    1/0
except Exception as e:
    one = e

raise _ExceptionGroup('stuff happened', [
    NameError('name'),
    _ExceptionGroup('more stuff happened', [
        one,
        AttributeError('attr'),
    ]),
])

"""
Traceback (most recent call last):
  File ".../exc.py", line 47, in <module>
    raise _ExceptionGroup('stuff happened', [
__main__._ExceptionGroup: stuff happened (2 sub-exceptions)
+---------------- 1 -----------------
| NameError: name
+---------------- 2 -----------------
| _ExceptionGroup: more stuff happened (2 sub-exceptions)
| +---------------- 1 -----------------
| | Traceback (most recent call last):
| |   File ".../exc.py", line 43, in <module>
| |     1/0
| | ZeroDivisionError: division by zero
| +---------------- 2 -----------------
| | AttributeError: attr
| +------------------------------------
+------------------------------------
"""
```

~~Open~~ Closed questions:

lemon24 commented 1 year ago

To do:

lemon24 commented 1 year ago

OK, moving on to unexpected retriever/parser errors: per https://github.com/lemon24/reader/issues/218#issuecomment-1595691410, they should not prevent updates for remaining feeds; wrapping seems the way to go.

(1) A possible maximal exception hierarchy:

ReaderError (old)
 +--- UpdateError (old)
       +--- ParseError [FeedError, ReaderWarning] (old)
       |     +--- UnexpectedParseError
       +--- RetrieveError [ParseError (temporary)]
             +--- UnexpectedRetrieverError

This has the benefit of distinguishing clearly between the errors caused by a retriever and those caused by a (sub-)parser, while not making the hierarchy too deep.

Some disadvantages:

(2) Alternative that doesn't have them, at the expense of deeper hierarchy and slightly confusing naming (ParserError):

ReaderError (old)
 +--- UpdateError (old)
       +--- ParseError [FeedError, ReaderWarning] (old)
             +--- RetrieverError
             |     +--- UnexpectedRetrieverError
             +--- ParserError
                   +--- UnexpectedParserError      

The minimal solution is to either (3) add no new classes, or (4) add a single UnexpectedParseError subclass.

An additional complication is that currently, Feed.last_exception contains details about the cause of the ParseError, if any, not the ParseError itself. If ParseError gets subclasses, it becomes imperative to also have details about the exception itself (type, message); arguably, we should have done this from the start (it would also allow capturing UpdateHookError details, if we ever decide to).
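To illustrate the difference between storing the cause and storing the exception itself, here is a small sketch that records both. The `ExceptionDetails` class and its field names are hypothetical and are not the type Feed.last_exception actually uses.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ExceptionDetails:
    """Hypothetical record of an exception plus its chained cause."""

    type_name: str
    message: str
    cause: Optional["ExceptionDetails"] = None

    @classmethod
    def from_exception(cls, exc):
        cause = exc.__cause__
        return cls(
            type_name=f"{type(exc).__module__}.{type(exc).__qualname__}",
            message=str(exc),
            # Recurse so the original error is kept alongside the wrapper.
            cause=cls.from_exception(cause) if cause is not None else None,
        )


class ParseError(Exception):
    """Stand-in for reader's ParseError (simplified)."""


def capture_example():
    try:
        try:
            raise OSError("connection reset")
        except OSError as e:
            raise ParseError("error while retrieving feed") from e
    except ParseError as e:
        return ExceptionDetails.from_exception(e)
```

Keeping the outer type and message means a ParseError subclass (or a hook error) would still be distinguishable after the fact, while the cause remains available as before.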

Update: I went with solution 3. https://github.com/lemon24/reader/commit/ccf8d26b74dd03fcc98774747cd6efb78cfb10bf
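One way to read solution 3 (no new classes) is that unexpected retriever/parser errors get wrapped in a plain ParseError with the original exception as the cause. The sketch below shows that idea only; it is not the code from the linked commit, and `retrieve_and_parse()`, its parameters, and the simplified `ParseError` are assumptions.

```python
class ParseError(Exception):
    """Stand-in for reader's ParseError (simplified)."""

    def __init__(self, url, message=""):
        super().__init__(url, message)
        self.url = url
        self.message = message


def retrieve_and_parse(url, retriever, parser):
    """Call retriever then parser; surface every failure as ParseError."""
    try:
        return parser(retriever(url))
    except ParseError:
        # Already the expected type; let it through unchanged.
        raise
    except Exception as e:
        # Unexpected error: wrap it so per-feed handling still applies.
        raise ParseError(url, "unexpected error during retrieve/parse") from e
```

Because callers already handle ParseError per feed, wrapping this way would let the update loop continue with the remaining feeds without any change to the public exception hierarchy.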

lemon24 commented 1 year ago

Session hook errors already get wrapped in ParseError, and we have a test for it; good enough, for now.