Closed lemon24 closed 1 year ago
...like here: https://github.com/lemon24/reader/issues/310#issuecomment-1593699333 (the Twitter plugin).
The main goal here is to allow updates to continue after a feed has failed.
Things to take into consideration:
We don't really need to wrap hook (including Session hooks) and retriever/(sub)parser calls one by one, we may get away with doing it somewhere at "top level", e.g. here in Pipeline.update().
OTOH, when should unexpected exceptions (bugs) not allow updates to other feeds to continue?
{before,after}_feeds_update_hooks
, which go "around" the update loop? We don't need to wrap them, but we may want to, for consistency.Update: How granular should we be with allowing things to continue?
after_{entry,feed}_update_hooks
fails?after_entry_update_hooks
fails? Do we try to run it for the rest of the entries?Some thoughts on update hook exception handling (also see the "How granular ..." question above):
before_feed_...
and a corresponding after_feed_...
.before_feed_...
), does it make sense to continue with the feed update? If we fail early here, it simplifies the mental model quite a bit.after_entry_...
, raising an ExceptionGroup isn't all that useful (and we might end up with thousands of exceptions).Somewhat related: We continue with at least the granularity of one feed mostly because update_feeds_iter()
is a thin wrapper over for feed in get_feeds(): update_feed(feed)
(but with exception handling; we should document this).
OK, here's how update hooks get called (pseudocode; might want to document it when we're sure the order stays):
for hook in before_feeds_update_hooks:
hook(reader)
for feed in get_feeds():
try:
parsed_feed = parse(feed)
except ParseError as e:
parsed_feed = e
intents = make_intents(feed, parsed_feed)
for hook in before_feed_update_hooks:
hook(reader, feed.url)
store(intents)
for entry in intents.entries:
for hook in after_entry_update_hooks:
hook(reader, entry.entry, entry.status)
for hook in after_feed_update_hooks:
hook(reader, feed.url)
if not parsed_feed or isinstance(parsed_feed, Exception):
yield feed.url, parsed_feed
else:
yield feed.url, make_updated_feed(intents)
for hook in after_feeds_update_hooks:
hook(reader)
Following is a proposal that takes into account the notes from previous comments.
Goals:
update_feeds_iter()
.Hooks wrap unexpected exceptions as follows (new exceptions detailed below):
raise SingleUpdateHookError
yield url, SingleUpdateHookError
yield url, UpdateHookErrorGroup
yield url, UpdateHookErrorGroup
raise UpdateHookErrorGroup
We add the following new exceptions:
ReaderError (old)
+--- UpdateError
+--- UpdateHookError
| +--- SingleUpdateHookError
| +--- UpdateHookErrorGroup [ExceptionGroup]
+--- ParseError (old)
UpdateHookError will not be raised directly; it exists and has the shorter name because most people won't care about the individual errors (more than to log them), and will do except UpdateHookError
; interested people can do except* SingleUpdateHookError
to look at individual errors. SingleUpdateHookError cannot just be UpdateHookError because it will have attributes about the hook and stage that UpdateHookErrorGroup cannot not have.
Open Closed questions:
after_*_update_hooks
should not even run if there was an error getting the feed. ACTION ITEMbefore_feed_update_hooks
may run if we move them to before getting the feed, or if they become able to transform the feed update intent before it is stored (a feed update intent exists even if there was a ParseError, to set last_exception).To do:
OK, moving on to unexpected retriever/parser unexpected errors: per https://github.com/lemon24/reader/issues/218#issuecomment-1595691410, they should not prevent updates for remaining feeds; wrapping seems the way to go.
(1) A possible maximal exception hierarchy:
ReaderError (old)
+--- UpdateError (old)
+--- ParseError [FeedError, ReaderWarning] (old)
| +--- UnexpectedParseError
+--- RetrieveError [ParseError (temporary)]
+--- UnexpectedRetrieverError
This has the benefit of distinguishing clearly between the errors caused by a retriever and by those caused by a (sub-)parser, while not making the hierarchy to deep.
Some disadvantages:
reader._parser
don't have a specific common parent(2) Alternative that doesn't have them, at the expense of deeper hierarchy and slightly confusing naming (ParserError):
ReaderError (old)
+--- UpdateError (old)
+--- ParseError [FeedError, ReaderWarning] (old)
+--- RetrieverError
| +--- UnexpectedRetrieverError
+--- ParserError
+--- UnexpectedParserError
The minimal solution is to either (3) add no new classes, or (4) add a single UnexpectedParseError subclass.
An additional complication is that currently, Feed.last_exception contains details about the cause of the ParseError, if any, not the ParseError itself. If ParseError gets subclasses, it becomes imperative to also have details about the exception itself (type, message); arguably, we should have done this from the start (it would also allow capturing UpdateHookError details, if we ever decide to).
Update: I went with solution 3. https://github.com/lemon24/reader/commit/ccf8d26b74dd03fcc98774747cd6efb78cfb10bf
Session hook error already get wrapped in ParseError, and we have a test for it; good enough, for now.
From https://github.com/lemon24/reader/issues/204#issuecomment-780553373:
We should look at this when we expose the plugin API (#80).