lemon24 / reader

A Python feed reader library.
https://reader.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Make default_parser() part of the public API #235

Closed lemon24 closed 1 year ago

lemon24 commented 3 years ago

Make default_parser() part of the public API, because it's useful stand-alone, especially if we get a magic+ parser (#222), XML sanitization (#212), or enhanced HTML sanitization (#227).

Initially, we can expose just the callable part of the parser by wrapping the parser object in a function with the same signature:

def (
    url: str,
    http_etag: Optional[str] = None,
    http_last_modified: Optional[str] = None,
) -> Optional[ParsedFeed]: ...

Because this is a new feature, feed_root should default to None (no filesystem access).

It may be nice to wrap the cache validation headers in a typed dict (although I don't like "cache_validators" as a name):

CacheValidationHeaders = TypedDict('CacheValidationHeaders', {'ETag': str, 'Last-Modified': str}, total=False)

def (url: str, cache_validators: CacheValidationHeaders) -> Optional[ParsedFeed]: ...
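To illustrate how such a typed dict might be consumed, here is a small sketch. The functional `TypedDict` syntax is needed because 'Last-Modified' is not a valid Python identifier. The helper `conditional_request_headers` is a hypothetical name, not part of the library; it maps the stored validators to the standard HTTP conditional request headers.

```python
from typing import Dict, TypedDict

# total=False makes both keys optional.
CacheValidationHeaders = TypedDict(
    'CacheValidationHeaders',
    {'ETag': str, 'Last-Modified': str},
    total=False,
)


def conditional_request_headers(
    cache_validators: CacheValidationHeaders,
) -> Dict[str, str]:
    """Build conditional request headers from stored validators (hypothetical helper)."""
    headers: Dict[str, str] = {}
    if 'ETag' in cache_validators:
        headers['If-None-Match'] = cache_validators['ETag']
    if 'Last-Modified' in cache_validators:
        headers['If-Modified-Since'] = cache_validators['Last-Modified']
    return headers
```

At runtime a TypedDict is an ordinary dict, so switching the annotation to a regular dict later would not change how callers build the value.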

A question related to this: Should we allow custom parsers to store custom caching metadata? For instance, #222 might need to store one header per page. Type annotations are not stable, so we can turn the TypedDict into a regular dict later.

Based on the signature above, ParsedFeed and all its components must become public / stable as well (i.e. FeedData, EntryData, their hash property). Also, ParsedFeed should probably not be a named tuple anymore.
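One possible shape for a non-tuple ParsedFeed, as a sketch only. The field names and the trimmed-down `FeedData` / `EntryData` below are assumptions for illustration, not the library's actual definitions. A frozen dataclass keeps attribute access and hashing, but drops indexing and positional unpacking, so fields could be added later without breaking callers that unpack the result.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class FeedData:
    """Hypothetical, trimmed-down stand-in."""
    url: str
    title: Optional[str] = None


@dataclass(frozen=True)
class EntryData:
    """Hypothetical, trimmed-down stand-in."""
    feed_url: str
    id: str
    title: Optional[str] = None


@dataclass(frozen=True)
class ParsedFeed:
    """Hypothetical dataclass version; field names are assumed."""
    feed: FeedData
    entries: Tuple[EntryData, ...] = ()
    http_etag: Optional[str] = None
    http_last_modified: Optional[str] = None


parsed = ParsedFeed(
    feed=FeedData('https://example.com/feed.xml', 'Example'),
    entries=(EntryData('https://example.com/feed.xml', '1', 'First'),),
)
```

Unlike a named tuple, `parsed` cannot be iterated or indexed, so its field order is not part of the contract.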

lemon24 commented 1 year ago

The parser internal API is now documented (although still unstable): https://reader.readthedocs.io/en/latest/internal.html#module-reader._parser

Closing.