lemon24 / reader

A Python feed reader library.
https://reader.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Certain feeds not working #226

Closed rohanbansal12 closed 3 years ago

rohanbansal12 commented 3 years ago

When trying to add a series of feeds to the reader, some work fine but the reader seems unable to get entries and information from others (which otherwise look normal).

For example, the reader cannot seem to pick up http://a16z.com/feed/. The following code displays an empty list.

from reader import make_reader, FeedExistsError

reader = make_reader("db.sqlite")

reader.add_feed("http://a16z.com/feed/")

reader.update_feeds()

list(reader.get_entries())

Unsure of the reason behind this (and it is also happening for other "normal" feeds). Any ideas?

lemon24 commented 3 years ago

I am able to reproduce this as well.

You can see more details about what caused an update to fail by updating feeds individually with update_feed() (which will raise the exception), or by checking feed.last_exception; see the relevant bit of the documentation.

In your case:

>>> feed = reader.get_feed("http://a16z.com/feed/")
>>> feed.last_exception is not None
True
>>> print(feed.last_exception.traceback_str)
Traceback (most recent call last):
  ...
requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: http://a16z.com/feed/
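
To check many feeds at once, the same feed.last_exception attribute can be collected in a loop. This is only a sketch: failed_feeds is a hypothetical helper name, and it assumes nothing beyond reader.get_feeds(), feed.url, and the last_exception.traceback_str attribute used above.

```python
def failed_feeds(reader):
    """Map feed URL -> last line of the traceback, for feeds whose last update failed.

    Hypothetical helper; relies only on reader.get_feeds(), feed.url and
    feed.last_exception.traceback_str.
    """
    failures = {}
    for feed in reader.get_feeds():
        exc = feed.last_exception
        if exc is None:
            # The last update succeeded (or the feed was never updated).
            continue
        # The last non-empty traceback line holds the exception type and message.
        lines = exc.traceback_str.strip().splitlines()
        failures[feed.url] = lines[-1] if lines else ""
    return failures


# Usage with a real reader (after reader.update_feeds()):
#
#     for url, error in failed_feeds(reader).items():
#         print(url, "->", error)
```

For the feed above, this should report the 403 Client Error line from the traceback.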

I've seen this happen when overzealous CDNs block reader requests based on their user agent.

The ua_fallback plugin can sometimes fix this. The docs explain how to use it with the CLI; to use it from within Python (no need to install anything):

>>> from reader._plugins import ua_fallback
>>> ua_fallback.init(reader)
>>> reader.update_feeds()
>>> next(reader.get_entries()).title
'16 Minutes #59: The U.S. Vaccine Rollout'
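
For illustration, the general idea of a user agent fallback (retry a request with a different User-Agent when the server answers 403) can be sketched as below. This is not the plugin's actual code: fetch_with_ua_fallback, the fetch callable, and the user agent strings are all hypothetical, and the real plugin may behave differently.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical fetcher type: takes (url, user_agent), returns (status_code, body).
Fetch = Callable[[str, str], Tuple[int, bytes]]


def fetch_with_ua_fallback(
    fetch: Fetch, url: str, user_agents: Sequence[str]
) -> Tuple[int, bytes]:
    """Try each user agent in order until the response is not 403 Forbidden.

    Returns the first non-403 response, or the last 403 if every agent is blocked.
    """
    if not user_agents:
        raise ValueError("at least one user agent is required")
    status, body = 0, b""
    for user_agent in user_agents:
        status, body = fetch(url, user_agent)
        if status != 403:
            # Either success or an unrelated error; retrying would not help.
            break
    return status, body
```

The fetcher is passed in so the retry logic stays independent of any particular HTTP library.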

Despite plugins being "not stable yet", you can consider this specific plugin and usage (ua_fallback.init(reader)) stable; I'll make sure it continues to work at least until reader 2.0 is released. See https://github.com/lemon24/reader/issues/226#issuecomment-808251035

lemon24 commented 3 years ago

As part of #229, ua_fallback is now a built-in plugin and is enabled by default: https://reader.readthedocs.io/en/latest/guide.html#plugins

This feature will go out in 1.16; to use it before then, install reader directly from the repo: https://reader.readthedocs.io/en/latest/install.html#living-on-the-edge

Since I got no reply, I removed the ua_fallback.init(reader) use shown above (it will continue to work in 1.15; starting with 1.16, you don't need to do anything in your code).