Closed besttof closed 1 year ago
Having the same issue here on EndeavourOS.
I can reproduce the issue with feedreader, but I could not reproduce it with subreddit using the config examples.
My environment is below. (v0.42.0)
% wtfutil -v
a63329214c888cfbfc67c7ddcf31887c3c8a1c36 (2022-10-05T15:08:35Z)
config (I edited the position attributes of the config examples)
subreddit:
  enabled: true
  numberOfPosts: 10
  refreshInterval: 15m
  sortOrder: top
  subreddit: "news"
  topTimePeriod: month
  position:
    top: 4
    left: 1
    height: 1
    width: 2
feedreader:
  enabled: true
  feeds:
    - https://www.reddit.com/r/dwarffortress.rss
  feedLimit: 10
  refreshInterval: 4h
  position:
    top: 5
    left: 1
    height: 1
    width: 2
output
┌────────────────────── /r/news - top 3 ───────────────────────┐
│ 1. Biden to pardon all prior federal offenses of simple marij│
│ 2. Georgia Rep. Marjorie Taylor Greene's husband files for di│
└──────────────────────────────────────────────────────────────┘
┌─────────────────────── Feed Reader 4 ────────────────────────┐
│http error: 403 Forbidden │
│ │
│ │
│ │
│ │
│ │
│ │
│ │
│ │
│ │
│ │
│ │
│ │
│ │
│ │
│ │
│ │
└──────────────────────────────────────────────────────────────┘
if I try to do curl https://www.reddit.com/r/dwarffortress.rss after quitting wtfutil, it returns a Too Many Requests page.
As far as I can see, Reddit rejects requests that do not carry an appropriate User-Agent HTTP header. You can confirm this behavior with the following commands.
% curl -v -o /dev/null https://www.reddit.com/r/dwarffortress.rss 2>&1 | grep '< HTTP'
< HTTP/2 429
% curl -s -v -o /dev/null -H 'User-Agent: <appropriate-user-agent>' https://www.reddit.com/r/dwarffortress.rss 2>&1 | grep '< HTTP'
< HTTP/2 200
However, I couldn't work out why the 403 occurs with feedreader: it sets User-Agent to Gofeed/1.0, and with that same header curl gets a 200 from Reddit.
% curl -s -v -o /dev/null -H 'User-Agent: Gofeed/1.0' https://www.reddit.com/r/dwarffortress.rss 2>&1 | grep '< HTTP'
< HTTP/2 200
I also tried debugging by printing the first few characters of the response body when the 403 occurred, and the request does seem to be blocked by the Reddit server. I think the server inspects the request and decides to deny access, but that depends heavily on Reddit's server-side implementation and is hard to track down.
"<!doctype html>\n <html>\n <head>\n <title>Blocked</title>\n <style>\n body {\n font: small verdana, arial, helvetica, sans-serif;\n width: 600px;\n margin: 0 auto;\n }\n\n h1 {\n height: 40px;\n background: transparent url(//www.redditstatic.com/reddit.com.header.png) no-repeat scroll top right;\n }\n </style>\n </head>\n <body>\n <h1>whoa there, pardner!</h1>\n\n<p>reddit's awesome and all, but you may have a bit of a\nproblem.</p>\n\n<p>if you "
@besttof @foreignsasquatch @tsbkw FYI, my PR, which has a fix for this issue, has now been merged into master.
The issue turned out to be that http.Client, by default, first tries the HTTP/2 protocol before falling back to older protocol versions. For some reason, Reddit's anti-bot/automation system sees that as suspicious, so if you use the latest master, you can now set disableHTTP2: true in your feedreader settings to get around it.
@senorprogrammer I believe this issue can be closed (for now at least).
Cheers!
Reddit module and feeds error with 403 Forbidden
Whenever I add a Reddit feed, either via the feedreader module or with the dedicated subreddit module, the result is always 403:
Config examples:
There are no issues using curl for that same RSS feed or visiting Reddit in the browser. However, if I try to do
curl https://www.reddit.com/r/dwarffortress.rss
after quitting wtfutil, it returns a Too Many Requests page, which is weird, because the refresh intervals are very reasonable.
wtfutil 0.41.0 (2021-12-08T06:06:22Z)
macOS 12.3
iTerm2 3.4.16 (but system terminal yields the same results)