wtfutil / wtf

The personal information dashboard for your terminal
http://wtfutil.com
Mozilla Public License 2.0

Reddit module and feeds error with 403 Forbidden #1342

Closed besttof closed 1 year ago

besttof commented 2 years ago

Reddit module and feeds error with 403 Forbidden

Whenever I add a Reddit feed, either via the feedreader module or with the dedicated subreddit module, the result is always 403:

┌────── /r/news - top 3 ───────┐
│403 Forbidden                 │
│                              │
│                              │
│                              │
│                              │
│                              │
└──────────────────────────────┘

Config examples:

    subreddit:
      enabled: true
      numberOfPosts: 10
      refreshInterval: 15m
      sortOrder: top
      subreddit: "news"
      topTimePeriod: month
    feedreader:
      enabled: true
      feeds:
      - https://www.reddit.com/r/dwarffortress.rss
      feedLimit: 10
      refreshInterval: 4h

There are no issues using curl for that same RSS feed or visiting Reddit in the browser. However, if I run curl https://www.reddit.com/r/dwarffortress.rss after quitting wtfutil, it returns a Too Many Requests page, which is odd because the refresh intervals are very reasonable.

wtfutil 0.41.0 (2021-12-08T06:06:22Z), macOS 12.3, iTerm2 3.4.16 (but the system terminal yields the same results)

foreignsasquatch commented 2 years ago

Having the same issue here on EndeavourOS.

tsbkw commented 2 years ago

I can reproduce the issue with feedreader, but I could not reproduce it with subreddit using the config examples above.

My environment is below (v0.42.0).

% wtfutil -v
a63329214c888cfbfc67c7ddcf31887c3c8a1c36 (2022-10-05T15:08:35Z)

config (I edited the position attributes of the config examples)

    subreddit:
      enabled: true
      numberOfPosts: 10
      refreshInterval: 15m
      sortOrder: top
      subreddit: "news"
      topTimePeriod: month
      position:
        top: 4
        left: 1
        height: 1
        width: 2
    feedreader:
      enabled: true
      feeds:
      - https://www.reddit.com/r/dwarffortress.rss
      feedLimit: 10
      refreshInterval: 4h
      position:
        top: 5
        left: 1
        height: 1
        width: 2

output

┌────────────────────── /r/news - top 3 ───────────────────────┐
│ 1. Biden to pardon all prior federal offenses of simple marij│
│ 2. Georgia Rep. Marjorie Taylor Greene's husband files for di│
└──────────────────────────────────────────────────────────────┘
┌─────────────────────── Feed Reader 4 ────────────────────────┐
│http error: 403 Forbidden                                     │
│                                                              │
│                                                              │
│                                                              │
│                                                              │
│                                                              │
│                                                              │
│                                                              │
│                                                              │
│                                                              │
│                                                              │
│                                                              │
│                                                              │
│                                                              │
│                                                              │
│                                                              │
│                                                              │
└──────────────────────────────────────────────────────────────┘

> if I try to do curl https://www.reddit.com/r/dwarffortress.rss after quitting wtfutil, it returns a Too Many Requests page.

As far as I can see, Reddit rejects requests that do not carry an appropriate User-Agent HTTP header. You can confirm this behavior with the following commands.

% curl -v -o /dev/null  https://www.reddit.com/r/dwarffortress.rss 2>&1 | grep '< HTTP' 
< HTTP/2 429

% curl -s -v -o /dev/null -H 'User-Agent: <appropriate-user-agent>'  https://www.reddit.com/r/dwarffortress.rss 2>&1 | grep '< HTTP'
< HTTP/2 200

However, I couldn't find out why the 403 occurs with feedreader: it sets the User-Agent to Gofeed/1.0, and when I send that same User-Agent with curl, Reddit responds with 200.

% curl -s -v -o /dev/null -H 'User-Agent: Gofeed/1.0'  https://www.reddit.com/r/dwarffortress.rss 2>&1 | grep '< HTTP'
< HTTP/2 200 
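
The curl checks above can also be repeated from Go, which is closer to what feedreader does. The following is only a minimal sketch, not wtfutil's code: `statusWithUserAgent` and `newStubServer` are hypothetical names, and the stub server is a local stand-in whose 429/200 rule is an assumption made so that the example runs offline. It does not model Reddit's real filtering.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// statusWithUserAgent sends a GET request with the given User-Agent and
// returns the HTTP status code. Passing "" suppresses the header entirely,
// because net/http omits User-Agent when it is explicitly set to "".
func statusWithUserAgent(client *http.Client, url, userAgent string) (int, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return 0, err
	}
	req.Header.Set("User-Agent", userAgent)

	resp, err := client.Do(req)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	return resp.StatusCode, nil
}

// newStubServer is a hypothetical local stand-in (NOT Reddit): it answers
// 429 when no User-Agent is sent and 200 otherwise.
func newStubServer() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.UserAgent() == "" {
			w.WriteHeader(http.StatusTooManyRequests)
			return
		}
		w.WriteHeader(http.StatusOK)
	}))
}

func main() {
	srv := newStubServer()
	defer srv.Close()

	for _, ua := range []string{"", "Gofeed/1.0"} {
		code, err := statusWithUserAgent(srv.Client(), srv.URL, ua)
		if err != nil {
			fmt.Println("request failed:", err)
			continue
		}
		fmt.Printf("User-Agent %q -> %d\n", ua, code)
	}
}
```

To repeat the experiment against a real feed, pass `http.DefaultClient` and the feed URL instead of the stub server's client and URL.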

I also tried debugging by printing the first few characters of the response body when the 403 occurred, and the request does seem to be blocked by the Reddit server. I think the server inspects the request and decides to deny access, but that depends heavily on Reddit's server-side implementation and is hard to track down.

"<!doctype html>\n     <html>\n  <head>\n    <title>Blocked</title>\n    <style>\n      body {\n          font: small verdana, arial, helvetica, sans-serif;\n          width: 600px;\n          margin: 0 auto;\n      }\n\n      h1 {\n          height: 40px;\n          background: transparent url(//www.redditstatic.com/reddit.com.header.png) no-repeat scroll top right;\n      }\n    </style>\n  </head>\n  <body>\n    <h1>whoa there, pardner!</h1>\n\n<p>reddit's awesome and all, but you may have a bit of a\nproblem.</p>\n\n<p>if you "
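
The debugging step described above (printing the start of the body when the status is not 200) might look roughly like this in Go. This is a sketch under assumptions: `bodyPreview` and `newBlockedStub` are hypothetical names, and the stub server returns a made-up "Blocked" page so that the example runs without contacting Reddit.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// bodyPreview fetches url and returns the status line. For any non-200
// response it also returns at most n bytes of the body, which is usually
// enough to tell a block page from a genuine feed.
func bodyPreview(client *http.Client, url string, n int64) (string, string, error) {
	resp, err := client.Get(url)
	if err != nil {
		return "", "", err
	}
	defer resp.Body.Close()

	if resp.StatusCode == http.StatusOK {
		return resp.Status, "", nil
	}
	b, err := io.ReadAll(io.LimitReader(resp.Body, n))
	if err != nil {
		return resp.Status, "", err
	}
	return resp.Status, string(b), nil
}

// newBlockedStub is a hypothetical local server that always answers 403
// with a small HTML page, standing in for a blocked request.
func newBlockedStub() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusForbidden)
		fmt.Fprint(w, "<!doctype html>\n<html><head><title>Blocked</title></head></html>")
	}))
}

func main() {
	srv := newBlockedStub()
	defer srv.Close()

	status, preview, err := bodyPreview(srv.Client(), srv.URL, 40)
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	fmt.Printf("%s: %q\n", status, preview)
}
```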

michenriksen commented 1 year ago

@besttof @foreignsasquatch @tsbkw FYI, my PR has now been merged into master which has a fix for this issue.

The cause turned out to be that Go's http.Client tries the HTTP/2 protocol first by default before falling back to older protocol versions. For some reason, Reddit's anti-bot/automation system treats that as suspicious. If you use the latest master, you can now set disableHTTP2: true in your feedreader settings to work around it.
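
For readers curious what turning off HTTP/2 means at the Go level, here is a hedged sketch of the standard-library mechanism. It is not the code from the merged PR: `newClient` and `negotiatedProto` are hypothetical names, and the example talks to a local TLS test server instead of Reddit. The net/http documentation states that setting `Transport.TLSNextProto` to a non-nil, empty map disables HTTP/2 for a client.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// newClient builds an HTTPS client that trusts the certificates in pool.
// With disableHTTP2 set, TLSNextProto becomes a non-nil empty map, which
// net/http documents as the way to switch off HTTP/2 on a Transport.
func newClient(pool *x509.CertPool, disableHTTP2 bool) *http.Client {
	tr := &http.Transport{
		TLSClientConfig: &tls.Config{RootCAs: pool},
	}
	if disableHTTP2 {
		tr.TLSNextProto = map[string]func(string, *tls.Conn) http.RoundTripper{}
	} else {
		// A custom TLSClientConfig turns off automatic HTTP/2,
		// so request it explicitly for the comparison case.
		tr.ForceAttemptHTTP2 = true
	}
	return &http.Client{Transport: tr}
}

// negotiatedProto starts a local TLS test server that offers HTTP/2 and
// reports which protocol the client actually used for one GET request.
func negotiatedProto(disableHTTP2 bool) (string, error) {
	srv := httptest.NewUnstartedServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "ok")
	}))
	srv.EnableHTTP2 = true
	srv.StartTLS()
	defer srv.Close()

	// Trust the test server's self-signed certificate.
	pool := x509.NewCertPool()
	pool.AddCert(srv.Certificate())

	client := newClient(pool, disableHTTP2)
	resp, err := client.Get(srv.URL)
	if err != nil {
		return "", err
	}
	resp.Body.Close()
	client.CloseIdleConnections()
	return resp.Proto, nil
}

func main() {
	for _, disable := range []bool{false, true} {
		proto, err := negotiatedProto(disable)
		if err != nil {
			fmt.Println("request failed:", err)
			continue
		}
		fmt.Printf("disableHTTP2=%v -> %s\n", disable, proto)
	}
}
```

The design point is that the protocol is negotiated during the TLS handshake, so a server can distinguish the two clients before it sees any header such as User-Agent. That may be why a User-Agent that works with curl still gets a 403 from feedreader.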

@senorprogrammer I believe this issue can be closed (for now at least).

Cheers!