Open duckunix opened 2 years ago
I remember when I first put the application together that Reddit didn't like it unless I set up a custom user-agent. They're a bit strict about blocking access.
So there's an obvious suspicion that the feed request is just getting blocked/filtered/broken on their side. Can you download the feed(s) successfully with curl?
If it's broken for everything then it's clearly their fault. If you can download via curl, but not via the app then it might be something I can fix.
For what it's worth my own feed (of "private inbox" messages) continues to work so it might not necessarily be something that is globally broken.
Oddly, wget works just fine, but when I use curl, I get:
<!doctype html>
<html>
<head>
<title>Too Many Requests</title>
<style>
body {
font: small verdana, arial, helvetica, sans-serif;
width: 600px;
margin: 0 auto;
}
h1 {
height: 40px;
background: transparent url(//www.redditstatic.com/reddit.com.header.png) no-repeat scroll top right;
}
</style>
</head>
<body>
<h1>whoa there, pardner!</h1>
<p>we're sorry, but you appear to be a bot and we've seen too many requests
from you lately. we enforce a hard speed limit on requests that appear to come
from bots to prevent abuse.</p>
<p>if you are not a bot but are spoofing one via your browser's user agent
string: please change your user agent string to avoid seeing this message
again.</p>
<p>please wait 8 second(s) and try again.</p>
<p>as a reminder to developers, we recommend that clients make no
more than <a href="http://github.com/reddit/reddit/wiki/API">one
request every two seconds</a> to avoid seeing this message.</p>
</body>
</html>
So, is there some way for me to put a sleep before/after a call to reddit?
Thanks!
BTW:
grep -c reddit.com ~/.rss2email/feeds.txt
19
Oddly, wget works just fine,
Then I'd probably suggest they're using the User-Agent header to differentiate the two requests. You might try changing your local agent. Something like this in your feed-list:
https://reddit.com/....
- user-agent: my-safe-bot/1.0
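To check outside the app whether the User-Agent header really is what separates the curl and wget results, something like the following could be used. It is only a sketch built on Python's standard library; the helper names `build_request` and `fetch` are made up for illustration, and it is not how rss2email itself makes requests.

```python
import urllib.error
import urllib.request


def build_request(url, user_agent):
    """Return a Request that carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url, user_agent, timeout=10):
    """Fetch url and return (status, body).

    HTTP error responses (for example a rate-limit page) are returned
    rather than raised, so the caller can inspect what the server sent.
    """
    request = build_request(url, user_agent)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, response.read()
    except urllib.error.HTTPError as err:
        return err.code, err.read()
```

Calling `fetch()` on the same feed URL with a few different agent strings, a couple of seconds apart, and comparing the returned status codes should show whether the agent makes a difference.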
As for sleeping between feed requests: I'm afraid that isn't possible right now, though it does seem like something that could be added. I could add:
http://example.com/foo
- sleep: 10
http://example.net/blah.rss
- sleep: 20
That would give a ten-second sleep before fetching the first feed, and a twenty-second sleep before the second.
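For illustration, the proposed layout (a URL line followed by `- key: value` option lines) could be read roughly like this. This is a Python sketch of the idea only, not the project's actual parser; `parse_feed_list` and `sleep_seconds` are hypothetical names, and treating an unparsable sleep value as "no delay" is an assumption.

```python
def parse_feed_list(text):
    """Parse URL lines followed by '- key: value' lines into (url, options) pairs."""
    feeds = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith("-"):
            if not feeds:
                continue  # an option with no preceding URL is ignored
            key, _, value = line[1:].partition(":")
            feeds[-1][1][key.strip().lower()] = value.strip()
        else:
            feeds.append((line, {}))
    return feeds


def sleep_seconds(options, default=0):
    """Return the per-feed delay in seconds, or default if absent or invalid."""
    try:
        return max(0, int(options.get("sleep", default)))
    except ValueError:
        return default
```

A caller would then pause for `sleep_seconds(options)` (for example with `time.sleep`) before fetching each feed.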
Added that in #84 - along with a simple heuristic that adds a delay automatically if the feed being fetched is from the same hostname as the previous request. So assuming your feed contains:
reddit...
reddit...
reddit..
example.com...
example.com..
you won't need to make any config-file changes, it'll delay automatically.
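The same-host heuristic described above can be sketched as follows. This is a Python illustration of the idea rather than the code from #84; `plan_delays` is a hypothetical name, and the five-second default is taken from the verbose output quoted later in this thread.

```python
from urllib.parse import urlsplit


def plan_delays(urls, same_host_delay=5):
    """Return the delay (in seconds) to apply before fetching each URL.

    A delay is added only when a URL has the same hostname as the URL
    fetched immediately before it.
    """
    delays = []
    previous_host = None
    for url in urls:
        host = urlsplit(url).hostname
        same = host is not None and host == previous_host
        delays.append(same_host_delay if same else 0)
        previous_host = host
    return delays
```

Because only the immediately preceding request is compared, the ordering of the feed list matters: interleaving reddit and non-reddit feeds would avoid the delay entirely. Note also that in this sketch `reddit.com` and `www.reddit.com` count as different hosts.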
So, using version 2.5, it is still not working for me with reddit. :( This is my test feeds.txt:
https://www.reddit.com/r/swaywm/.rss
- template:reddit.tmpl
- user-agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:15.0) Gecko/20100101 Firefox/15.0.1
https://www.reddit.com/r/OPNsenseFirewall/.rss
- template:reddit.tmpl
- user-agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:15.0) Gecko/20100101 Firefox/15.0.1
Which hopefully would be working with the delay and the user-agent string, but still no joy:
time rss2email cron -verbose <email@rededicated>
Fetching feed: https://www.reddit.com/r/swaywm/.rss
Fetching from same host as previous feed, www.reddit.com, adding 5s delay
Fetching feed: https://www.reddit.com/r/OPNsenseFirewall/.rss
Skipping the prune-step because we saw errors processing our feed(s)
error processing https://www.reddit.com/r/swaywm/.rss - error parsing https://www.reddit.com/r/swaywm/.rss contents: Failed to detect feed type
error processing https://www.reddit.com/r/OPNsenseFirewall/.rss - error parsing https://www.reddit.com/r/OPNsenseFirewall/.rss contents: Failed to detect feed type
real 0m5.173s
user 0m0.038s
sys 0m0.018s
Any thoughts, or should I go look for something to build a custom RSS feed for my reddit feeds?
Thanks, d
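A "Failed to detect feed type" error would be consistent with the server returning an HTML block page, like the "Too Many Requests" page quoted earlier, instead of RSS/Atom. A rough way to check what actually came back is sketched below in Python. The names `root_tag` and `diagnose` are hypothetical, and the sketch assumes the block page arrives with HTTP status 429, which its title suggests but this thread does not confirm.

```python
import xml.etree.ElementTree as ET

# Root element names used by RSS 2.0, Atom, and RSS 1.0 (RDF) documents.
FEED_ROOTS = {"rss", "feed", "RDF"}


def root_tag(body):
    """Return the root element's local name, or None if body is not well-formed XML."""
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        return None
    # Strip any "{namespace}" prefix, e.g. "{http://www.w3.org/2005/Atom}feed".
    return root.tag.rsplit("}", 1)[-1]


def diagnose(status, body):
    """Classify a response as 'rate-limited', 'feed', or 'not-a-feed'."""
    if status == 429:
        return "rate-limited"
    if root_tag(body) in FEED_ROOTS:
        return "feed"
    return "not-a-feed"
```

Feeding it the status and body from a manual fetch of one of the failing URLs would show whether the parser is being handed a block page rather than a feed.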
I'm sorry to hear that the recent delay didn't help, nor the user-agent switch.
Using some other wrapper to fetch the feeds from reddit and republish them, so that you can then fetch them locally, should work. I admit I'm not really sure what options are out there, though, or how likely they are to get blocked in the future. (Feedburner?)
But for this project I'm not sure there are any more useful changes I can make - I could add our version number to the default user-agent, but nothing else comes to mind.
This started in the last day or two for me. I am using the latest release, release-2.4. When trying to run, I get:
And so on for all my reddit entries.
Any thoughts?