Also experiencing this suddenly.
Same here; I think it's global somehow. I suppose they either changed something on Reddit's end, or… they're onto us?
I can reproduce this. I tried restarting Redlib, which somehow fixed it for a short while (about 10 minutes), but then it came back. Maybe a slow rollout of a change on Reddit's side could cause that?
I'm seeing this as well on my own instance and on public instances. I tried restarting Redlib but it didn't help.
I've also been experiencing this on my own instance. Restarting did seem to fix it, although I'm not sure if it will last.
Edit: Already not working again 15 minutes later.
Same here. But I can call the JSON endpoint manually and get a normal response with JSON being returned. Not sure if the JSON itself has changed.
Investigating this
Is anyone reproducing this locally (not on a public instance)? Or does this require heavy load to trigger?
It's happening on both my home instance and one on a VPS. I'm the only one using either of them.
This is happening on my local instance where I'm the only user.
Happening on my single-user instance, which doesn't have much load at all.
Also on my local home server, with only two people accessing it via local network.
Additional information: I run Redlib (v0.35.1, compiled from source) as a systemd user service. Restarting the service makes Redlib work again for a few minutes, then the JSON error above returns. I've restarted the service several times; each restart reproducibly makes Redlib work again, but only for a short time.
Same, on my own instance where I am the only user. Restarting/redeploying from the source image does fix it for a little while, though I'm not sure whether it lasts for a given duration or a given number of requests. Also, checking the container logs, every JSON error is paired with a 403 Forbidden from Reddit.
Extract from logs of a freshly started container:
Starting Redlib...
Running Redlib v0.35.1 on [::]:8080!
ERROR redlib::client > Got an invalid response from reddit expected value at line 1 column 1. Status code: 403 Forbidden
ERROR redlib::client > Got an invalid response from reddit expected value at line 1 column 1. Status code: 403 Forbidden
ERROR redlib::utils > Error page rendered: Failed to parse page JSON data: expected value at line 1 column 1
ERROR redlib::client > Got an invalid response from reddit expected value at line 1 column 1. Status code: 403 Forbidden
ERROR redlib::utils > Error page rendered: Failed to parse page JSON data: expected value at line 1 column 1
ERROR redlib::client > Got an invalid response from reddit expected value at line 1 column 1. Status code: 403 Forbidden
ERROR redlib::utils > Error page rendered: Failed to parse page JSON data: expected value at line 1 column 1
Single user here as well. Mine is running on a VPS, but I've just tried it locally and it's the same. It works for like 2 or 3 pages and then it breaks until restarted.
I've narrowed it down to a network policy violation that Reddit is picking up on. This causes a 403 (which we didn't special-case previously), and the violation is detectable via the retry-after header (and various other headers, plus the plaintext response body, but checking headers will be fastest).
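For illustration, here's a minimal Rust sketch of that detection heuristic, assuming the hyper 0.14-era client stack redlib uses; this is not redlib's actual code:

```rust
use hyper::{header::RETRY_AFTER, Body, Response, StatusCode};

/// Heuristic sketch: treat a 403 that carries a Retry-After header as
/// Reddit's network-policy block rather than a real JSON payload, so the
/// caller can refresh the OAuth token instead of trying to parse the body.
fn is_network_policy_block(resp: &Response<Body>) -> bool {
    resp.status() == StatusCode::FORBIDDEN && resp.headers().contains_key(RETRY_AFTER)
}
```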
Going to push out a quick fix to catch the error, refresh the token, and avoid poisoning the cache. You may need to reload the page if you see this error, but it'll ideally be fixed right after.
Would appreciate someone pulling the latest code and testing when this goes in. Docker image will be updated in a few minutes.
Leaving this open until I get confirmation that you don't see this error repeatedly. It may be spurious, but that's the nature of the new (?) server behavior. Future improvements would include a default retry mechanism after the token refresh (and maybe a better-designed token pool), but for now, if the issue goes away after a reload, I'm happy.
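A rough, self-contained sketch of what that retry-after-refresh could look like; `fetch_json` and `refresh_token` are placeholder names, not redlib APIs, and tokio (which redlib already depends on) is assumed for the sleep:

```rust
use std::time::Duration;

/// Placeholder error type for the sketch; redlib's real error handling differs.
#[derive(Debug)]
enum FetchError {
    NetworkPolicy, // the 403 + Retry-After case described above
    Other(String),
}

// Hypothetical stand-ins for the real client calls.
async fn fetch_json(_path: &str) -> Result<String, FetchError> {
    unimplemented!("placeholder")
}
async fn refresh_token() -> Result<(), FetchError> {
    unimplemented!("placeholder")
}

/// One bounded retry after rolling the token over, with a short pause so a
/// burst of failures doesn't hammer Reddit and trigger harsher blocking.
async fn fetch_with_refresh(path: &str) -> Result<String, FetchError> {
    match fetch_json(path).await {
        Err(FetchError::NetworkPolicy) => {
            refresh_token().await?;
            tokio::time::sleep(Duration::from_millis(500)).await;
            fetch_json(path).await
        }
        other => other,
    }
}
```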
Works for me.
Logs still show an error on page load, but that doesn't affect anything in the web app (as far as I can see).
ERROR redlib::utils > Error page rendered: Nothing here
EDIT: just got rate-limited - don't know if it's related.
Couldn't send request to Reddit: Rate limit - try refreshing soon | /r/popular/hot.json?&raw_json=1&geo_filter=GLOBAL
@sigaloid Not entirely related, but you might want to look at the failing Docker build action; otherwise this fix isn't going to make it out to everyone - https://github.com/redlib-org/redlib/actions/runs/10891397433
Done, thanks!
Thank you for the quick fix 🙏
Thanks a lot! If this fix works, could you drop a new release?
The latest Docker image works for me. Thank you so much for the speedy fix!
Weird that it works, since the container build is still failing: https://github.com/redlib-org/redlib/actions/runs/10891683866
Gonna be a small fix to get a new container built from main - just some fixing of the filenames. I have class right now; if someone could look at .github/workflows/main-docker.yml:79, see my recent commit (https://github.com/redlib-org/redlib/commit/118708400ad37c5ddfb8d54566d429c0f43b024a), and PR a fix, I could merge it. Else it'll probably be a few hours before I get to it.
My bad, sorry about that. The error did disappear for a moment, so I thought it was because of a new build, but it is now back again.
The newest commit fixed the JSON errors, but I've just begun getting the "ERROR redlib::utils > Error page rendered: Couldn't send request to Reddit: Rate limit - try refreshing soon" error on both of my Redlib containers. After a restart of Redlib, it only takes ~15 seconds to start reappearing.
I have the same thing to report: after the latest commit, we are getting a new type of error:
Rate limit error
Couldn't send request to Reddit: Rate limit - try refreshing soon | /r/popular/hot.json?&raw_json=1&geo_filter=GLOBAL
Note: I also tried clearing the complete cache with my cookie manager addon; still the same issue :-))
I pulled the latest docker image and I'm still having the same issue.
I am running v0.35.1, which was released a month ago. Has the update been rolled out on Docker yet? As you can see, no updates have been made since August: https://quay.io/repository/redlib/redlib?tab=tags
I've submitted PR #238 but, to be frank, I'm not entirely sure if it's what's required.
This issue is harder to mitigate since https://github.com/redlib-org/redlib/pull/59 was merged, as we can no longer easily do a build in Docker.
Please pin this issue; there is a slew of duplicates coming in.
~As mentioned by @ButteredCats, it seems the original fix does not really refresh the OAuth token. It looks like either refreshing the token in its own scope or launching a task to refresh it works. See the PR above.~
Never mind, that PR doesn't fix anything. Even after refreshing the token, you still get rate limited.
Hi, not sure if one more voice helps troubleshooting, but I've just rebuilt and it fixed this issue. I'm running it as an exe on Windows. However, this leads to issue #237.
As @pimlie mentioned in https://github.com/redlib-org/redlib/pull/241#issuecomment-2355568510, restarting Redlib seems to temporarily resolve the rate limiting issue, and frequent restarts (intervals of <10 minutes) could act as a stopgap fix until a more permanent solution is found.
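For anyone running Redlib as a systemd user service (as mentioned earlier in the thread), a hypothetical drop-in implementing that stopgap could look like the following; the path and interval are assumptions, adjust to taste:

```ini
# ~/.config/systemd/user/redlib.service.d/restart.conf (hypothetical path)
[Service]
Restart=always
# Force a restart every 9 minutes, per the <10 minute stopgap above.
RuntimeMaxSec=540
RestartSec=2
```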
@perennialtech thanks, that helps for now.
The new error is intentional; it's meant to provide a better error message. But it's supposed to temporarily fix itself right after, since it rolls over the token. So constantly restarting the container really shouldn't be necessary - if it is, my fix wasn't sufficient.
If anyone wants to do more investigation and provide more info, that would be much appreciated. I see the notifications rolling in but it's very difficult to find free time to go in-depth. I always try to provide enough information when I do my own bug fixes to allow others to understand how it works.
I assume that since the currently-available docker container still works intermittently, it is a new “feature” from Reddit’s side that we will have to bypass - not a “fix” per se.
Docker container is now updated to contain the fix. Again, it really won't "fix" it, but it should reduce the number of reports since it rolls over the token immediately.
Working so far on my public instance, thank you!
edit: took about 5m to get the rate limit error but that seems like another issue.
Yeah, I'm not sure if this is a permanent fix, since https://github.com/redlib-org/redlib/issues/229#issuecomment-2354114934 mentions that even with the new fix, the error still occurs repeatedly. That means the error response from Reddit is more frequent than I first thought. I need to do a deep dive on this and find out whether it's rooted in some separate new rate limit, or whether it's completely random (in which case token rollover won't fix it; a retry mechanism might, but it also might spam Reddit and cause even more aggressive blocking). I need to collect a ton of statistics about it: how frequently it occurs, whether a retry mechanism would avoid it, whether it's per-token, etc.
I worry that my "fix" wasn't a fix; it was just a glorified retry function.
If there's any info I can provide to help, I'd love to; however, the only thing I know for certain is that it happens fast, and a container restart mitigates it for a short while.
edit: checked logs and 7 minutes went by until the first rate limit error. After restarting the container, the error occurred within 2 minutes. Seems like you're right about it clearing the cache and trying again.
I think the post-restart recovery is just a side effect: restarting really just clears the cache and tries again. Whereas if you don't restart, but instead disable caching on the main json function and then retry, I think it'll work the second time. Next up is avoiding a poisoned cache value.
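As a toy illustration of that "don't poison the cache" idea (this is not redlib's actual cache, just a self-contained sketch):

```rust
use std::collections::HashMap;

/// Toy in-memory cache showing the idea: only successful responses are
/// stored, so a transient 403 never gets served to later requests.
struct JsonCache {
    entries: HashMap<String, String>,
}

impl JsonCache {
    fn new() -> Self {
        Self { entries: HashMap::new() }
    }

    fn get_or_fetch<F>(&mut self, path: &str, fetch: F) -> Result<String, String>
    where
        F: FnOnce(&str) -> Result<String, String>,
    {
        if let Some(hit) = self.entries.get(path) {
            return Ok(hit.clone());
        }
        match fetch(path) {
            Ok(body) => {
                // Cache only the good response...
                self.entries.insert(path.to_string(), body.clone());
                Ok(body)
            }
            // ...and let errors fall through uncached, so the next request retries.
            Err(e) => Err(e),
        }
    }
}
```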
Just a datapoint: when this issue started happening yesterday, I replaced my private instance of redlib (docker arm) with the last version of libreddit (also docker arm) and that has been working just fine for the past day.
So: whatever libreddit is doing to grab its data does not seem to trigger this new Reddit error response.
P.S. Thanks so much for libreddit/redlib!
I loaded the homepage after restarting a container. Came back to it later and it was fudged. Assuming that it's not doing background work, that'd suggest a problem with caching. What, I'm not sure. I wouldn't expect redlib to hold onto anything for an extended period of time like this.
It sounds like it's specifically a new server-side issue when requesting via the OAuth routes. If we request it normally, I'm not sure if it will give us the same rate-limit bypass as now, since it won't look like a device.
Would it be at all feasible to allow server admins to pick which route they request from?
Yeah, absolutely - though it would likely mean reverting to the standard routes' rate limits, which are worse than OAuth's. A retry system would probably be a better option, since it really seems to be sporadic/random.
Following @thudeous's discovery, I've done a git bisect which seems to point to 3bb5dc5f3e5221149f72b10ae8c2104a02d3a0ac as the likely first commit with this issue. In my testing, 3bb5dc5 exhibited the behavior within 5 minutes of starting, whereas none of my test runs of redlib on the parent commit 3592728 so far have shown any error message of this sort (one of which had it running for about 2 hours).
Shocking. I could not tell you why that would be. But I can reproduce it and can confirm it's fixed.
What I don't know is why. That's just a dep bump to hyper-rustls and a matching patch to fix compilation.
I see two possibilities:
1. It could be some fingerprinting that certain Reddit edge servers have deployed, which the older version of hyper-rustls doesn't trigger. I suspect this is it, since the problem just started happening without a corresponding source change.
2. Or it could be sporadic malformed HTTPS requests generated by that version of hyper-rustls. It is 11 months old at this point and outdated, but we can't upgrade it until we upgrade all of hyper, tokio, etc. This is less likely but still possible, and it also fits the fact that it's a recent change - maybe certain Reddit edge servers have had their HTTP parsing library updated to be stricter in what they accept, which would explain it happening all of a sudden.
Looking at the hyper-rustls patch for that version, I don't see anything that jumps out at me. Maybe it's one of the dep bumps in this diff: https://github.com/rustls/hyper-rustls/compare/v%2F0.24.1...v%2F0.25.0
Either way, you've just identified the issue. Thanks so much for bisecting - I didn't think to, because I immediately assumed it was exclusively a server-side change, so no code change could really cause it (barring the switched endpoints).
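If the eventual fix is to stay on the old version until the whole hyper/tokio stack can be upgraded together, a hypothetical Cargo.toml pin would look like this (assuming hyper-rustls is a direct dependency; redlib's real manifest and feature flags may differ):

```toml
[dependencies]
# Pin to the pre-0.25 release the bisect points at as the last good version.
hyper-rustls = "=0.24.1"
```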
Fixed in 793047f63f0f603e342c919bbfc469c7569276fa, new container is out on quay.io: https://github.com/redlib-org/redlib/actions/runs/10925300119
Going to fix a few other things before cutting a new release (such as failing tests). But for now, this issue is done. Thanks to everyone here who helped investigate, especially @aurtzy for finding the issue.
All instance operators should prioritize pulling the new image with at least commit 793047f.
That's my commit 😍 I caused all of this
Describe the bug
This error is currently being displayed on all instances.
Steps to reproduce the bug
Go to any instance and it will say:
Failed to parse page JSON data: expected value at line 1 column 1 | /r/popular/hot.json?&raw_json=1&geo_filter=GLOBAL
What's the expected behavior?
To display any Reddit page.
Additional context / screenshot
This started happening about 10 minutes ago. The JSON itself is still accessible in a browser, so maybe the structure changed?