Also experiencing this suddenly.
Same here; I think it's global somehow. I suppose they either changed something on Reddit's end, or… they're onto us?
I can reproduce this. I tried restarting Redlib, which somehow fixed it for a short while (about 10 minutes), but then it came back. Maybe a slow rollout of a change on Reddit's side could cause that?
I'm seeing this as well on my own instance and on public instances. I tried restarting Redlib but it didn't help.
I've also been experiencing this on my own instance. Restarting did seem to fix it, although I'm not sure if it will last.
Edit: Already not working again 15 minutes later.
Same here. But I can call the JSON endpoint manually and get a normal response with JSON being returned. Not sure if the JSON itself has changed.
Investigating this
Is anyone reproducing this locally (not on a public instance)? Or does this require heavy load to trigger?
It's happening on both my home instance and one on a VPS. I'm the only one using either of them.
This is happening on my local instance where I'm the only user.
Happening on my single-user instance, which doesn't have much load at all.
Also on my local home server, with only two people accessing it via local network.
Additional information: I run Redlib (v0.35.1, compiled from source) as a systemd user service. Restarting the service makes Redlib work again for a few minutes, then the JSON error above returns. I've restarted the service several times; each restart reproducibly makes Redlib work again, but only for a short time.
Same, on my own instance where I am the only user. Restarting/redeploying from the source image does fix it for a little while, though I'm not sure whether it lasts for a given duration or a given number of requests. Also, checking the container logs, every JSON error is paired with a 403 Forbidden from Reddit.
Extract from logs of a freshly started container:
Starting Redlib...
Running Redlib v0.35.1 on [::]:8080!
ERROR redlib::client > Got an invalid response from reddit expected value at line 1 column 1. Status code: 403 Forbidden
ERROR redlib::client > Got an invalid response from reddit expected value at line 1 column 1. Status code: 403 Forbidden
ERROR redlib::utils > Error page rendered: Failed to parse page JSON data: expected value at line 1 column 1
ERROR redlib::client > Got an invalid response from reddit expected value at line 1 column 1. Status code: 403 Forbidden
ERROR redlib::utils > Error page rendered: Failed to parse page JSON data: expected value at line 1 column 1
ERROR redlib::client > Got an invalid response from reddit expected value at line 1 column 1. Status code: 403 Forbidden
ERROR redlib::utils > Error page rendered: Failed to parse page JSON data: expected value at line 1 column 1
Single user here as well. Mine is running on a VPS, but I've just tried it locally and it's the same. It works for like 2 or 3 pages and then it breaks until restarted.
I've narrowed it down to a network policy violation that Reddit is picking up on. This causes a 403 (which we didn't special-case previously), and the violation is detectable via the retry-after header (and various other headers, plus the plaintext response body, but checking headers will be fastest).
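For illustration, here's a minimal Rust sketch of that detection heuristic, assuming the hyper 0.14-era client stack redlib uses; this is not redlib's actual code:

```rust
use hyper::{header::RETRY_AFTER, Body, Response, StatusCode};

/// Heuristic sketch: treat a 403 that carries a Retry-After header as
/// Reddit's network-policy block rather than a real JSON payload, so the
/// caller can refresh the OAuth token instead of trying to parse the body.
fn is_network_policy_block(resp: &Response<Body>) -> bool {
    resp.status() == StatusCode::FORBIDDEN && resp.headers().contains_key(RETRY_AFTER)
}
```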
Going to push out a quick fix to catch the error, refresh the token, and avoid poisoning the cache. You may need to reload the page if you see this error, but it'll ideally be fixed right after.
Would appreciate someone pulling the latest code and testing when this goes in. Docker image will be updated in a few minutes.
Leaving this open until I get confirmation that you don't see this error repeatedly. It may be spurious, but that's the nature of the new (?) server behavior. Future improvements would include a default retry mechanism after the token refresh (and maybe a better-designed token pool), but for now, if the issue goes away after a reload, I'm happy.
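A rough, self-contained sketch of what that retry-after-refresh could look like; `fetch_json` and `refresh_token` are placeholder names, not redlib APIs, and tokio (which redlib already depends on) is assumed for the sleep:

```rust
use std::time::Duration;

/// Placeholder error type for the sketch; redlib's real error handling differs.
#[derive(Debug)]
enum FetchError {
    NetworkPolicy, // the 403 + Retry-After case described above
    Other(String),
}

// Hypothetical stand-ins for the real client calls.
async fn fetch_json(_path: &str) -> Result<String, FetchError> {
    unimplemented!("placeholder")
}
async fn refresh_token() -> Result<(), FetchError> {
    unimplemented!("placeholder")
}

/// One bounded retry after rolling the token over, with a short pause so a
/// burst of failures doesn't hammer Reddit and trigger harsher blocking.
async fn fetch_with_refresh(path: &str) -> Result<String, FetchError> {
    match fetch_json(path).await {
        Err(FetchError::NetworkPolicy) => {
            refresh_token().await?;
            tokio::time::sleep(Duration::from_millis(500)).await;
            fetch_json(path).await
        }
        other => other,
    }
}
```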
Works for me.
Logs still show an error on page load, but that doesn't affect anything in the web app (as far as I can see).
ERROR redlib::utils > Error page rendered: Nothing here
EDIT: just got rate-limited - don't know if it's related.
Couldn't send request to Reddit: Rate limit - try refreshing soon | /r/popular/hot.json?&raw_json=1&geo_filter=GLOBAL
@sigaloid Not entirely related, but you might want to look at the failing Docker build action; otherwise this fix isn't going to make it out to everyone - https://github.com/redlib-org/redlib/actions/runs/10891397433
Done, thanks!
Thank you for the quick fix 🙏
Thanks a lot! If this fix works, could you drop a new release?
The latest Docker image works for me. Thank you so much for the speedy fix!
Weird that it works, since the container build is still failing: https://github.com/redlib-org/redlib/actions/runs/10891683866
Gonna be a small fix to get a new container built from main - just some fixing of the filenames. I have class right now; if someone could look at .github/workflows/main-docker.yml:79, see my recent commit (https://github.com/redlib-org/redlib/commit/118708400ad37c5ddfb8d54566d429c0f43b024a), and PR a fix, I could merge it. Else it'll probably be a few hours before I get to it.
My bad, sorry about that. The error did disappear for a moment, so I thought it was because of a new build, but it is now back again.
The newest commit fixed the JSON errors, but I've just begun getting the "ERROR redlib::utils > Error page rendered: Couldn't send request to Reddit: Rate limit - try refreshing soon" error on both of my Redlib containers. After a restart of Redlib, it only takes ~15 seconds to start reappearing.
I have the same thing to report: after the latest commit, we are getting a new type of error:
Rate limit error
Couldn't send request to Reddit: Rate limit - try refreshing soon | /r/popular/hot.json?&raw_json=1&geo_filter=GLOBAL
Note: I also tried clearing the complete cache with my cookie manager addon; still the same issue :-))
I pulled the latest docker image and I'm still having the same issue.
I am running v0.35.1, which was released a month ago. Has the update been rolled out on Docker yet? As you can see, no updates have been made since August: https://quay.io/repository/redlib/redlib?tab=tags
I've submitted PR #238 but, to be frank, I'm not entirely sure if it's what's required.
This issue is harder to mitigate since https://github.com/redlib-org/redlib/pull/59 was merged, as we can no longer easily do a build in Docker.
Please pin this issue; there is a slew of duplicates coming in.
~As mentioned by @ButteredCats, it seems the original fix does not really refresh the OAuth token. It looks like either refreshing the token in its own scope or launching a task to refresh it works. See the PR above.~
Never mind, that PR doesn't fix anything. Even after refreshing the token, you still get rate limited.
Hi, not sure if one more voice helps troubleshooting, but I've just rebuilt and it fixed this issue. I'm running it as an exe on Windows. However, this leads to issue #237.
As @pimlie mentioned in https://github.com/redlib-org/redlib/pull/241#issuecomment-2355568510, restarting Redlib seems to temporarily resolve the rate limiting issue, and frequent restarts (intervals of <10 minutes) could act as a stopgap fix until a more permanent solution is found.
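For anyone running Redlib as a systemd user service (as mentioned earlier in the thread), a hypothetical drop-in implementing that stopgap could look like the following; the path and interval are assumptions, adjust to taste:

```ini
# ~/.config/systemd/user/redlib.service.d/restart.conf (hypothetical path)
[Service]
Restart=always
# Force a restart every 9 minutes, per the <10 minute stopgap above.
RuntimeMaxSec=540
RestartSec=2
```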
@perennialtech thanks, that helps for now.
The new error is intentional; it's meant to provide a better error message. But it's supposed to temporarily fix itself right after, since it rolls over the token. So constantly restarting the container really shouldn't be necessary - if it is, my fix wasn't sufficient.
If anyone wants to do more investigation and provide more info, that would be much appreciated. I see the notifications rolling in but it's very difficult to find free time to go in-depth. I always try to provide enough information when I do my own bug fixes to allow others to understand how it works.
I assume that since the currently-available docker container still works intermittently, it is a new “feature” from Reddit’s side that we will have to bypass - not a “fix” per se.
Docker container is now updated to contain the fix. Again, it really won't "fix" it, but it should reduce the number of reports since it rolls over the token immediately.
Working so far on my public instance, thank you!
edit: took about 5m to get the rate limit error but that seems like another issue.
Yeah, I'm not sure if this is a permanent fix, since https://github.com/redlib-org/redlib/issues/229#issuecomment-2354114934 mentions that even with the new fix, the error still occurs repeatedly. That means the error response from Reddit is more frequent than I first thought. I need to do a deep dive on this and find out whether it's rooted in some separate new rate limit, or whether it's completely random (in which case token rollover won't fix it; a retry mechanism might, but it also might spam Reddit and cause even more aggressive blocking). I need to collect a ton of statistics about it: how frequently it occurs, whether a retry mechanism would avoid it, whether it's per-token, etc.
I worry that my "fix" wasn't a fix; it was just a glorified retry function.
If there's any info I can provide to help, I'd love to; however, the only thing I know for certain is that it happens fast, and a container restart mitigates it for a short while.
edit: checked logs and 7 minutes went by until the first rate limit error. After restarting the container, the error occurred within 2 minutes. Seems like you're right about it clearing the cache and trying again.
I think the post-restart recovery is just a side effect: restarting really just clears the cache and tries again. Whereas if you don't restart, but instead disable caching on the main json function and then retry, I think it'll work the second time. Next up is avoiding a poisoned cache value.
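As a toy illustration of that "don't poison the cache" idea (this is not redlib's actual cache, just a self-contained sketch):

```rust
use std::collections::HashMap;

/// Toy in-memory cache showing the idea: only successful responses are
/// stored, so a transient 403 never gets served to later requests.
struct JsonCache {
    entries: HashMap<String, String>,
}

impl JsonCache {
    fn new() -> Self {
        Self { entries: HashMap::new() }
    }

    fn get_or_fetch<F>(&mut self, path: &str, fetch: F) -> Result<String, String>
    where
        F: FnOnce(&str) -> Result<String, String>,
    {
        if let Some(hit) = self.entries.get(path) {
            return Ok(hit.clone());
        }
        match fetch(path) {
            Ok(body) => {
                // Cache only the good response...
                self.entries.insert(path.to_string(), body.clone());
                Ok(body)
            }
            // ...and let errors fall through uncached, so the next request retries.
            Err(e) => Err(e),
        }
    }
}
```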
Just a datapoint: when this issue started happening yesterday, I replaced my private instance of redlib (docker arm) with the last version of libreddit (also docker arm) and that has been working just fine for the past day.
So: whatever libreddit is doing to grab its data does not seem to trigger this new Reddit error response.
P.S. Thanks so much for libreddit/redlib!
I loaded the homepage after restarting a container. Came back to it later and it was fudged. Assuming that it's not doing background work, that'd suggest a problem with caching. What, I'm not sure. I wouldn't expect redlib to hold onto anything for an extended period of time like this.
It sounds like it's specifically a new server-side issue when requesting via the OAuth routes. If we request it normally, I'm not sure if it will give us the same rate-limit bypass as now, since it won't look like a device.
Would it be at all feasible to allow server admins to pick which route they request from?
Yeah, absolutely - though it would likely mean reverting to the standard routes' rate limits, which are worse than OAuth's. A retry system would probably be a better option, since it really seems to be sporadic/random.
Following @thudeous's discovery, I've done a git bisect which seems to point to 3bb5dc5f3e5221149f72b10ae8c2104a02d3a0ac as the likely first commit with this issue. In my testing, 3bb5dc5 exhibited the behavior within 5 minutes of starting, whereas none of my test runs of redlib on the parent commit 3592728 so far have shown any error message of this sort (one of which had it running for about 2 hours).
Shocking. I could not tell you why that would be. But I can reproduce it and can confirm it's fixed.
What I don't know is why. That's just a dep bump to hyper-rustls and a matching patch to fix compilation.
I see two possibilities:
1. It could be some fingerprinting that certain Reddit edge servers have deployed, which the older version of hyper-rustls doesn't trigger. I suspect this is it, since the problem just started happening without a corresponding source change.
2. Or it could be sporadic malformed HTTPS requests generated by that version of hyper-rustls. It is 11 months old at this point and outdated, but we can't upgrade it until we upgrade all of hyper, tokio, etc. This is less likely but still possible, and it also fits the fact that it's a recent change - maybe certain Reddit edge servers have had their HTTP parsing library updated to be stricter in what they accept, which would explain it happening all of a sudden.
Looking at the hyper-rustls patch for that version, I don't see anything that jumps out at me. Maybe it's one of the dep bumps in this diff: https://github.com/rustls/hyper-rustls/compare/v%2F0.24.1...v%2F0.25.0
Either way, you've just identified the issue. Thanks so much for bisecting - I didn't think to, because I immediately assumed it was exclusively a server-side change, so no code change could really cause it (barring the switched endpoints).
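If the eventual fix is to stay on the old version until the whole hyper/tokio stack can be upgraded together, a hypothetical Cargo.toml pin would look like this (assuming hyper-rustls is a direct dependency; redlib's real manifest and feature flags may differ):

```toml
[dependencies]
# Pin to the pre-0.25 release the bisect points at as the last good version.
hyper-rustls = "=0.24.1"
```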
Fixed in 793047f63f0f603e342c919bbfc469c7569276fa, new container is out on quay.io: https://github.com/redlib-org/redlib/actions/runs/10925300119
Going to fix a few other things before cutting a new release (such as failing tests). But for now, this issue is done. Thanks to everyone here who helped investigate, especially @aurtzy for finding the issue.
All instance operators should prioritize pulling the new image with at least commit 793047f.
That's my commit 😍 I caused all of this
Describe the bug
This error is currently being displayed on all instances.
Steps to reproduce the bug
Go to any instance and it will say:
Failed to parse page JSON data: expected value at line 1 column 1 | /r/popular/hot.json?&raw_json=1&geo_filter=GLOBAL
What's the expected behavior?
To display any Reddit page.
Additional context / screenshot
This started happening about 10 minutes ago. The JSON itself is still accessible in a browser, so maybe the structure changed?