snarfed / bridgy-fed

🌉 A bridge between decentralized social network protocols
https://fed.brid.gy
Creative Commons Zero v1.0 Universal

atproto firehose throws errors #1059

Closed mackuba closed 3 months ago

mackuba commented 4 months ago

When trying to connect to the firehose and read some events from it, I sometimes get errors, which look like this for me:

#<Faye::WebSocket::API::ErrorEvent:0x0000000106d44318>

This happens even after I've started receiving some data: events are coming in, then they stop, nothing happens for a bit, then there's an error, maybe more errors, and after a while it continues printing events.

I inspected the details of the error, and it looks like the server is sometimes returning status 502:

@message="Error during WebSocket handshake: Unexpected response code: 502",
@pathname="/xrpc/com.atproto.sync.subscribeRepos?cursor=124415",
@status=502

@headers={
      "content-type"=>"text/html; charset=UTF-8",
      "referrer-policy"=>"no-referrer",
      "content-length"=>"332",
      "date"=>"Fri, 17 May 2024 20:35:11 GMT",
      "connection"=>"close"
    },

@buffer="\n<html><head>\n<meta http-equiv=\"content-type\" content=\"text/html;charset=utf-8\">\n<title>502 Server Error</title>\n</head>\n<body text=#000000 bgcolor=#ffffff>\n<h1>Error: Server Error</h1>\n<h2>The server encountered a temporary error and could not complete your request.<p>Please try again in 30 seconds.</h2>\n<h2></h2>\n</body></html>\n"

snarfed commented 4 months ago

Ugh, thanks, not ideal. Will look.

snarfed commented 4 months ago

I've made some significant infra improvements over the last couple days, including to the (outgoing) firehose's stability, and I'm not seeing it serve these 502s any more. @mackuba hopefully you're seeing the same thing now?

mackuba commented 4 months ago

Just tried it but still not looking great:

[image: Screen Shot 2024-05-24 at 21 59 51]

snarfed commented 4 months ago

Hmm! That's surprising. My logs show that BF was serving them pretty consistently for a while, but they stopped altogether a few days ago. The last time I see that we served a 502 to subscribeRepos was 5/21 07:39:04 UTC.

[image]

mackuba commented 4 months ago

Could it be served by something before your app (Nginx or whatever) when it can't get a response from your app?

snarfed commented 4 months ago

Yup, definitely. It currently restarts about once a day (for reasons 😐), and any open connections when that happens get 502ed. Looks like you caught it at one of those restarts. Try again? Apart from those, it's not serving many/any other 502s as far as I can tell.

mackuba commented 4 months ago

Hmm… looks good at the moment, but I'll try a few more times at different times :)

mackuba commented 4 months ago

Btw, what's the buffer size for your firehose? It looks like it goes back <24h?

snarfed commented 4 months ago

5000 seqs, but now that it's behaving better I should probably drop that limit and allow full history.
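For readers wondering how a fixed limit like this produces a history that only goes back a day or less, here is a minimal sketch of a bounded rollback window. It is not Bridgy Fed's actual code: the `RollbackWindow` class and its method names are hypothetical, and only the 5000 figure comes from the comment above.

```python
from collections import deque


class RollbackWindow:
    """Hypothetical bounded event buffer; not Bridgy Fed's actual code."""

    def __init__(self, max_events=5000):
        # A deque with maxlen silently drops the oldest event once it is full.
        self.events = deque(maxlen=max_events)

    def append(self, seq, payload):
        """Store one event under its sequence number."""
        self.events.append((seq, payload))

    def oldest_seq(self):
        """Return the oldest sequence number still buffered, or None if empty."""
        return self.events[0][0] if self.events else None

    def replay_from(self, cursor):
        """Return buffered events whose seq is greater than the cursor."""
        return [(seq, payload) for seq, payload in self.events if seq > cursor]
```

With a window like this, a client that reconnects with a cursor older than `oldest_seq()` can only be given whatever is still buffered, so how far back the history reaches depends on how quickly new events arrive.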

snarfed commented 3 months ago

Hi again! I fixed #1091 today, which should stop the daily restarts, so I think these 502 errors should be fully gone now. WebSocket subscriptions will still close with HTTP 101 after 1 hour, but that seems reasonable; clients should hopefully handle that and reconnect OK.

Let me know if you still see this problem!
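For clients that need to cope with the hourly close (or an occasional 502 on the handshake), a reconnect loop that resumes from the last sequence number seen is one option. The following is only a sketch under stated assumptions: the host in `FIREHOSE_URL` is a placeholder, `connect` is a hypothetical callable standing in for whatever WebSocket library the client uses, and the cursor is assumed to mean "send events after this seq", as in the `?cursor=124415` path quoted earlier in the thread.

```python
import random
from urllib.parse import urlencode

# Placeholder host; only the path is taken from the error quoted in this thread.
FIREHOSE_URL = "wss://firehose.example/xrpc/com.atproto.sync.subscribeRepos"


def subscribe_url(last_seq=None, base=FIREHOSE_URL):
    """Build the subscription URL, resuming after last_seq if we have one."""
    if last_seq is None:
        return base
    return base + "?" + urlencode({"cursor": last_seq})


def backoff_delay(attempt, base=1.0, cap=30.0):
    """Exponential backoff with full jitter, never longer than `cap` seconds."""
    return random.uniform(0, min(cap, base * 2 ** attempt))


def run(connect, sleep, max_attempts=5):
    """Reconnect until `max_attempts` consecutive connections make no progress.

    `connect(url)` is a hypothetical callable that yields event sequence
    numbers, raises ConnectionError on a failed handshake (such as a 502),
    and simply finishes when the server closes the socket.
    """
    last_seq = None
    attempt = 0
    while attempt < max_attempts:
        try:
            for seq in connect(subscribe_url(last_seq)):
                last_seq = seq
                attempt = 0  # we received data, so reset the backoff
        except ConnectionError:
            pass  # fall through to the backoff sleep and retry
        sleep(backoff_delay(attempt))
        attempt += 1
    return last_seq
```

In a real client, `sleep` would be `time.sleep` and `connect` would wrap the actual WebSocket connection. Passing both in as arguments keeps the retry logic testable without a network.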