sm00th / bitlbee-discord

Bitlbee plugin for Discord (http://discordapp.com)
GNU General Public License v2.0

Backlog fetching blocks Discord websocket heartbeat, Bitlbee response to PING #223

Closed: digitalcircuit closed this issue 11 months ago

digitalcircuit commented 2 years ago

In brief

Test case

Steps

  1. Set up a Discord account connected to many channels/group chats/etc
    • The number of users doesn't matter; what matters is how often discord_http_get_backlog is called
    • Currently around 130 channels/chats here (select COUNT(*) from buffer where buffername like '#%.%' and joined = true;)
  2. Ensure max_backlog is set to the default of 50
  3. Connect to Bitlbee if not already connected
  4. Connect to the Discord account
  5. Observe results

Expected

bitlbee-discord connects, however slowly it needs to.

Actual

Within the past week (2021-8-14 or earlier), Discord started throttling its responses to discord_http_get_backlog() calls to ≥1s each, slowing down login to the point that Quassel times out and disconnects (after 180 seconds).

If I disable Quassel's timeout, then Discord times bitlbee-discord's websocket out instead:

discord - Remote host is closing websocket connection
discord - Error: Invalid session, reconnecting

(Note that once I disabled the backlog fetching, I logged in just fine - no issues with token, etc.)

Workaround

Disable backlog fetching with account discord_acct set max_backlog 0.

Notes

I suspect this would involve some sort of queue system within bitlbee-discord, so that backlog fetching blocks neither Bitlbee's processing nor the responses to Discord's heartbeat, similar to how IRC clients queue up WHO requests.

This might be a lot of effort.
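As a rough illustration of the queue idea, here is a minimal C sketch. Every name in it (backlog_queue, backlog_queue_push, backlog_queue_tick, fake_fetch) is hypothetical and not part of bitlbee or bitlbee-discord. It assumes that some periodic timer in bitlbee's event loop would call backlog_queue_tick, and the per-tick budget is a tunable guess, not a known Discord limit. Instead of issuing every backlog request in one loop at join time, requests are queued and released a few at a time, so control returns to the event loop between batches.

```c
#include <assert.h>
#include <stdio.h>

#define QUEUE_CAP 256

/* Hypothetical stand-in for discord_http_get_backlog(); the real
 * function's signature is not shown in this thread. */
typedef void (*backlog_fetch_fn)(const char *channel_id);

/* Fixed-size ring buffer of pending channel IDs, in FIFO order.
 * Zero-initialize before use, e.g. backlog_queue q = {0}; */
typedef struct {
    char ids[QUEUE_CAP][32]; /* Discord channel IDs fit easily in 31 chars */
    size_t head;             /* index of the next ID to dispatch */
    size_t len;              /* number of pending IDs */
} backlog_queue;

/* Enqueue a channel; returns 0 on success, -1 if the queue is full. */
static int backlog_queue_push(backlog_queue *q, const char *channel_id)
{
    if (q->len == QUEUE_CAP)
        return -1;
    size_t tail = (q->head + q->len) % QUEUE_CAP;
    snprintf(q->ids[tail], sizeof q->ids[tail], "%s", channel_id);
    q->len++;
    return 0;
}

/* Meant to be called from a periodic timer (assumption): dispatch at most
 * per_tick requests, then return so the event loop can handle other work
 * such as heartbeats and PINGs. Returns how many requests are still pending. */
static size_t backlog_queue_tick(backlog_queue *q, size_t per_tick,
                                 backlog_fetch_fn fetch)
{
    assert(fetch != NULL);
    for (size_t i = 0; i < per_tick && q->len > 0; i++) {
        fetch(q->ids[q->head]);
        q->head = (q->head + 1) % QUEUE_CAP;
        q->len--;
    }
    return q->len;
}

/* Illustrative stub only: counts fetches instead of sending HTTP requests. */
static size_t fetch_calls = 0;
static void fake_fetch(const char *channel_id)
{
    (void)channel_id;
    fetch_calls++;
}
```

For example, with 130 queued channels and 5 requests per tick, the queue drains in 26 ticks. If the stall really does come from issuing all requests at once, spreading them out like this would give the event loop a chance to answer heartbeats and PINGs between batches; since discord_http_get_backlog() already handles replies in a callback, the change would only pace the requests, not make them asynchronous.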

sm00th commented 2 years ago

Thanks for the writeup. I wonder where exactly it gets stuck. bitlbee-discord only calls discord_http_get_backlog() on channel join. The joins themselves are handled by bitlbee in irc_channel_auto_joins(), which is a tight loop, but discord_http_get_backlog() only sends a request and handles the reply in a callback, so it should not block, I think.

digitalcircuit commented 2 years ago

Hmm, good question. I didn't see any obvious blocking either, though I didn't look thoroughly.

From a combination of the Bitlbee wiki and an Internet search, it looks like I might be able to run something like…

sudo --user bitlbee valgrind --log-file=valgrind.log --tool=callgrind bitlbee -Dnv

Or using perf, perhaps:

sudo --user bitlbee perf record bitlbee -Dnv

I'll try to test this soon (part of this comment is a note to myself). I'm wondering whether it's possible to recreate this more easily by artificially slowing down network requests, or whether my assumption is off base (it might not be network-related).

sm00th commented 2 years ago

bitlbee-discord also prints a lot of debug messages when the BITLBEE_DEBUG env var is set, so running bitlbee like BITLBEE_DEBUG=1 bitlbee -Dnv might give you a better idea of what is going on at the time of the timeout. It may be that issuing 130 requests in a loop is enough of a slowdown to trigger a timeout. In that case, the fix would need to be on bitlbee's side, but I am not sure it would be worth it: that many joined channels is hardly a normal use case. I think most users don't run bitlbee-discord in auto_join mode and only add a couple dozen channels they are interested in.

digitalcircuit commented 2 years ago

Good news-ish: I was able to re-enable backlog fetching, which I now actually need after Discord's recent changes that reset the passwords of accounts using bitlbee-discord, etc. There are a bunch of 403 errors, but I've always had those, so maybe it only partially fetches?

I'm not sure whether we should close this issue for now; I could always re-open it later?

digitalcircuit commented 11 months ago

I haven't been able to recreate this issue in day-to-day use for the past few months, so I think I can just close this.

Feel free to re-open it (or ask me to re-open it) if anyone else runs into this!