interactions-py / interactions.py

A highly extensible, easy to use, and feature complete bot framework for Discord
https://interactions-py.github.io/interactions.py/
MIT License
838 stars 184 forks source link

[BUG] raw_guild_create event on reconnect causes hang/memory leak #1662

Closed Scrxtchy closed 6 months ago

Scrxtchy commented 6 months ago

Library Version

userapps @ 73c73fd

Describe the Bug

Since upgrading to python 3.12, I've witnessed an error regarding one of my apps increasing in memory usage in the early hours of the morning. In the midst of debugging another issue, attaching the logger pointed it to an issue where a long running web request is halting the main thread causing the bot to disconnect from the gateway. Upon reconnection, it goes through it's normal operation of dispatching raw_guild_create events, however it is only able to complete one event before hanging, when it is expected to do 7 events when compared to initial startup.

if I put a return statement at the beginning of the _on_raw_guild_create function, the issue is not replicated.

Steps to Reproduce

  1. Start Bot
  2. Trigger disconnect from gateway
  3. Issue is observed

Expected Results

Events should be dispatched as normal without issue

Minimal Reproducible Code

import interactions as slash
import logging
logging.basicConfig(format='%(levelname)s:%(message)s', level=logging.DEBUG)
import time

bot = slash.Client(token="")

@slash.slash_command(
    name="memoryleak",
    description="expand",
    default_member_permissions="0")
async def memoryleak(ctx: slash.InteractionContext):
    await ctx.send("sleeping for 60s", ephemeral=True)
    time.sleep(60)
    print("done")

@slash.listen()
async def on_startup():
    print("started")

bot.start()

Traceback

No traceback, debug logs;


DEBUG:Dispatching Event: command_completion
INFO:Command Called: roundrefresh with event.ctx.args = [] | event.ctx.kwargs = {}
DEBUG:Disconnecting from gateway! Reason: 1000::
DEBUG:Sending data to websocket: {"op": 1, "d": 10}
DEBUG:Shard ID 0 | Identification payload sent to gateway, requesting intents: 3374845
DEBUG:Shard ID 0 | ❤ Gateway is sending a Heartbeat
DEBUG:Shard ID 0 | ❤ Received heartbeat acknowledgement from gateway
INFO:Shard ID 0 | Gateway connection established
DEBUG:Shard ID 0 | Session ID: 84030d0c0b5ffe0bf4d7605a27402cc2 Trace: ['["gateway-prd-us-east1-c-vx34",{"micros":53695,"calls":["id_created",{"micros":600,"calls":[]},"session_lookup_time",{"micros":7059,"calls":[]},"session_lookup_finished",{"micros":15,"calls":[]},"discord-sessions-prd-1-221",{"micros":45680,"calls":["start_session",{"micros":36503,"calls":["discord-api-rpc-77cfbcdbdd-6tp8q",{"micros":33313,"calls":["get_user",{"micros":4745},"get_guilds",{"micros":2603},"send_scheduled_deletion_message",{"micros":6},"guild_join_requests",{"micros":163},"authorized_ip_coro",{"micros":6}]}]},"starting_guild_connect",{"micros":58,"calls":[]},"presence_started",{"micros":334,"calls":[]},"guilds_started",{"micros":162,"calls":[]},"guilds_connect",{"micros":2,"calls":[]},"presence_connect",{"micros":8595,"calls":[]},"connect_finished",{"micros":8600,"calls":[]},"build_ready",{"micros":17,"calls":[]},"clean_ready",{"micros":1,"calls":[]},"optimize_ready",{"micros":0,"calls":[]},"split_ready",{"micros":1,"calls":[]}]}]}]']
DEBUG:Dispatching Event: websocket_ready
DEBUG:Dispatching Event: raw_guild_create
Terminated

Checklist

Additional Information

in trying to reproduce this in another bot, it didn't log the event I believed to have issues, instead there's nothing printed after the websocket_ready function, and the resident ram of this process was observed to be increasing, so there's a chance I'm climbing the wrong tree here image

AstreaTSS commented 6 months ago

Interesting. I'm curious if this has to do with #1657.

Scrxtchy commented 6 months ago

I didn't start observing this at 3.12.3 but rather just after upgrading to an unknown version of 3.12, so I have not considered the changes to enum values being too relevant. My speculation in the past has been the cache as my thought was "this runs fine on first run, but what if running it the second time is the isssue" but I haven't taken the time to look at that yet. But given the unresponsive nature of the program, I've been trying to understand the event loop more.

AstreaTSS commented 6 months ago

Managed to replicate this on 3.12.3. It's actually far easier to just use await bot.ws.reconnect(resume=False) (technically, using no arguments would be fine, but this highlights how important it is) to trigger it.

Scrxtchy commented 6 months ago

Thanks, this has helped cut the time down a lot. I was actively looking at this again when I noticed the update to the thread what I've found that the problematic line seems to be in here https://github.com/interactions-py/interactions.py/blob/1a7de56b2085bdd62be77bfbb0f12506982d0858/interactions/api/events/processors/guild_events.py#L38-L42

specifically, the set() function. as putting a return on line 40 does not reproduce the issue, but one on line 42 will. But I have to reconnect not once, but twice for it to happen

AstreaTSS commented 6 months ago

Found the exact issue. I'll detail it in a bit in a PR to fix it, but in short, I genuinely have no idea how reconnects were functioning up until now.

AstreaTSS commented 6 months ago

See #1663 for a fix.