serenity-rs / poise

Discord bot command framework for serenity, with advanced features like edit tracking and flexible argument parsing

Bot is unable to start, blocks inside framework::build #198

Closed. hyperpastel closed this issue 1 year ago

hyperpastel commented 1 year ago

Hello all,

I wrote a bot using poise, and it works on my machine, but it behaves abnormally when I run it on my VPS.

When I call FrameworkBuilder::build(), the method gets stuck while reqwest is loading the application info (RouteInfo::GetCurrentApplicationInfo).

I use tokio with multiple tasks. All of them start fine, including the one for the bot, until it calls this function. The weird thing is that everything compiles fine, with no complaints from cargo or rustc, and there are no crashes either. It just... waits forever. I turned on env_logger with the level set to trace, which yielded the following.

[2023-09-02T15:17:11Z INFO  serenity::http::client] request; self=Http { client: Client { accepts: Accepts, proxies: [Proxy(System({}), None)], referer: true, default_headers: {"accept": "*/*"} }, ratelimiter: Ratelimiter { client: Client { accepts: Accepts, proxies: [Proxy(System({}), None)], referer: true, default_headers: {"accept": "*/*"} }, global: Mutex { data: () }, routes: RwLock { data: {} } }, ratelimiter_disabled: false, proxy: None } req=Request { body: None, multipart: None, headers: None, route: GetCurrentApplicationInfo }
[2023-09-02T15:17:11Z INFO  serenity::http::ratelimiting] perform; self=Ratelimiter { client: Client { accepts: Accepts, proxies: [Proxy(System({}), None)], referer: true, default_headers: {"accept": "*/*"} }, global: Mutex { data: () }, routes: RwLock { data: {} } } req=RatelimitedRequest { req: Request { body: None, multipart: None, headers: None, route: GetCurrentApplicationInfo } }
[2023-09-02T15:17:11Z INFO  serenity::http::ratelimiting] pre_hook; self=Ratelimit { limit: 9223372036854775807, remaining: 9223372036854775807, reset: None, reset_after: None } route=GetCurrentApplicationInfo
[2023-09-02T15:17:11Z TRACE tracing::span] -- pre_hook;
[2023-09-02T15:17:11Z INFO  serenity::http::request] build; self=Request { body: None, multipart: None, headers: None, route: GetCurrentApplicationInfo } client=Client { accepts: Accepts, proxies: [Proxy(System({}), None)], referer: true, default_headers: {"accept": "*/*"} } proxy=None
[2023-09-02T15:17:11Z TRACE tracing::span] -- build;
[2023-09-02T15:17:11Z TRACE hyper::client::pool] checkout waiting for idle connection: ("https", discord.com)
[2023-09-02T15:17:11Z DEBUG reqwest::connect] starting new connection: https://discord.com/
[2023-09-02T15:17:11Z TRACE hyper::client::connect::http] Http::connect; scheme=Some("https"), host=Some("discord.com"), port=None
[2023-09-02T15:17:11Z DEBUG hyper::client::connect::dns] resolving host="discord.com"
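
(For reference, a minimal sketch of enabling trace-level logging with env_logger; the actual initialization in the bot may differ, and running with RUST_LOG=trace plus a plain env_logger::init() works just as well.)

// Sketch only: programmatically enable env_logger at trace level.
// Requires the env_logger and log crates.
fn init_logging() {
    env_logger::Builder::new()
        .filter_level(log::LevelFilter::Trace)
        .init();
}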

Running with tracing and tracing_subscriber yields

 2023-09-02T15:36:38.305727Z TRACE hyper::client::pool: checkout waiting for idle connection: ("https", discord.com)
    at /home/lumine/.cargo/registry/src/index.crates.io-6f17d22bba15001f/hyper-0.14.27/src/client/pool.rs:638
    in serenity::http::ratelimiting::perform with self: Ratelimiter { client: Client { accepts: Accepts, proxies: [Proxy(System({}), None)], referer: true, default_headers: {"accept": "*/*"} }, global: Mutex { data: () }, routes: RwLock { data: {} } }, req: RatelimitedRequest { req: Request { body: None, multipart: None, headers: None, route: GetCurrentApplicationInfo } }
    in serenity::http::client::request with self: Http { client: Client { accepts: Accepts, proxies: [Proxy(System({}), None)], referer: true, default_headers: {"accept": "*/*"} }, ratelimiter: Ratelimiter { client: Client { accepts: Accepts, proxies: [Proxy(System({}), None)], referer: true, default_headers: {"accept": "*/*"} }, global: Mutex { data: () }, routes: RwLock { data: {} } }, ratelimiter_disabled: false, proxy: None }, req: Request { body: None, multipart: None, headers: None, route: GetCurrentApplicationInfo }

  2023-09-02T15:36:38.305989Z DEBUG reqwest::connect: starting new connection: https://discord.com/
    at /home/lumine/.cargo/registry/src/index.crates.io-6f17d22bba15001f/reqwest-0.11.20/src/connect.rs:429
    in serenity::http::ratelimiting::perform with self: Ratelimiter { client: Client { accepts: Accepts, proxies: [Proxy(System({}), None)], referer: true, default_headers: {"accept": "*/*"} }, global: Mutex { data: () }, routes: RwLock { data: {} } }, req: RatelimitedRequest { req: Request { body: None, multipart: None, headers: None, route: GetCurrentApplicationInfo } }
    in serenity::http::client::request with self: Http { client: Client { accepts: Accepts, proxies: [Proxy(System({}), None)], referer: true, default_headers: {"accept": "*/*"} }, ratelimiter: Ratelimiter { client: Client { accepts: Accepts, proxies: [Proxy(System({}), None)], referer: true, default_headers: {"accept": "*/*"} }, global: Mutex { data: () }, routes: RwLock { data: {} } }, ratelimiter_disabled: false, proxy: None }, req: Request { body: None, multipart: None, headers: None, route: GetCurrentApplicationInfo }

  2023-09-02T15:36:38.306197Z TRACE hyper::client::connect::http: Http::connect; scheme=Some("https"), host=Some("discord.com"), port=None
    at /home/lumine/.cargo/registry/src/index.crates.io-6f17d22bba15001f/hyper-0.14.27/src/client/connect/http.rs:278
    in serenity::http::ratelimiting::perform with self: Ratelimiter { client: Client { accepts: Accepts, proxies: [Proxy(System({}), None)], referer: true, default_headers: {"accept": "*/*"} }, global: Mutex { data: () }, routes: RwLock { data: {} } }, req: RatelimitedRequest { req: Request { body: None, multipart: None, headers: None, route: GetCurrentApplicationInfo } }
    in serenity::http::client::request with self: Http { client: Client { accepts: Accepts, proxies: [Proxy(System({}), None)], referer: true, default_headers: {"accept": "*/*"} }, ratelimiter: Ratelimiter { client: Client { accepts: Accepts, proxies: [Proxy(System({}), None)], referer: true, default_headers: {"accept": "*/*"} }, global: Mutex { data: () }, routes: RwLock { data: {} } }, ratelimiter_disabled: false, proxy: None }, req: Request { body: None, multipart: None, headers: None, route: GetCurrentApplicationInfo }

  2023-09-02T15:36:38.307008Z DEBUG hyper::client::connect::dns: resolving host="discord.com"
    at /home/lumine/.cargo/registry/src/index.crates.io-6f17d22bba15001f/hyper-0.14.27/src/client/connect/dns.rs:122

I have tried putting this application into a Docker container, only to find myself in the exact same scenario again: it works on my machine, but it hangs when run on my VPS (even inside the container!).

I have also done some println debugging over SSH and narrowed it down: the hang happens when serenity's Ratelimiter performs the RouteInfo::GetCurrentApplicationInfo request inside the framework builder. That makes an HTTP request via reqwest / hyper, and the deepest point I have been able to track the halt down to is this block of code, which execution never gets past:

https://github.com/hyperium/hyper/blob/d342c2c714498d33891fa285a3c9ae991dc34769/src/client/client.rs#L472

Paradoxically, neither hyper nor reqwest seems to be the fundamental problem here, as I was able to successfully make an HTTP request to Discord's API with the following code (it worked both locally and on my server).

// Minimal connectivity check: a bare reqwest GET against Discord's API.
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = reqwest::Client::builder().build()?;

    let response = client
        .get("https://discord.com/api/v10")
        .send()
        .await?
        .text()
        .await?;

    println!("Response: {response}");

    Ok(())
}

Cargo.toml of the executable

[package]
name = "bot"
version = "0.1.0"
edition = "2021"

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[dependencies]
reqwest = { version = "0.11.20"}
dotenv = "0.15.0"
ipc = { path = "../ipc" }
once_cell = "1.18.0"
poise = "0.5.5"
regex = "1.9.4"
serde = "1.0.183"
serde_json = "1.0.105"
thiserror = "1.0.46"
tokio = { version = "1.32.0", features = ["rt-multi-thread", "macros"] }
tracing = "0.1.37"
tracing-subscriber = "0.3.17"

Code that starts the bot

    use poise::serenity_prelude as serenity;

    let config = app::get_config(); // Loaded at runtime, contains the token

    let options = poise::FrameworkOptions {
        commands: vec![commands::avatar()],
        on_error: |error| Box::pin(on_error(error)),
        ..Default::default()
    };

    let intents =
        serenity::GatewayIntents::non_privileged() | serenity::GatewayIntents::MESSAGE_CONTENT;

    let framework = poise::Framework::builder()
        .token(config.token.clone())
        .setup(|ctx, ready, framework| {
            Box::pin(async move {
                println!("Logged in as {}", ready.user.name);
                poise::builtins::register_in_guild(
                    ctx,
                    &framework.options().commands,
                    ready.guilds.get(0).unwrap().id,
                )
                .await?;
                Ok(Data {})
            })
        })
        .options(options)
        .intents(intents)
        .build()
        .await;

    let Ok(framework) = framework else {
        eprintln!("Failed to set up framework");
        return Ok(());
    };

    let http = framework.client().cache_and_http.http.clone();
    app::init_http(http).map_err(|_| generic("Http"))?; // Http is shared with other threads

    framework.start().await.unwrap();


kangalio commented 1 year ago

Thank you for this detailed bug report and the huge amount of work you have already invested in debugging this obscure problem.

I don't know what is happening either... the symptoms remind me of the async runtime being blocked. I see you enabled "rt-multi-thread", but maybe the VPS only has one core?

Can you try disabling rt-multi-thread on your dev machine? Maybe it will reproduce the infinite wait
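
A minimal sketch of what that would look like, assuming the bot's entry point uses the #[tokio::main] attribute (illustrative only, not the bot's actual code):

// Force Tokio's single-threaded runtime to mimic a one-core machine.
// With this flavor there are no extra worker threads, so anything that
// starves the current thread stalls every pending future.
#[tokio::main(flavor = "current_thread")]
async fn main() {
    // ... start the framework as usual; if a starved runtime is the cause,
    // the hang should now reproduce locally.
}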

hyperpastel commented 1 year ago

Dear kangalio,

You are an angel.

After disabling rt-multi-thread and setting #[tokio::main(flavor = "current_thread")] on my dev machine, I was indeed able to reproduce the error locally!

The docs for tokio::main say that when no flavor is set, tokio defaults to the multi-thread flavor with the number of worker threads equal to the machine's core count. On my VPS that is, as you correctly assumed, one core, making it effectively equivalent to the current_thread flavor. That was apparently not enough to handle the concurrency needed during DNS resolution.

After manually setting the number of worker threads to 2 by changing the macro to

#[tokio::main(flavor = "multi_thread", worker_threads = 2)]

I was finally able to get it to work.
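
For completeness, a sketch of an equivalent explicit runtime setup without the attribute macro, using tokio::runtime::Builder (illustrative; the bot's real entry point may look different):

// Build the multi-threaded runtime by hand with two worker threads,
// roughly equivalent to #[tokio::main(flavor = "multi_thread", worker_threads = 2)].
fn main() -> Result<(), Box<dyn std::error::Error>> {
    let runtime = tokio::runtime::Builder::new_multi_thread()
        .worker_threads(2)
        .enable_all() // enable IO and timers, needed by reqwest/hyper
        .build()?;

    runtime.block_on(async {
        // ... build and start the poise framework here ...
    });

    Ok(())
}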

Thank you for maintaining this project and helping me ♥