jeremydaly / serverless-mysql

A module for managing MySQL connections at SERVERLESS scale
MIT License

Error: ER_NO_SUCH_THREAD: Unknown thread id #127

Closed kevinhankens closed 1 year ago

kevinhankens commented 2 years ago

Greetings! I am seeing the error Error: ER_NO_SUCH_THREAD: Unknown thread id: xxxxxxx in our function traces. This seems to be coming from the zombie cleanup and feels like a race between several function calls, but I'm unsure how to approach the fix.

It seems possible that:

  1. I am incorrectly using end() vs quit() at the end of the function call
  2. I shouldn't be using cfg.manageConns for every function call
  3. My configuration isn't optimized
  4. I'm instantiating the client inside of the function, not reusing for warm starts, which could be problematic
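
On point 4, a minimal sketch of building the client once per container instead of once per invocation. Everything here is hypothetical: `DbClient`, `createFakeClient`, and `reuseAcrossInvocations` are stand-ins for illustration, not part of serverless-mysql, and the fake client only exists so the reuse can be observed.

```typescript
// Hypothetical stand-in for whatever client the database library returns.
interface DbClient {
  query(sql: string): Promise<unknown>;
  end(): Promise<void>;
}

// Wraps a factory so the value is built on first use and reused afterwards.
function reuseAcrossInvocations<T>(create: () => T): () => T {
  let cached: T | undefined;
  return () => {
    if (cached === undefined) {
      cached = create();
    }
    return cached;
  };
}

// Counts factory calls so the reuse is visible in this sketch.
let clientsCreated = 0;

function createFakeClient(): DbClient {
  clientsCreated += 1;
  return {
    query: async (_sql: string) => [],
    end: async () => {},
  };
}

// Module scope: this binding survives between invocations while the container stays warm.
const getClient = reuseAcrossInvocations(createFakeClient);

// Hypothetical handler: every invocation asks for the shared client.
async function handler(): Promise<void> {
  const client = getClient();
  await client.query("SELECT 1");
  await client.end();
}
```

With a real client, the factory would be the library's own initializer; the only point is that it runs outside the handler body.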

There are tons of Sleep commands in information_schema.processlist, so it seems plausible that in high traffic situations many functions are attempting to clean up at the same time. My instinct is to try and choose a sample size for cleanup. Perhaps, only setting cfg.manageConns for like 10% of the connections. Alternatively, perhaps calling quit() for most functions, but end() for a smaller number would work.
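
If this really is several invocations trying to kill the same sleeping thread, the loser of that race is harmless, so another option is to treat that one error as benign. A rough sketch, assuming the error reaches application code with the `code` and `errno` fields the Node.js mysql driver sets (MySQL error 1094 is ER_NO_SUCH_THREAD). `isNoSuchThread` and `cleanupIgnoringLostRace` are hypothetical helper names, and whether the error surfaces this way in a given setup is an assumption.

```typescript
// Error fields this sketch relies on; other drivers may report errors differently.
interface MysqlLikeError {
  code?: string;
  errno?: number;
}

// True when the error means "that thread is already gone" (MySQL error 1094).
function isNoSuchThread(error: unknown): boolean {
  if (typeof error !== "object" || error === null) {
    return false;
  }
  const e = error as MysqlLikeError;
  return e.code === "ER_NO_SUCH_THREAD" || e.errno === 1094;
}

// Runs a cleanup step and swallows only the lost-race case.
// Returns true if cleanup completed, false if another caller got there first.
async function cleanupIgnoringLostRace(cleanup: () => Promise<void>): Promise<boolean> {
  try {
    await cleanup();
    return true;
  } catch (error) {
    if (isNoSuchThread(error)) {
      return false; // the thread was already killed elsewhere
    }
    throw error; // anything else is a real problem
  }
}
```

Usage would look like `await cleanupIgnoringLostRace(() => yourConnection.end())`, with every other error still propagating.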

If anyone has seen this, or has any advice, I would love to hear from you.

Thanks kindly!

naorpeled commented 1 year ago

Hey @kevinhankens, thanks for opening this issue and sorry for the delayed response.

Have you managed to resolve this? 🙏

kevinhankens commented 1 year ago

@naorpeled Hello! I believe that doing a sample-rate cleanup helped. I wasn't able to prove it, but it certainly felt like there was a race to close zombie connections. We have something like the following now and haven't seen that error any more. I'm not sure if this is a hack or not, but we are also not seeing zombie connections, so it feels ok. Your mileage may vary 😆

        // WARNING: This is probably not recommended:
        try {
            // In roughly 20% of the cases we will attempt to clean up any zombie connections.
            const sample: number = Math.random();
            // Fraction of invocations that run the cleanup.
            const dbConnCleanupSample: number = 0.2;
            if (sample < dbConnCleanupSample) {
                await yourConnection.end();
            }
        } catch (error) {
            console.error(error);
        }

naorpeled commented 1 year ago

I see. It is indeed a hack, but I'm glad it did the job for you. You can also set the usedConnsFreq option to achieve a similar result.
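
For reference, a sketch of what that could look like. The `ConnectionManagementOptions` type is local to this example and only mirrors two of the library's options; it is not imported from serverless-mysql. The 5000 ms value is purely illustrative, not a recommendation. As I understand it, usedConnsFreq is the number of milliseconds to cache the lookup of current connection usage, so a warm container checks the process list less often.

```typescript
// Local type for this sketch only; it is not the library's own type definition.
interface ConnectionManagementOptions {
  manageConns: boolean; // let the library manage and clean up connections
  usedConnsFreq: number; // ms to cache the "connections in use" lookup
}

const options: ConnectionManagementOptions = {
  manageConns: true,
  usedConnsFreq: 5000, // illustrative value, tune for your traffic
};
```

These would be passed alongside your `config` block when initializing the module.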

Will close this issue and mark it as completed for now. Feel free to ping me if you want me to re-open it or if you have any additional questions.