Closed kevinhankens closed 1 year ago
Hey @kevinhankens, thanks for opening this issue and sorry for the delayed response.
Have you managed to resolve this?
@naorpeled Hello! I believe that doing a sample-rate cleanup helped. I wasn't able to prove it, but it certainly felt like there was a race to close zombie connections. We have something like the following now and haven't seen that error any more. I'm not sure if this is a hack or not, but we are also not seeing zombie connections, so it feels ok. Your mileage may vary.
// WARNING: This is probably not recommended:
try {
  // In roughly 20% of the cases we will attempt to clean up any zombie connections.
  const sample: number = Math.random();
  // Fraction of invocations that run the cleanup.
  const dbConnCleanupSample: number = 0.2;
  if (sample < dbConnCleanupSample) {
    await yourConnection.end();
  }
} catch (error) {
  console.error(error);
}
I see, it is indeed a hack, but I'm glad that it did the job for you guys. You can also set the usedConnsFreq option in order to achieve a similar result.
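For anyone landing here later, a rough sketch of where that option would go. The interface below is a hypothetical subset written for illustration, and the 5000 ms value is an assumption rather than a tuned recommendation; check the serverless-mysql README for the exact semantics and defaults before relying on it.

```typescript
// Hypothetical subset of the connection-management options mentioned in this
// thread. The real library accepts more keys than these two.
interface ConnectionManagementOptions {
  manageConns: boolean;  // let the library manage (and clean up) connections
  usedConnsFreq: number; // milliseconds; see the README for exact semantics
}

// Illustrative values only.
const connectionOptions: ConnectionManagementOptions = {
  manageConns: true,
  usedConnsFreq: 5000,
};

// These keys sit next to `config` when the client is created, e.g.:
// require('serverless-mysql')({ config: { /* host, user, ... */ }, ...connectionOptions })
console.log(connectionOptions);
```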
Will close this issue and mark it as completed for now. Feel free to ping me if you want me to re-open it or if you have any additional questions.
Greetings! I am seeing the error
Error: ER_NO_SUCH_THREAD: Unknown thread id: xxxxxxx
in our function traces. This seems to be coming from the zombie cleanup and feels like a race between several function calls, but I'm unsure how to approach the fix.

It seems possible that this is related to our use of:
- end() vs quit() at the end of the function call
- cfg.manageConns for every function call

There are tons of Sleep commands in information_schema.processlist, so it seems plausible that in high-traffic situations many functions are attempting to clean up at the same time. My instinct is to try to choose a sample size for cleanup: perhaps only setting cfg.manageConns for something like 10% of the connections. Alternatively, perhaps calling quit() for most functions but end() for a smaller number would work.

If anyone has seen this, or has any advice, I would love to hear from you.
Thanks kindly!
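To make the second idea concrete, here is a minimal sketch of "quit() for most invocations, end() for a sampled few". The ManagedConnection interface, the function names, and the 10% default are all assumptions made up for illustration; only the end() and quit() method names come from the discussion above. The random source is injectable so the decision can be checked deterministically.

```typescript
// Hypothetical minimal shape of a managed connection. Only the method names
// end() and quit() are taken from the thread; the rest is an assumption.
interface ManagedConnection {
  end(): Promise<void> | void;  // runs connection management, including zombie cleanup
  quit(): Promise<void> | void; // only closes this invocation's own connection
}

// Returns true for roughly `sampleRate` of the calls (e.g. 0.1 = about 10%).
function shouldRunCleanup(
  sampleRate: number,
  rng: () => number = Math.random,
): boolean {
  return rng() < sampleRate;
}

// Call once at the end of a function invocation. Most invocations just quit;
// a sampled minority run the full cleanup, so fewer of them race each other.
async function finishInvocation(
  conn: ManagedConnection,
  sampleRate: number = 0.1,
  rng: () => number = Math.random,
): Promise<"end" | "quit"> {
  const action: "end" | "quit" = shouldRunCleanup(sampleRate, rng) ? "end" : "quit";
  try {
    if (action === "end") {
      await conn.end();
    } else {
      await conn.quit();
    }
  } catch (error) {
    // A failed cleanup should not fail the invocation itself.
    console.error(error);
  }
  return action;
}
```

In a handler this would be awaited as the last step, for example `await finishInvocation(yourConnection, 0.1)`. Sampling lowers the chance that concurrent invocations try to kill the same thread; it does not remove the race.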