Closed chet closed 3 years ago
Thanks for the report! I have not heard about this library slowing down worker start-up before.
I assume what you mean by 'dynamic environment' is that you regularly change nginx configuration and reload the process by sending the HUP
signal, which starts a bunch of new workers.
The only operation that I believe could be causing delays if new workers start while a running nginx
process has already accumulated a very large number of metrics is probably syncing per-worker metric name cache (implemented in 9ec4a6259ec78dcbdfe1ca36272135978eb80c56). But I am not sure how moving init
back to init_by_lua_block
would fix that.
To answer your questions:
init_by_lua_block
, and only call init_worker
from init_worker_by_lua_block
. This should still work now, but I don't think we have any tests to validate this behaviour, so I guess it could break in the future.Hey @chet, do you happen to be in a position to provide more details about this issue? I would like to understand whether there are any changes (to code or documentation) I should make.
Seeing no response from @chet, I don't think this is actionable at this point, so will have to close this issue. Please feel free to reopen if you can provide more details.
Hey!
We've been troubleshooting some intermittent request latencies, and added some timing, which ultimately let us track it down to having moved away from the deprecated usage of initializing metrics in
init_by_lua_block
, and over toinit_worker_by_lua_block
.As we store more and more metrics, doing the
init
call withininit_worker_by_lua_block
becomes slower and slower, causing north of 5 second request-blocking delays when a new nginx worker is spanwed off. In a highly dynamic environment (which nginx can only be so good at), we see this unfortunately pretty frequently, and customers see this as ~5 second delays in making connections.Moving our
require("prometheus").init("prometheus_metrics")
line back toinit_by_lua_block
has fixed the issue, but since it's deprecated, I'm concerned this will come to bite us later, and we'll have to put it back as a worker-blocking initialization call (which isn't going to really work for us at this point).FWIW, I'm more just curious: 1) If you were aware, because there's a good chance this doesn't come up a lot; not judging at all. 2) If it's ok if we keep it in
init_by_lua_block
? 3) If, maybe, there are some perf things you can think of off the top of your head to improve this?But, above all else, I mainly just wanted you to realize it's a thing. We love this module and how easy it is to use, but I don't think we'll be using nginx for much longer at at this point; it's been years of trying to hack around shortcomings running in dynamic environments, and its exhausting!