sereinity opened this issue 2 years ago
I enabled debugging while reproducing the issue. But the logs aren't very helpful:
time="2022-09-12T15:00:00Z" level=debug msg="Pool[tcp://127.0.0.1:9001/status]: {\"pool\":\"www\",\"process manager\":\"dynamic\",\"start time\":1662994578,\"start since\":222,\"accepted conn\":23,\"listen queue\":0,\"max listen queue\":0,\"listen queue len\":511,\"idle processes\":2,\"active processes\":0,\"total processes\":2,\"max active processes\":1,\"max children reached\":0,\"slow requests\":0, \"processes\":[{\"pid\":15,\"state\":\"Idle\",\"start time\":1662994578,\"start since\":222,\"requests\":12,\"request duration\":20914,\"request method\":\"GET\",\"request uri\":\"/index.php\",\"content length\":0,\"user\":\"-\",\"script\":\"/app/public/index.php\",\"last request cpu\":95.63,\"last request memory\":4194304},{\"pid\":16,\"state\":\"Idle\",\"start time\":1662994578,\"start since\":222,\"requests\":11,\"request duration\":20035,\"request method\":\"GET\",\"request uri\":\"/index.php\",\"content length\":0,\"user\":\"-\",\"script\":\"/app/public/index.php\",\"last request cpu\":99.83,\"last request memory\":4194304}]}" time="2022-09-12T15:00:00Z" level=debug msg="Updated 1 pool(s) in 2.813443ms"
I got only 12 of these logs on the failing instance, while on a normal fpm instance I got one of these logs on every scrape.
Hi there! We're experiencing the same issue on the latest version (2.2.0). @sereinity, have you sorted it out, or have you found a workaround?
@frvade Sadly no; we added a 32Mi memory limit to the phpfpm-exporter container. It prevents the huge memory usage and we get the metrics back when it restarts, but we still miss a lot of metrics.
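For reference, a minimal sketch of such a limit on the exporter sidecar; the container name, image tag and request value are assumptions, only the 32Mi limit comes from the comment above:

```yaml
# Sketch: cap the exporter sidecar's memory so the kubelet restarts it (OOMKilled)
# instead of letting it grow unbounded. Only the 32Mi limit is from the comment above;
# the container name, image tag and request value are illustrative assumptions.
containers:
  - name: phpfpm-exporter
    image: hipages/php-fpm_exporter:2.2.0
    resources:
      requests:
        memory: 16Mi   # assumed request, not mentioned in the thread
      limits:
        memory: 32Mi   # the limit described above; exceeding it triggers an OOM kill
```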
We also faced the same issue.
Config:
Sometimes the /metrics handler dies (by timeout); after that, the php-fpm-exporter process grows its RAM usage on each /metrics call until the OOM killer does its job.
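For context, the timeout here is the Prometheus scrape timeout; a sketch of the relevant scrape job (job name, intervals and target address are assumptions) showing where a slow /metrics response gets cut off:

```yaml
# Sketch of the Prometheus side: if /metrics takes longer than scrape_timeout,
# Prometheus aborts the request and marks the target as down.
# Job name, intervals and target address are illustrative assumptions.
scrape_configs:
  - job_name: php-fpm-exporter
    scrape_interval: 30s
    scrape_timeout: 10s              # scrapes slower than this are cancelled
    static_configs:
      - targets: ['127.0.0.1:9253']  # exporter's default port
```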
I am using version 2.2.0 and, indeed, I have reproduced this issue.
Same here, also using 2.2.0. Is there already a fix?
Hello,
First, thank you for this great software that helps us monitor our stacks.
Summary
Sometimes fpm-exporter doesn't work well and hangs when querying /metrics, appearing as an unhealthy target in Prometheus.
Context
We use the exporter as a sidecar on a Kubernetes engine. We have 16 fpm-exporter instances per environment with the exact same configuration, but some of them (one or two) don't behave correctly:
Symptoms
When we query the exporter on the root (curl localhost:9253/), it responds that the metrics live in /metrics, but when we query /metrics it never responds, and over time those instances take more and more RAM (linearly; I didn't put memory limits yet). Killing fpm or fpm-exporter (running kill 1 in the containers) may fix it (I say "may" because there is still a bigger memory usage than on clean instances, and it looks like those instances may be more prone to hang again).
Here are the CPU and memory graphs of a clean instance:
Here are the CPU and memory graphs of an unhealthy instance:
Timeline:
Note: there are absolutely no relevant logs, only connection refused when restarting fpm. (I plan to test with more verbosity.)
Conf sample
What can I do?
We can see that healthy instances take less than 10 MB of memory, but even after a restart the previously unhealthy instance still takes 60 MB; I would love to put a memory limit in place, but I'm unsure about setting it to something like 32 MB.
I could also put the liveness probe on /metrics (see the sketch below), but I'm unsure whether it's a good idea.
Ready to read your suggestions, folks :slightly_smiling_face:
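A rough sketch of what such a probe could look like; the container name, port and timing values are assumptions, and timeoutSeconds would need to stay well above a normal scrape duration so the probe doesn't restart healthy instances:

```yaml
# Sketch of a liveness probe on /metrics: if the endpoint hangs for
# failureThreshold consecutive checks, the kubelet restarts the container.
# Container name and timing values are illustrative assumptions.
containers:
  - name: phpfpm-exporter
    livenessProbe:
      httpGet:
        path: /metrics
        port: 9253
      periodSeconds: 30
      timeoutSeconds: 10     # must exceed a healthy scrape's duration
      failureThreshold: 3    # roughly 90s of hanging before a restart
```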