pprof results: top output and flame graph (screenshots attached).
I disabled the consistency check.
@alpaca2333 I am seeing the same or a very similar issue: my pushgateway's memory keeps climbing until it eventually goes OOM. I also see CLOSE_WAIT connections piling up little by little. What's the deal here?
@alpaca2333 yeah, I see what you mean about the consistency check. I shut it off as well via --push.disable-consistency-check and my problem went away.
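For reference, disabling the check at startup looks roughly like this (the binary path and listen address are just examples):

    pushgateway \
        --web.listen-address=:9091 \
        --push.disable-consistency-check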
Still seems like a bug, no?
@Drewster727 No, I don't think it is a bug. CLOSE_WAIT piles up when the server does not close connections in time, and according to the profiles above, the Pushgateway cannot process requests as fast as they come in. As mentioned in the comments, the consistency check is very heavy: each time you push metrics to the Pushgateway, the check calls gather.Gather(), which sort.Sort()s all of the metrics you have pushed. Simply disable it.
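To make the cost concrete, here is an illustrative sketch (not the Pushgateway's actual code path) showing that a full Gather() collects and sorts everything in a registry, which is the kind of work the check repeats on every single push:

    package main

    import (
        "fmt"
        "time"

        "github.com/prometheus/client_golang/prometheus"
    )

    func main() {
        // Register a large number of metrics, standing in for
        // everything the Pushgateway currently stores.
        reg := prometheus.NewRegistry()
        for i := 0; i < 100000; i++ {
            reg.MustRegister(prometheus.NewCounter(prometheus.CounterOpts{
                Name: fmt.Sprintf("demo_metric_%d", i),
                Help: "illustrative counter",
            }))
        }

        // Gather() walks every collector and returns the metric
        // families sorted by name, so the cost grows with the total
        // number of stored metrics, not with the size of one push.
        start := time.Now()
        mfs, err := reg.Gather()
        if err != nil {
            panic(err)
        }
        fmt.Printf("gathered %d families in %v\n", len(mfs), time.Since(start))
    }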
Also, please consider re-architecting your setup. If you put enough load on your Pushgateway that the performance overhead of the consistency check matters, you are almost certainly using the Pushgateway for something it was not designed for. It might look like things are just fine, but your setup is fundamentally brittle.
Inside a pushgateway pod, check with netstat:
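For example, something like this (a sketch; the exact flags depend on your netstat build):

    netstat -tan | grep CLOSE_WAIT | wc -l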
pprof through kubectl port-forward:
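Roughly like this (the pod name is a placeholder; this assumes the Pushgateway's Go pprof endpoints are reachable on the same port):

    kubectl port-forward <pushgateway-pod> 9091:9091
    go tool pprof http://localhost:9091/debug/pprof/heap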
I in fact push a large volume of metrics to the Pushgateway from over 300 machines. They are consistent-hashed across 30 Pushgateways, yet the memory consumption among them differs hugely. According to
wget -O - localhost:9091/metrics | wc -l
inside their pods, the number of metrics on each Pushgateway does not vary much. The ones that consume a lot of memory mostly have many unclosed connections, as shown above. By the way, the client pushes with a 1-second timeout context, so the client must have closed the connection on its side.
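For context, the client-side push looks roughly like this (a minimal sketch using client_golang's push package; the address and job name are placeholders):

    package main

    import (
        "context"
        "log"
        "time"

        "github.com/prometheus/client_golang/prometheus"
        "github.com/prometheus/client_golang/prometheus/push"
    )

    func main() {
        reg := prometheus.NewRegistry()
        // ... the real code registers its collectors on reg ...

        // Every push is bounded by a 1-second context, so the client
        // gives up and closes its side of the connection after 1s.
        ctx, cancel := context.WithTimeout(context.Background(), time.Second)
        defer cancel()

        err := push.New("http://pushgateway:9091", "some_job").
            Gatherer(reg).
            PushContext(ctx)
        if err != nil {
            log.Println("push failed:", err)
        }
    }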
Please help, thanks.