sevagh opened this issue 6 years ago

Hello,

I'm running this adapter. Even with a high ulimit (131072), it seems to be leaking connections:

I'm trying to find out where this is occurring - perhaps in the Elastic client you use, perhaps in the HTTP server in this adapter.

Have you seen any behavior like this?
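In case it helps with digging: my plan for narrowing it down is to take goroutine dumps from the process. This is only a sketch of how I'd do that, and it assumes rebuilding the adapter with Go's standard net/http/pprof handler wired in (I don't know whether the adapter already exposes pprof):

```go
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers /debug/pprof/* handlers on the default mux
)

func main() {
	// Serve the pprof endpoints on a side port so they stay separate from
	// the adapter's own listener on 0.0.0.0:9201.
	go func() {
		log.Println(http.ListenAndServe("localhost:6060", nil))
	}()

	// ... the adapter's normal startup would continue here ...
	select {} // placeholder so this sketch keeps running
}
```

If the goroutine count at http://localhost:6060/debug/pprof/goroutine?debug=1 keeps growing, the stack traces there should show whether the stuck goroutines are sitting in the Elastic client's bulk workers or in the adapter's own HTTP handlers.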
Hi @sevagh,
Thanks for the report. Afraid I have not seen this before. Will dig into it when I get a chance. What config are you running it with?
I was running it with the default settings. It seems like it hit a bottleneck with Elasticsearch, and the Prometheus remote storage queue created too many shards/goroutines.
Today I ran it with these settings:
Environment="ES_WORKERS=4"
Environment="ES_BATCH_COUNT=-1"
Environment="ES_BATCH_SIZE=-1"
Environment="ES_BATCH_INTERVAL=30"
Now there are no connections piling up. (Closed by accident and re-opened.)
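For context on why these knobs matter: if the adapter drives its writes through olivere/elastic's BulkProcessor (an assumption on my part; I haven't traced the code), the settings above would roughly correspond to the sketch below, where -1 disables the count- and size-based flush triggers and leaves only the 30-second interval:

```go
package main

import (
	"context"
	"log"
	"time"

	"github.com/olivere/elastic/v7"
)

func main() {
	client, err := elastic.NewClient(elastic.SetURL("https://myelastic:9200"))
	if err != nil {
		log.Fatal(err)
	}

	// ES_WORKERS=4, ES_BATCH_COUNT=-1, ES_BATCH_SIZE=-1, ES_BATCH_INTERVAL=30
	processor, err := client.BulkProcessor().
		Workers(4).                      // four concurrent bulk workers
		BulkActions(-1).                 // -1: never flush based on document count
		BulkSize(-1).                    // -1: never flush based on payload size
		FlushInterval(30 * time.Second). // flush on a fixed interval instead
		Do(context.Background())
	if err != nil {
		log.Fatal(err)
	}
	defer processor.Close()
}
```

Under that reading, the defaults (1000 items / 4096 bytes) would presumably trip the size threshold almost immediately on a busy remote-write stream, so bulk requests would go out far more often than once per interval.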
What settings did you have when you experienced the issues?
Before:
ExecStart=/usr/local/bin/prometheus-es-adapter \
--es_url=https://myelastic:9200 \
--es_user=local_prometheus-adapter \
--listen 0.0.0.0:9201
I didn't modify the default values for:
Variable | Default | Description
ES_WORKERS | 0 | Number of batch workers
ES_BATCH_COUNT | 1000 | Max items for bulk Elasticsearch insert operation
ES_BATCH_SIZE | 4096 | Max size in bytes for bulk Elasticsearch insert operation
ES_BATCH_INTERVAL | 10 | Max period in seconds between bulk Elasticsearch insert operations
I think this is a case of "I should not rely on defaults in production" - user error.
I don't believe we should write this off as user error. I'm not sure when I'll get to them, but a few thoughts:
I'll see about replicating the old bad settings and tracing the problem to a line of code here.
@sevagh how did you get on?
Oops, I never really revisited this. I promise I'll try to recreate it on Monday.
On the other hand, with the production configuration, this adapter has been running without a crash since basically May. Really great work here.