timescale / prometheus-postgresql-adapter

Use PostgreSQL as a remote storage database for Prometheus
Apache License 2.0

too many open files error #109

Closed mytxyang closed 4 years ago

mytxyang commented 4 years ago

I use the prometheus-postgresql-adapter to store the data from my Prometheus cluster environment in PostgreSQL. It works fine on an x86 server, but when we deploy the same environment on an ARM server, it reports:

2020/03/14 18:40:52 http: Accept error: accept tcp [::]:9201: accept4: too many open files; retrying in 5ms
2020/03/14 18:40:52 http: Accept error: accept tcp [::]:9201: accept4: too many open files; retrying in 10ms
2020/03/14 18:40:52 http: Accept error: accept tcp [::]:9201: accept4: too many open files; retrying in 20ms
2020/03/14 18:40:52 http: Accept error: accept tcp [::]:9201: accept4: too many open files; retrying in 40ms
2020/03/14 18:40:52 http: Accept error: accept tcp [::]:9201: accept4: too many open files; retrying in 80ms
2020/03/14 18:40:52 http: Accept error: accept tcp [::]:9201: accept4: too many open files; retrying in 160ms
2020/03/14 18:40:52 http: Accept error: accept tcp [::]:9201: accept4: too many open files; retrying in 320ms
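
The doubling delays in the log above (5ms, 10ms, 20ms, ...) are the usual retry behavior of Go's `net/http` accept loop, which backs off after a temporary accept error, starting at 5ms and doubling up to a cap of 1 second. The sketch below only reproduces that delay sequence for illustration; the function name and parameters are hypothetical and not part of the adapter.

```python
def accept_retry_delays(attempts, first_ms=5, cap_ms=1000):
    """Yield the delay in milliseconds before each retry.

    The delay starts at first_ms, doubles after every failed accept,
    and never exceeds cap_ms.
    """
    delay = first_ms
    for _ in range(attempts):
        yield delay
        delay = min(delay * 2, cap_ms)


if __name__ == "__main__":
    # The seven retries shown in the log above.
    print(list(accept_retry_delays(7)))  # [5, 10, 20, 40, 80, 160, 320]
```

The backoff only slows down how often the error is logged. It does not release any file descriptors, so the error persists until the process closes connections or its limit is raised.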

When I check the open file handles in this container, I find that the count is very large (about 300k).
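
One way to take such a measurement on Linux is to count the entries under `/proc/<pid>/fd` and compare the result with the soft `RLIMIT_NOFILE` limit. This is a minimal sketch, assuming a Linux container with `/proc` mounted; the helper names are hypothetical, and the limit shown is that of the calling process (for another process, `/proc/<pid>/limits` would be the place to look).

```python
import os
import resource


def count_open_fds(pid="self"):
    """Count open file descriptors of a process via /proc (Linux only).

    Reading another process's fd directory needs sufficient permissions.
    """
    return len(os.listdir(f"/proc/{pid}/fd"))


def soft_fd_limit():
    """Return the soft open-file limit of the calling process."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft


if __name__ == "__main__":
    print(f"open fds: {count_open_fds()}, soft limit: {soft_fd_limit()}")
```

Sampling this count periodically shows whether descriptors are leaking (steady growth) or merely peaking under load.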

We use the same configuration for the x86 and ARM environments. Downgrading Prometheus to v2.10.0 did not help either.

We deploy the Prometheus cluster on two nodes with docker-compose; each node starts the prometheus, alertmanager, and prometheus-postgresql-adapter containers. Versions: prometheus v2.16.0; prometheus-postgresql-adapter image built from the master branch.

prometheus.yml:

remote_write:
  - url: "http://prometheus_postgresql_adapter:9201/write"

When I monitor the open file handles, the count keeps growing while Prometheus sends data to the adapter. Could you please help look into this issue? Thanks.

mytxyang commented 4 years ago

I found the root cause: the network between the adapter and the database is slow. Closing this issue.
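
A plausible mechanism, offered as an assumption rather than something confirmed in this thread: every write request the adapter has accepted holds a socket open until the database insert finishes, so a slow link between the adapter and the database raises the number of connections open at once. Little's law gives a rough estimate (in-flight requests = arrival rate × time per request). The numbers below are hypothetical and only illustrate the scaling.

```python
def estimated_in_flight(requests_per_second, seconds_per_request):
    """Little's law: average number of requests in progress at once."""
    return requests_per_second * seconds_per_request


if __name__ == "__main__":
    # Hypothetical load: 200 write requests per second from Prometheus.
    fast = estimated_in_flight(200, 0.5)   # database answers in 0.5 s
    slow = estimated_in_flight(200, 8.0)   # slow network: 8 s per insert
    print(f"fast link: ~{fast:.0f} open requests, slow link: ~{slow:.0f}")
```

At the same request rate, the number of open connections scales linearly with latency. Client-side timeouts and retries can add further connections on top of that.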