timescale / prometheus-postgresql-adapter

Use PostgreSQL as a remote storage database for Prometheus
Apache License 2.0

Postgresql-adapter Dial tcp i/o timeout error #75

Open gorwell4891 opened 5 years ago

gorwell4891 commented 5 years ago

When running:

$ docker run -it --name prometheus_postgresql_adapter --link pgprometheus -p 9201:9201 timescale/prometheus-postgresql-adapter:latest -pg.password=xxxxxx -pg.host=pg_prometheus -pg.prometheus-log-samples

I get the following error:

level=info ts=2019-04-30T02:46:57.114577849Z caller=log.go:25 config="&{remoteTimeout:30000000000 listenAddr::9201 telemetryPath:/metrics pgPrometheusConfig:{host:pgprometheus port:5432 user:postgres password:xxxxxx database:postgres schema: sslMode:disable table:metrics copyTable: maxOpenConns:50 maxIdleConns:10 pgPrometheusNormalize:true pgPrometheusLogSamples:true pgPrometheusChunkInterval:43200000000000 useTimescaleDb:true dbConnectRetries:0 readOnly:false} logLevel:debug haGroupLockId:0 restElection:false prometheusTimeout:-1}"
level=info ts=2019-04-30T02:46:57.114774342Z caller=log.go:25 msg="host=pgprometheus port=5432 user=postgres dbname=postgres password='xxxxxx' sslmode=disable connect_timeout=10"
level=error ts=2019-04-30T02:47:07.114948565Z caller=log.go:33 err="dial tcp: i/o timeout"

I followed the instructions exactly. pg_prometheus and prom/prometheus run fine. The adapter encounters the dial tcp i/o timeout.

gorwell4891 commented 5 years ago

I wanted to add a bit more information here about my firewall. I only allow my private static IP addresses to access docker0. All other packets are dropped.

-i docker0 -m set --match-set privateip dst -j ACCEPT
-i docker0 -j DROP

The private static IPs allowed for docker0 are the host's external IP, the internal router IP, localhost, and 172.17.0.0-172.17.0.255, which seems to be the IP range that Docker uses.

1 ACCEPT tcp -- anywhere 172.17.0.2 tcp dpt:postgresql

When I remove the DROP rule, the prometheus-postgresql-adapter runs without timing out.

level=info ts=2019-04-30T03:54:47.854530746Z caller=log.go:25 msg="Initialized pg_prometheus extension"
level=warn ts=2019-04-30T03:54:47.856933032Z caller=log.go:29 msg="No adapter leader election. Group lock id is not set. Possible duplicate write load if running adapter in high-availability mode"
level=info ts=2019-04-30T03:54:47.858181273Z caller=log.go:25 msg="Starting up..."
level=info ts=2019-04-30T03:54:47.85950931Z caller=log.go:25 msg=Listening addr=:9201

What static IP or routing would be required for the adapter to connect to the postgresql db other than what is shown above?

If a packet is coming from one of my listed static IP addresses, the firewall will accept it, so there shouldn't be a problem. But some IP address that is not in my list appears to be used for the connection.

gorwell4891 commented 5 years ago

Sorry, I thought we had fixed the issue, but we are still getting the same problem, so we need to ask for advice and reopen this.

We simplified the firewall rule to:

-A DOCKER-USER -m set ! --match-set privateip dst -j DROP

So that if the incoming IP address doesn't match our ipset, then drop the packet.
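
To reason about which packets this rule drops, here is a small Go sketch that mimics the `! --match-set privateip dst -j DROP` test. Note that `dst` makes the rule match on the packet's destination address. The set contents below are assumptions based on the description in this thread (localhost and 172.17.0.0-172.17.0.255); the host and router addresses are not given here and are left out. `dropped` is a hypothetical helper, and 8.8.8.8 merely stands in for any address outside the set, such as an external DNS resolver.

```go
package main

import (
	"fmt"
	"net/netip"
)

// privateIP stands in for the "privateip" ipset. The entries are assumptions:
// localhost plus the Docker bridge range mentioned in the thread.
var privateIP = []netip.Prefix{
	netip.MustParsePrefix("127.0.0.1/32"),
	netip.MustParsePrefix("172.17.0.0/24"),
}

// dropped reports whether a packet with destination address dst would hit
// the DROP rule, i.e. whether dst is NOT contained in the set.
func dropped(dst string) (bool, error) {
	addr, err := netip.ParseAddr(dst)
	if err != nil {
		return false, err
	}
	for _, p := range privateIP {
		if p.Contains(addr) {
			return false, nil // destination is in the set: rule does not match
		}
	}
	return true, nil // destination is outside the set: packet is dropped
}

func main() {
	for _, dst := range []string{"172.17.0.2", "127.0.0.1", "8.8.8.8"} {
		d, err := dropped(dst)
		if err != nil {
			fmt.Println("bad address:", err)
			continue
		}
		fmt.Printf("dst=%-12s dropped=%v\n", dst, d)
	}
}
```

Under these assumptions, traffic to the PostgreSQL container at 172.17.0.2 passes, while anything the adapter's container sends to an address outside the set is dropped. Comparing the real set (`ipset list privateip`) against the destinations the container actually contacts may show which address is missing.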

Every one of our other Docker containers runs as expected with this rule turned on. That is nearly a dozen containers running on the same server.

However, the prometheus-postgresql-adapter just does not run and gives a dial tcp i/o timeout error with this rule.

The privateip addresses are our localhost, host and range of docker internal ip addresses. So it works for every other docker container we've run.

Is there some telemetry or outside IP that needs to check in on the adapter before it can start, and that would be blocked if its IP address is not in our private IP list?

Thank you.

gorwell4891 commented 5 years ago

I think I might have found the reason by reviewing the code. Does the adapter need to import and fetch code from github.com when the Docker container runs? Does this require DNS lookups from the adapter's container?

In the file: main.go

import (
    "flag"
    "io/ioutil"
    "net/http"
    _ "net/http/pprof"
    "os"
    "sync/atomic"
    "time"

    "github.com/timescale/prometheus-postgresql-adapter/log"

    "github.com/timescale/prometheus-postgresql-adapter/postgresql"
    "github.com/timescale/prometheus-postgresql-adapter/util"

    "github.com/gogo/protobuf/proto"
    "github.com/golang/snappy"
    "github.com/jamiealquiza/envy"
    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/common/model"

    "database/sql"
    "fmt"

    "github.com/prometheus/client_model/go"
    "github.com/prometheus/prometheus/prompb"
)

bboule commented 5 years ago

@gorwell4891 it looks like you are defining your postgres host as pg_prometheus (you are passing that in as your -pg.host value). Can I assume that you can resolve that host name when you ping from another container in the environment, and that you are able to connect from a postgres client? Assuming everything is running on the same host, have you tried passing 'localhost' for your -pg.host? Give that a shot and, if possible, grab a fresh set of logs!