Prometheus Alertmanager
https://prometheus.io
Apache License 2.0

docker image (prom/alertmanager) doesn't accept urls different than 127.0.0.1 for webhooks #2399

Closed sergiodj closed 4 years ago

sergiodj commented 4 years ago

What did you do?

I'm using the prom/alertmanager Docker image to set up a Docker Compose environment. For now, the Compose file contains the alertmanager image and a very simple test container whose only job is to listen on port 5001 and wait for a webhook call.

This is my alertmanager.yml:

global:
  resolve_timeout: 5m

route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'webhook-test'

receivers:
- name: 'webhook-test'
  webhook_configs:
  - url: 'http://tests:5001/'

inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']

I'm setting up docker-compose using:

...
    alertmanager:
        image: prom/alertmanager
        ports:
            - 9093:9093
        volumes:
            - ./config/alertmanager.yml:/etc/prometheus/alertmanager.yml

    tests:
        image: mytest/test
        ports:
            - 5001:5001

According to docker-compose's documentation, the alertmanager container should be able to communicate with the tests container through the internal network just fine.

What did you expect to see?

I expected to see the alertmanager container sending the webhook calls to the tests container.

What did you see instead? Under which circumstances?

The alertmanager container keeps trying to send the webhooks to 127.0.0.1, in spite of the current configuration. Here's what I see in the logs:

...
alertmanager_1   | level=error ts=2020-10-22T14:38:56.339Z caller=dispatch.go:309 component=dispatcher msg="Notify for alerts failed" num_alerts=1 err="web.hook/webhook[0]: notify retry canceled after 8 attempts: Post \"http://127.0.0.1:5001/\": dial tcp 127.0.0.1:5001: connect: connection refused"
alertmanager_1   | level=warn ts=2020-10-22T14:38:56.341Z caller=notify.go:674 component=dispatcher receiver=web.hook integration=webhook[0] msg="Notify attempt failed, will retry later" attempts=1 err="Post \"http://127.0.0.1:5001/\": dial tcp 127.0.0.1:5001: connect: connection refused"
alertmanager_1   | level=warn ts=2020-10-22T14:39:01.223Z caller=notify.go:674 component=dispatcher receiver=web.hook integration=webhook[0] msg="Notify attempt failed, will retry later" attempts=1 err="Post \"http://127.0.0.1:5001/\": dial tcp 127.0.0.1:5001: connect: connection refused"
...

Environment

I'm using Ubuntu Focal, docker 19.03.8 and docker-compose 1.25.0

Let me know if you need more information about my setup, and I will be glad to provide it. Thanks!

simonpasquier commented 4 years ago

This seems to be an issue with Docker compose and/or engine rather than Alertmanager itself. You might want to try running with GODEBUG=netdns=1 (see golang docs).
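For anyone who wants to try that suggestion from a Compose file, a minimal sketch could look like the fragment below. It assumes the service definitions live under a top-level services key (the original file is elided above that point) and only adds the environment variable; with netdns=1 the Go resolver prints debugging information about its lookup decisions to the container's output.

```yaml
# Sketch only: the surrounding layout (the top-level "services" key) is an
# assumption, since the beginning of the original compose file is elided.
services:
    alertmanager:
        image: prom/alertmanager
        environment:
            # Ask the Go runtime to log its DNS resolver decisions.
            - GODEBUG=netdns=1
        ports:
            - 9093:9093
```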

trallnag commented 4 years ago

@simonpasquier, I think it's just an issue with the docker compose file @sergiodj is using. @sergiodj, try to put your services in a dedicated docker network. See here for an example:

https://github.com/trallnag/prometheus-adaptive-cards/tree/master/system-tests/docker-compose

sergiodj commented 4 years ago

Thanks for the invaluable help, @simonpasquier and @trallnag. As it turns out, the alertmanager image I'm using was changed by one of my colleagues and it was looking at the wrong path for the configuration file. Sorry about the noise; I'm closing the issue now.

neumond commented 2 months ago

The specific error string dial tcp 127.0.0.1:5001: connect: connection refused most likely means that the actual config file failed to mount. The default config contains a webhook with url: 'http://127.0.0.1:5001/'. It has nothing to do with the network.
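The logs in the original report support this reading: they show receiver=web.hook, while the posted alertmanager.yml names its receiver webhook-test, so the posted file was not the one being loaded. A hedged sketch of a Compose fragment that mounts the file where the stock image is expected to look is shown below. It assumes an unmodified prom/alertmanager image, whose default command, as far as I know, reads /etc/alertmanager/alertmanager.yml; a customized image (as turned out to be the case in this issue) may use a different path, and the top-level services key is also an assumption because the original file is elided there.

```yaml
# Sketch only: assumes the unmodified prom/alertmanager image, which is
# expected to read its config from /etc/alertmanager/alertmanager.yml.
services:
    alertmanager:
        image: prom/alertmanager
        ports:
            - 9093:9093
        volumes:
            # Mount over the image's default config path instead of
            # /etc/prometheus/alertmanager.yml.
            - ./config/alertmanager.yml:/etc/alertmanager/alertmanager.yml

    tests:
        image: mytest/test
        ports:
            - 5001:5001
```

After a restart, the receiver name in the log lines should change from web.hook to webhook-test if the mounted file is being picked up.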