mcci-catena / docker-iot-dashboard

A complete IoT server for LoRaWAN IoT projects: node-red + influxdb + grafana + ssl + let's encrypt using docker-compose.
MIT License

[postfix] not sending Grafana alerts #26

Closed oliv3 closed 5 years ago

oliv3 commented 5 years ago

Hi, can someone help me troubleshoot this issue? I'm running the latest code from master, and everything works fine except sending mail. I set up a Grafana channel to send alerts to my address, but when I send a test message from Grafana I don't receive anything (nothing reaches my mail server). I'm not a Postfix expert, and troubleshooting Postfix running in a container is even harder.

Here's the .env file I'm using (replacing my domain with foobar.net):

TTN_DASHBOARD_DATA=/ttn/
TTN_DASHBOARD_APACHE_FQDN=ttn.foobar.net
TTN_DASHBOARD_CERTBOT_FQDN=ttn.foobar.net
TTN_DASHBOARD_CERTBOT_EMAIL=someother@mail.com
TTN_DASHBOARD_GRAFANA_ADMIN_PASSWORD=xxxxxxxxx
TTN_DASHBOARD_GRAFANA_SMTP_FROM_ADDRESS=grafana@ttn.foobar.net
TTN_DASHBOARD_GRAFANA_INSTALL_PLUGINS=grafana-worldmap-panel,grafana-clock-panel,grafana-piechart-panel
TTN_DASHBOARD_INFLUXDB_INITIAL_DATABASE_NAME=demo
TTN_DASHBOARD_MAIL_HOST_NAME=ttn.foobar.net
TTN_DASHBOARD_MAIL_DOMAIN=foobar.net
TTN_DASHBOARD_MYSQL_PASSWORD=xxxxxxxxx
TTN_DASHBOARD_MYSQL_ROOT_PASSWORD=xxxxxxxxx

Going into the container and checking the mail queue:

# docker exec -it ttn_dashboard_postfix_1_d8c37ddb4aa0 bash
root@d97675a38e1b:/# postqueue -p

(...)
C568C138AB2     1384 Wed Jan 23 20:08:44  MAILER-DAEMON
                                                (unknown mail transport error)
                                         olivier@biniou.info

But that's where my Postfix skills end; I have no clue how to debug this. Sending a mail to myself from inside the container using the mail command doesn't work either, so I think the problem is probably in Postfix rather than in Grafana.

Thanks in advance,

terrillmoore commented 5 years ago

@MuruganChandrasekar filed #11 which is supposed to address postfix (in general). Unfortunately, I wanted to improve the error handling. You might look at the changes in #11 and see if that helps. Meanwhile, I hope to finish rebasing that patch and applying it... "soon"....

oliv3 commented 5 years ago

Here's what I get in the docker-compose output:

grafana_1_e9d04bb0896d | t=2019-01-23T22:43:13+0000 lvl=info msg="Sending alert notification to" logger=alerting.notifier.email addresses=[olivier@biniou.info]
grafana_1_e9d04bb0896d | t=2019-01-23T22:43:13+0000 lvl=eror msg="Failed to send alert notification email" logger=alerting.notifier.email error="SMTP not configured, check your grafana.ini config file's [smtp] section."
grafana_1_e9d04bb0896d | t=2019-01-23T22:43:13+0000 lvl=eror msg="failed to send notification" logger=alerting.notifier id=0 error="SMTP not configured, check your grafana.ini config file's [smtp] section."

Grafana/latest is 5.4.3, which is more recent than the version I got when I first used this project (IIRC, that version sent mail alerts successfully).

Still, that doesn't explain why sending mail from the command line also fails.

oliv3 commented 5 years ago

Don't know if it's related, but hope it can be useful :)

root@d97675a38e1b:/# /etc/my_init.d/postfix.sh 
 * Stopping Postfix Mail Transport Agent postfix [ OK ]
 * Starting Postfix Mail Transport Agent postfix
postfix: Postfix is running with backwards-compatible default settings
postfix: See http://www.postfix.org/COMPATIBILITY_README.html for details
postfix: To disable backwards compatibility use "postconf compatibility_level=2" and "postfix reload"
postfix/postfix-script: fatal: the Postfix mail system is already running
 [fail]

oliv3 commented 5 years ago

@terrillmoore OK hold on, found two issues, will report here for analysis.

MuruganChandrasekar commented 5 years ago

> @MuruganChandrasekar filed #11 which is supposed to address postfix (in general). Unfortunately, I wanted to improve the error handling. You might look at the changes in #11 and see if that helps. Meanwhile, I hope to finish rebasing that patch and applying it... "soon"....

I'm looking into this and I'll update you when it is done.

oliv3 commented 5 years ago

So we have two issues here:

  1. Communication failure between Grafana and Postfix:

Using default configuration, sending a test alert from Grafana fails:

Setting TTN_DASHBOARD_GRAFANA_SMTP_SKIP_VERIFY=true makes things a little better; Grafana no longer complains:

grafana_1_1bd29478925c | t=2019-01-24T16:02:15+0000 lvl=info msg="Sending alert notification to" logger=alerting.notifier.email addresses=[olivier@biniou.info]

Still, no mail gets sent.

  2. Postfix configuration: https://github.com/mcci-catena/docker-ttn-dashboard/blob/6cb5d4f9a3cc40449fa950e5b78b1b3b6602858a/docker-compose.yml#L162

TTN_DASHBOARD_MAIL_RELAY_IP was also not set, so relay_ip got the default value ".". Postfix rejects that as a relay host, so no mail is sent, as we can see in Postfix's /var/log/mail.log:

Jan 24 16:20:58 132788baa5cf postfix/smtpd[168]: connect from unknown[172.18.0.1]
Jan 24 16:20:58 132788baa5cf postfix/smtpd[168]: 57AE5196A38: client=unknown[172.18.0.1]
Jan 24 16:20:58 132788baa5cf postfix/cleanup[172]: 57AE5196A38: message-id=<>
Jan 24 16:20:58 132788baa5cf postfix/qmgr[152]: 57AE5196A38: from=<grafana@kaalut.dogfooding.net>, size=34621, nrcpt=1 (queue active)
Jan 24 16:20:58 132788baa5cf postfix/smtpd[168]: disconnect from unknown[172.18.0.1] ehlo=2 starttls=1 mail=1 rcpt=1 data=1 quit=1 commands=7
Jan 24 16:20:58 132788baa5cf postfix/smtp[173]: fatal: valid hostname or network address required in server description: .
Jan 24 16:20:59 132788baa5cf postfix/qmgr[152]: warning: private/smtp socket: malformed response
Jan 24 16:20:59 132788baa5cf postfix/qmgr[152]: warning: transport smtp failure -- see a previous warning/fatal/panic logfile record for the problem description
Jan 24 16:20:59 132788baa5cf postfix/master[145]: warning: process /usr/lib/postfix/sbin/smtp pid 173 exit status 1
Jan 24 16:20:59 132788baa5cf postfix/master[145]: warning: /usr/lib/postfix/sbin/smtp: bad command startup -- throttling
Jan 24 16:20:59 132788baa5cf postfix/error[174]: 57AE5196A38: to=<olivier@biniou.info>, relay=none, delay=1.2, delays=0.03/1.1/0/0.12, dsn=4.3.0, status=deferred (unknown mail transport error)
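
In a log like the one above, the later warnings only point back to an earlier fatal record, so filtering for fatal and panic lines can help find the root cause faster. A minimal sketch; the function name `postfix_fatal_lines` is made up for illustration and is not part of this project:

```shell
#!/bin/sh
# Hypothetical helper (not part of the project): read a Postfix log on
# stdin and print only the fatal/panic records.
postfix_fatal_lines() {
    grep -E 'postfix/[^ ]+\[[0-9]+\]: (fatal|panic): '
}

# Example using two lines taken from the log above; only the fatal one
# is printed.
printf '%s\n' \
  'Jan 24 16:20:58 132788baa5cf postfix/smtpd[168]: connect from unknown[172.18.0.1]' \
  'Jan 24 16:20:58 132788baa5cf postfix/smtp[173]: fatal: valid hostname or network address required in server description: .' \
  | postfix_fatal_lines
```

Inside the container this would presumably be run as `postfix_fatal_lines < /var/log/mail.log`.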

Changing relay_ip in docker-compose.yml solved the issue:

-        relay_ip: "${TTN_DASHBOARD_MAIL_RELAY_IP:-.}"
+        relay_ip: "${TTN_DASHBOARD_MAIL_RELAY_IP:-}"
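
The difference between the two defaults comes from the `${VAR:-default}` substitution syntax, which docker-compose shares with the POSIX shell. A small shell sketch of how each form expands when the variable is unset or set; the function names are illustrative only:

```shell
#!/bin/sh
# Illustration only: mimic how the two defaults from the diff expand.

relay_with_dot_default() {
    # Old form: falls back to "." when the variable is unset or empty.
    printf '%s' "${TTN_DASHBOARD_MAIL_RELAY_IP:-.}"
}

relay_with_empty_default() {
    # New form: falls back to the empty string instead.
    printf '%s' "${TTN_DASHBOARD_MAIL_RELAY_IP:-}"
}

unset TTN_DASHBOARD_MAIL_RELAY_IP
echo "old default: '$(relay_with_dot_default)'"
echo "new default: '$(relay_with_empty_default)'"
```

With the variable unset, the old form yields `.`, the value Postfix complained about in the log, while the new form yields an empty value.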

Please bear in mind that I'm in no way a Postfix expert, but things look OK now, at least I receive the test alerts from Grafana.

I'll gladly send a PR for 2. For 1., I'm not sure: if the certificate problem can be fixed, Grafana could use SSL when talking to Postfix; otherwise the default value of GF_SMTP_SKIP_VERIFY could be changed. Your call.

The documentation probably needs an update too, but you would write that better than I can, since English is not my mother tongue.

terrillmoore commented 5 years ago

Thanks for the great research. PRs would be much appreciated.

Regarding point 1, I'm not 100% sure I understand the detailed problem, but: since Grafana <> Postfix is a direct (local) connection, there's no need to use SSL -- they can trust each other, that's an important feature of Docker Compose. That's also why Grafana isn't using SSL for the client-facing connections, it trusts Apache, who then does the SSL for everybody.

oliv3 commented 5 years ago

Agreed. Then GF_SMTP_HOST should default to postfix, not localhost, right?

oliv3 commented 5 years ago

Or not. It's exposed.

terrillmoore commented 5 years ago

It shouldn't be exposed; it's a docker-compose local network. Grafana should be using host address 'postfix', which will be found in /etc/hosts on the grafana container. There should be a network connection on port 25 available between Grafana and Postfix set up by the Docker Compose file.

Postfix should not be exposing anything to the outside world unless for some reason you want incoming mail to this server. I don't recommend that. Because I got stuck with the error handling in #11, I didn't have a chance to get things working and then inspect things from the point of view of security.
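
One way to check whether the Postfix container currently publishes port 25 to the host would be to inspect the output of `docker port`. A sketch, assuming the usual `25/tcp -> 0.0.0.0:25` line format; the helper name `smtp_published` is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: reads `docker port <container>` output on stdin and
# succeeds if container port 25/tcp is published on the host.
smtp_published() {
    grep -Eq '^25/tcp -> '
}

# Example on canned output; against a real container the input would come
# from `docker port <postfix-container>` instead of printf.
if printf '25/tcp -> 0.0.0.0:25\n' | smtp_published; then
    echo "port 25 is published to the host"
fi
```

Here `<postfix-container>` is a placeholder for the actual container name. If nothing is published, `docker port` prints no matching line and the check fails, which is the desired state described above.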

oliv3 commented 5 years ago

Definitely agreed, but that's what is done at the moment. Can fix this while I'm at it.

terrillmoore commented 5 years ago

> Can fix this while I'm at it.

That would be great.

oliv3 commented 5 years ago

Postfix has TLS enabled by default (which is a good thing). From /etc/postfix/main.cf: smtpd_use_tls=yes

So Grafana will negotiate STARTTLS with Postfix and try to verify its certificate (unless GF_SMTP_SKIP_VERIFY is set to true, which only skips the certificate check).

Will have to figure out how to disable this in Postfix, probably not so hard to do, I hope.

oliv3 commented 5 years ago

In postfix/Dockerfile, adding

+run postconf -e smtpd_use_tls=no

seems to do the trick.
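
To confirm that the change took effect, one option would be to ask `postconf` for the current value inside the running container. A sketch, assuming the usual `name = value` output format of `postconf`; the helper name `tls_disabled` is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the `postconf smtpd_use_tls` output
# read from stdin reports the setting as "no".
tls_disabled() {
    grep -Eq '^smtpd_use_tls[[:space:]]*=[[:space:]]*no[[:space:]]*$'
}

# Example on canned output; in the container the input would come from
# `postconf smtpd_use_tls` instead of printf.
if printf 'smtpd_use_tls = no\n' | tls_disabled; then
    echo "TLS is disabled for incoming SMTP"
fi
```

Against the real container this could look like `docker exec <postfix-container> postconf smtpd_use_tls | tls_disabled`, where `<postfix-container>` is a placeholder for the actual container name.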