sylr / alertmanager-splunkbot

alertmanager alerts to splunk

Bad Request 400 from splunkbot service in K8s #5

Open · vinitmasaun opened 3 years ago

vinitmasaun commented 3 years ago

I recently upgraded to the latest prometheus-community kube-prometheus-stack chart (https://github.com/prometheus-community/helm-charts/blob/main/charts/kube-prometheus-stack) and the alertmanager webhook is unable to post to the splunkbot service. The following message is reported in the alertmanager pod:

    level=error ts=2020-12-01T19:04:34.242Z caller=dispatch.go:309 component=dispatcher msg="Notify for alerts failed" num_alerts=1 err="splunk_webhook/webhook[0]: notify retry canceled due to unrecoverable error after 1 attempts: unexpected status code 400: http://alertmanager-splunkbot-service:44553"

Following is the config defined in values.yaml for the splunk_webhook receiver, which works with the deprecated stable/prometheus-operator chart:

    config:
      global:
        resolve_timeout: 5m
      route:
        group_by: [alertname]
        group_wait: 30s
        group_interval: 5m
        repeat_interval: 1h
        receiver: 'splunk_webhook'
        routes:
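For completeness, the full receiver wiring would look roughly like the sketch below. The routes entries are truncated in the paste above, and the service name and port are taken from the Alertmanager error message, so treat this as an illustrative sketch rather than the exact config:

    config:
      global:
        resolve_timeout: 5m
      route:
        group_by: [alertname]
        group_wait: 30s
        group_interval: 5m
        repeat_interval: 1h
        receiver: 'splunk_webhook'
        routes:
          # routes are truncated in the original paste; illustrative child route only
          - receiver: 'splunk_webhook'
            continue: true
      receivers:
        - name: 'splunk_webhook'
          webhook_configs:
            # service name and port taken from the Alertmanager error above
            - url: 'http://alertmanager-splunkbot-service:44553'
              send_resolved: true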

sylr commented 3 years ago

Geez, I did not know people used this thing.

Do you have logs coming from the splunkbot itself?

vinitmasaun commented 3 years ago

Yes, the splunkbot is generating logs and below is a snippet. I don’t see any error messages in the splunkbot logs:

time="2020-12-01T20:19:35Z" level=debug msg="End of request" time="2020-12-01T20:19:35Z" level=debug msg="End of request" time="2020-12-01T20:23:37Z" level=debug msg="New request: &{POST / HTTP/1.1 1 1 map[Content-Type:[application/json] User-Agent:[Alertmanager/0.21.0] Content-Length:[1418]] 0xc42013a240 1418 [] false alertmanager-splunkbot-service:44553 map[] map[] map[] 10.162.46.235:46962 / 0xc42013a280}" time="2020-12-01T20:23:37Z" level=debug msg="New request: &{POST / HTTP/1.1 1 1 map[User-Agent:[Alertmanager/0.21.0] Content-Length:[1436] Content-Type:[application/json]] 0xc420280140 1436 [] false alertmanager-splunkbot-service:44553 map[] map[] map[] 10.162.46.235:53742 / 0xc420280180}" time="2020-12-01T20:23:37Z" level=debug msg="New request: &{POST / HTTP/1.1 1 1 map[User-Agent:[Alertmanager/0.21.0] Content-Length:[1480] Content-Type:[application/json]] 0xc420280300 1480 [] false alertmanager-splunkbot-service:44553 map[] map[] map[] 10.162.46.235:53740 / 0xc420280340}" time="2020-12-01T20:23:37Z" level=debug msg="End of request" time="2020-12-01T20:23:37Z" level=debug msg="End of request" time="2020-12-01T20:23:38Z" level=debug msg="End of request" time="2020-12-01T20:24:34Z" level=debug msg="New request: &{POST / HTTP/1.1 1 1 map[User-Agent:[Alertmanager/0.21.0] Content-Length:[1691] Content-Type:[application/json]] 0xc42013af40 1691 [] false alertmanager-splunkbot-service:44553 map[] map[] map[] 10.162.46.235:46962 / 0xc42013af80}" time="2020-12-01T20:24:34Z" level=debug msg="End of request" time="2020-12-01T20:24:35Z" level=debug msg="New request: &{POST / HTTP/1.1 1 1 map[User-Agent:[Alertmanager/0.21.0] Content-Length:[2210] Content-Type:[application/json]] 0xc4202f88c0 2210 [] false alertmanager-splunkbot-service:44553 map[] map[] map[] 10.162.46.235:46962 / 0xc4202f8900}" time="2020-12-01T20:24:35Z" level=debug msg="New request: &{POST / HTTP/1.1 1 1 map[User-Agent:[Alertmanager/0.21.0] Content-Length:[5814] Content-Type:[application/json]] 0xc4202f8ac0 5814 [] false alertmanager-splunkbot-service:44553 map[] map[] map[] 10.162.46.235:53742 / 0xc4202f8b40}" time="2020-12-01T20:24:35Z" level=debug msg="End of request" time="2020-12-01T20:24:35Z" level=debug msg="End of request" time="2020-12-01T20:28:37Z" level=debug msg="New request: &{POST / HTTP/1.1 1 1 map[User-Agent:[Alertmanager/0.21.0] Content-Length:[1418] Content-Type:[application/json]] 0xc4201a4280 1418 [] false alertmanager-splunkbot-service:44553 map[] map[] map[] 10.162.46.235:53742 / 0xc4201a42c0}" time="2020-12-01T20:28:37Z" level=debug msg="New request: &{POST / HTTP/1.1 1 1 map[User-Agent:[Alertmanager/0.21.0] Content-Length:[1436] Content-Type:[application/json]] 0xc420280240 1436 [] false alertmanager-splunkbot-service:44553 map[] map[] map[] 10.162.46.235:46962 / 0xc420280280}" time="2020-12-01T20:28:37Z" level=debug msg="New request: &{POST / HTTP/1.1 1 1 map[Content-Length:[1480] Content-Type:[application/json] User-Agent:[Alertmanager/0.21.0]] 0xc42013a100 1480 [] false alertmanager-splunkbot-service:44553 map[] map[] map[] 10.162.46.235:53740 / 0xc42013a200}" time="2020-12-01T20:28:37Z" level=debug msg="End of request" time="2020-12-01T20:28:37Z" level=debug msg="End of request" time="2020-12-01T20:28:37Z" level=debug msg="End of request" time="2020-12-01T20:29:34Z" level=debug msg="New request: &{POST / HTTP/1.1 1 1 map[User-Agent:[Alertmanager/0.21.0] Content-Length:[1691] Content-Type:[application/json]] 0xc420280840 1691 [] false alertmanager-splunkbot-service:44553 map[] map[] map[] 
10.162.46.235:53742 / 0xc420280880}" time="2020-12-01T20:29:34Z" level=debug msg="End of request" time="2020-12-01T20:29:35Z" level=debug msg="New request: &{POST / HTTP/1.1 1 1 map[Content-Length:[2210] Content-Type:[application/json] User-Agent:[Alertmanager/0.21.0]] 0xc42013a700 2210 [] false alertmanager-splunkbot-service:44553 map[] map[] map[] 10.162.46.235:53742 / 0xc42013a7c0}" time="2020-12-01T20:29:35Z" level=debug msg="New request: &{POST / HTTP/1.1 1 1 map[User-Agent:[Alertmanager/0.21.0] Content-Length:[5814] Content-Type:[application/json]] 0xc4202f8080 5814 [] false alertmanager-splunkbot-service:44553 map[] map[] map[] 10.162.46.235:53740 / 0xc4202f80c0}" time="2020-12-01T20:29:35Z" level=debug msg="End of request" time="2020-12-01T20:29:36Z" level=debug msg="End of request"

sylr commented 3 years ago

Can you try running: quay.io/sylr/alertmanager-splunkbot:latest

vinitmasaun commented 3 years ago

> Can you try running: quay.io/sylr/alertmanager-splunkbot:latest

Got the same issue with the latest tag as well.

sylr commented 3 years ago

Yeah I only improved the error logging.

Did you see anything interesting in the logs?

vinitmasaun commented 3 years ago

Nothing new. It's the same messages that I posted initially. Alertmanager gets a 400 response from the splunkbot service. If I revert to the stable/prometheus-operator helm chart it works, but we need to move away from stable/prometheus-operator due to security vulnerabilities.

sylr commented 3 years ago

You should see log lines with Splunk response status code in debug mode.

vinitmasaun commented 3 years ago

Also, security vulnerabilities have been flagged for this helm chart. The security scan is done by Twistlock:

| CVE | | Type | Severity | Package(s) | Source | Version | License | CVSS | Fix status |
|---|---|---|---|---|---|---|---|---|---|
| CVE-2019-13115 | 46 | OS | high | libssh2 | | 1.8.0-r2 | BSD | 8.1 | fixed in 1.9.0-r0 |
| CVE-2019-17498 | 46 | OS | high | libssh2 | | 1.8.0-r2 | BSD | 8.1 | fixed in 1.9.0-r1 |
| CVE-2019-3855 | 46 | OS | high | libssh2 | | 1.8.0-r2 | BSD | 8.8 | fixed in 1.8.1-r0 |
| CVE-2019-3856 | 46 | OS | high | libssh2 | | 1.8.0-r2 | BSD | 8.8 | fixed in 1.8.1-r0 |
| CVE-2019-3857 | 46 | OS | high | libssh2 | | 1.8.0-r2 | BSD | 8.8 | fixed in 1.8.1-r0 |
| CVE-2019-3858 | 46 | OS | critical | libssh2 | | 1.8.0-r2 | BSD | 9.1 | fixed in 1.8.1-r0 |
| CVE-2019-3859 | 46 | OS | critical | libssh2 | | 1.8.0-r2 | BSD | 9.1 | fixed in 1.8.1-r0 |
| CVE-2019-3860 | 46 | OS | critical | libssh2 | | 1.8.0-r2 | BSD | 9.1 | fixed in 1.8.1-r0 |
| CVE-2019-3861 | 46 | OS | critical | libssh2 | | 1.8.0-r2 | BSD | 9.1 | fixed in 1.8.1-r0 |
| CVE-2019-3862 | 46 | OS | critical | libssh2 | | 1.8.0-r2 | BSD | 9.1 | fixed in 1.8.1-r0 |
| CVE-2019-3863 | 46 | OS | high | libssh2 | | 1.8.0-r2 | BSD | 8.8 | fixed in 1.8.1-r0 |
| CVE-2019-18276 | 46 | OS | high | bash | | 4.4.19-r1 | GPL3+ | 7.8 | fixed in 5.0.11-r1 |
| CVE-2018-1000500 | 46 | OS | high | busybox | | 1.27.2-r8 | GPL2 | 8.1 | |
| CVE-2018-1000517 | 46 | OS | critical | busybox | | 1.27.2-r8 | GPL2 | 9.8 | fixed in 1.29.3-r10 |
| CVE-2018-20679 | 46 | OS | high | busybox | | 1.27.2-r8 | GPL2 | 7.5 | fixed in 1.30.1-r4 |
| CVE-2019-5747 | 46 | OS | high | busybox | | 1.27.2-r8 | GPL2 | 7.5 | fixed in 1.30.1-r4 |
| CVE-2019-14697 | 46 | OS | critical | musl-utils,musl | musl | 1.1.18-r3 | MIT BSD GPL2+ | 9.8 | fixed in 1.1.18-r4 |
| CVE-2018-0495 | 46 | OS | medium | libressl2.6-libssl,libressl2.6-libcrypto | libressl | 2.6.3-r0 | custom | 4.7 | fixed in 2.6.5-r0 |
| CVE-2018-0732 | 46 | OS | high | libressl2.6-libssl,libressl2.6-libcrypto | libressl | 2.6.3-r0 | custom | 7.5 | fixed in 2.6.5-r0 |
| CVE-2018-12434 | 46 | OS | medium | libressl2.6-libssl,libressl2.6-libcrypto | libressl | 2.6.3-r0 | custom | 4.7 | fixed in 2.7.5-r0 |
| CVE-2018-0500 | 46 | OS | critical | libcurl,curl | curl | 7.59.0-r0 | MIT | 9.8 | fixed in 7.61.0-r0 |
| CVE-2018-1000300 | 46 | OS | critical | libcurl,curl | curl | 7.59.0-r0 | MIT | 9.8 | fixed in 7.60.0-r0 |
| CVE-2018-1000301 | 46 | OS | critical | libcurl,curl | curl | 7.59.0-r0 | MIT | 9.1 | fixed in 7.60.0-r0 |
| CVE-2018-14618 | 46 | OS | critical | libcurl,curl | curl | 7.59.0-r0 | MIT | 9.8 | fixed in 7.61.1-r0 |
| CVE-2018-16839 | 46 | OS | critical | libcurl,curl | curl | 7.59.0-r0 | MIT | 9.8 | fixed in 7.61.1-r1 |
| CVE-2018-16840 | 46 | OS | critical | libcurl,curl | curl | 7.59.0-r0 | MIT | 9.8 | fixed in 7.61.1-r1 |
| CVE-2018-16842 | 46 | OS | critical | libcurl,curl | curl | 7.59.0-r0 | MIT | 9.1 | fixed in 7.61.1-r1 |
| CVE-2018-16890 | 46 | OS | high | libcurl,curl | curl | 7.59.0-r0 | MIT | 7.5 | fixed in 7.61.1-r2 |
| CVE-2019-3822 | 46 | OS | critical | libcurl,curl | curl | 7.59.0-r0 | MIT | 9.8 | fixed in 7.61.1-r2 |
| CVE-2019-3823 | 46 | OS | high | libcurl,curl | curl | 7.59.0-r0 | MIT | 7.5 | fixed in 7.61.1-r2 |
| CVE-2019-5481 | 46 | OS | critical | libcurl,curl | curl | 7.59.0-r0 | MIT | 9.8 | fixed in 7.61.1-r3 |
| CVE-2019-5482 | 46 | OS | critical | libcurl,curl | curl | 7.59.0-r0 | MIT | 9.8 | fixed in 7.61.1-r3 |
| CVE-2018-10754 | 46 | OS | low | ncurses-libs,ncurses-terminfo,ncurses-terminfo-base | ncurses | 6.0_p20171125-r0 | MIT | 0 | fixed in 6.0_p20171125-r1 |
| CVE-2019-17594 | 46 | OS | medium | ncurses-libs,ncurses-terminfo,ncurses-terminfo-base | ncurses | 6.0_p20171125-r0 | MIT | 5.3 | fixed in 6.2_p20200523-r0 |
| CVE-2019-17595 | 46 | OS | medium | ncurses-libs,ncurses-terminfo,ncurses-terminfo-base | ncurses | 6.0_p20171125-r0 | MIT | 5.4 | fixed in 6.2_p20200523-r0 |
| | 41 | CIS | high | | | | | 0 | |

sylr commented 3 years ago

These CVEs do not concern alertmanager-splunkbot, as the Docker image is built from the scratch image.

vinitmasaun commented 3 years ago

> You should see log lines with Splunk response status code in debug mode.

How do I turn on debug mode for this chart?

vinitmasaun commented 3 years ago

> These CVEs do not concern alertmanager-splunkbot, as the Docker image is built from the scratch image.

I am not building my own Docker image. The image being pulled is the one that's defined in this chart:

sylr/alertmanager-splunkbot:latest

sylr commented 3 years ago

> > You should see log lines with Splunk response status code in debug mode.
>
> How do I turn on debug mode for this chart?

I don't know, I don't provide the chart.

> > These CVEs do not concern alertmanager-splunkbot, as the Docker image is built from the scratch image.
>
> I am not building my own Docker image. The image being pulled is the one that's defined in this chart:
>
> sylr/alertmanager-splunkbot:latest

Please use quay.io/sylr/alertmanager-splunkbot:latest, not sylr/alertmanager-splunkbot:latest.
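If the chart follows the usual Helm image convention, the override would look something like the snippet below. I don't maintain the chart, so the exact field names are an assumption and should be checked against its values.yaml:

    image:
      # hypothetical field names; adjust to whatever the chart actually exposes
      repository: quay.io/sylr/alertmanager-splunkbot
      tag: latest
      pullPolicy: Always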