xmatters / xm-labs-prometheus

Labs integration for Prometheus AlertManager
MIT License

My version of AlertManager is sending far different JSON to xMatters than the current instructions presume #4

Open: bheadlee opened this issue 5 years ago

bheadlee commented 5 years ago

Hi, I was wondering if anyone knows whether there was a major change in what AlertManager sends. I followed the documentation and imported the Prometheus plan from this GitHub project, but what AlertManager sends is vastly different (using Prometheus 2.12.0 and Alertmanager 0.19.0). The docs didn't suggest that any fields had to be added just to get the base integration working. Using the curl example from xMatters, I can create alerts just fine.

This is what I'm seeing sent from AlertManager, captured by intercepting the request with RequestBin. I could make this work by essentially rewriting the integration from scratch, but I'm curious whether I'm just missing something here.

Received from AlertManager:

{
  "receiver": "xmatters",
  "status": "firing",
  "alerts": [
    {
      "status": "firing",
      "labels": {
        "alertname": "InstanceUp",
        "instance": "localhost:9090",
        "job": "prometheus",
        "service": "Check Prometheus"
      },
      "annotations": {
        "description": "The Node exporter service on the Prometheus server is running... ",
        "summary": "The test node has come up"
      },
      "startsAt": "2019-09-11T17:08:45.277203128-05:00",
      "endsAt": "0001-01-01T00:00:00Z",
      "generatorURL": "http://fermi:9090/graph?g0.expr=up+%3D%3D+1&g0.tab=1",
      "fingerprint": "ad34903b6ade0da2"
    }
  ],
  "groupLabels": {
    "alertname": "InstanceUp",
    "instance": "localhost:9090",
    "job": "prometheus",
    "service": "Check Prometheus"
  },
  "commonLabels": {
    "alertname": "InstanceUp",
    "instance": "localhost:9090",
    "job": "prometheus",
    "service": "Check Prometheus"
  },
  "commonAnnotations": {
    "description": "The Node exporter service on the Prometheus server is running... ",
    "summary": "The test node has come up"
  },
  "externalURL": "http://fermi:9093",
  "version": "4",
  "groupKey": "{}:{alertname=\"InstanceUp\", instance=\"localhost:9090\", job=\"prometheus\", service=\"Check Prometheus\"}"
}

When that is sent to xMatters, I get an HTTP 400 error and no alert shows up.
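For orientation, the body above is Alertmanager's webhook payload ("version": "4"): group-level fields (receiver, status, groupLabels, commonLabels, commonAnnotations, externalURL, groupKey) sit at the top level, and each individual alert is an entry in the alerts array with its own labels and annotations. A minimal Node.js sketch of walking that structure follows; the function summarizeAlerts and the fields it picks are illustrative assumptions, not this repo's integration script.

```javascript
// Hypothetical helper: flatten an Alertmanager webhook payload into
// one summary object per alert (field choice is illustrative only).
function summarizeAlerts(payload) {
  return payload.alerts.map((alert) => ({
    status: alert.status,               // per-alert status, e.g. "firing"
    alertname: alert.labels.alertname,  // labels identify the alert
    instance: alert.labels.instance,
    summary: alert.annotations.summary, // annotations carry human-readable text
    startsAt: alert.startsAt,
    generatorURL: alert.generatorURL,
  }));
}

// Trimmed copy of the payload captured above.
const payload = {
  receiver: "xmatters",
  status: "firing",
  alerts: [
    {
      status: "firing",
      labels: {
        alertname: "InstanceUp",
        instance: "localhost:9090",
        job: "prometheus",
        service: "Check Prometheus",
      },
      annotations: { summary: "The test node has come up" },
      startsAt: "2019-09-11T17:08:45.277203128-05:00",
      endsAt: "0001-01-01T00:00:00Z",
      generatorURL: "http://fermi:9090/graph?g0.expr=up+%3D%3D+1&g0.tab=1",
      fingerprint: "ad34903b6ade0da2",
    },
  ],
  version: "4",
};

console.log(summarizeAlerts(payload));
```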

xMTinkerer commented 5 years ago

Hrmm, that's interesting. They might have updated the payload they send out. I threw that snippet into my text editor and it doesn't look like valid JSON: the groupKey element has double quotes in it, so I'm guessing JSON.parse is getting tripped up.

I'm not 100% familiar with Alertmanager; do you know where that value comes from? If there isn't a way to change it in Alertmanager, you might be able to run a regex over request.body before the JSON.parse in the Integration Builder script. It would be a little tricky, though.
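On the quoting question: inside a JSON string, \" is a legal escape, so a groupKey whose inner quotes arrive escaped should parse fine; it only breaks if something strips the backslashes first. A small Node.js sketch that parses both variants; tryParse is a hypothetical helper, and the shortened groupKey is adapted from the payload above.

```javascript
// Hypothetical helper: parse a raw request body, reporting failure
// instead of throwing.
function tryParse(body) {
  try {
    return { ok: true, value: JSON.parse(body) };
  } catch (err) {
    return { ok: false, error: err.message };
  }
}

// groupKey with its inner quotes escaped, as JSON requires.
const escaped = String.raw`{"groupKey":"{}:{alertname=\"InstanceUp\"}"}`;
// The same text with the backslashes removed (e.g. by a display tool).
const stripped = escaped.replace(/\\"/g, '"');

console.log(tryParse(escaped).ok);  // true
console.log(tryParse(stripped).ok); // false
```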

bheadlee commented 5 years ago

It's likely that the RequestBin service is stripping the escape characters from the embedded quotes. I took exactly what AlertManager was sending, passed it in using curl, and got a bit more feedback. The problem is that it doesn't make sense; the response is below. After I got it, I tried removing the "receiver" field, since that's what it was complaining about, and then it complained about the next field (status), and so on. After looking at the integration translation script, the fields make more sense, but I don't know what to make of this error. None of the "11 known properties" are things AlertManager would normally send. I also validated the JSON structure with JSONLint and it seems valid. Any ideas where these values come from?


curl -1 -H "Content-Type: application/json" --user username:password -X POST \
  -d '{"receiver":"xmatters","status":"firing","alerts":[{"status":"firing","labels":{"alertname":"InstanceUp","instance":"localhost:9090","job":"prometheus","service":"Check Prometheus"},"annotations":{"description":"The Node exporter service on the Prometheus server is running... ","summary":"The test node has come up"},"startsAt":"2019-09-12T17:07:45.277203128-05:00","endsAt":"0001-01-01T00:00:00Z","generatorURL":"http://fermi:9090/graph?g0.expr=up+%3D%3D+1\u0026g0.tab=1","fingerprint":"ad34903b6ade0da2"}],"groupLabels":{"instance":"localhost:9090"},"commonLabels":{"alertname":"InstanceUp","instance":"localhost:9090","job":"prometheus","service":"Check Prometheus"},"commonAnnotations":{"description":"The Node exporter service on the Prometheus server is running... ","summary":"The test node has come up"},"externalURL":"http://fermi:9093","version":"4","groupKey":"localhost"}' \
  "https://bheadlee.xmatters.com/reapi/2015-04-01/forms/89f69898-6bed-418a-b141-4c6592b41fc4/triggers"

{
  "type": "DATA_FORMAT_ERROR",
  "message": "Request data is malformed and cannot be parsed.",
  "errorDetails": [
    {
      "jsonPath": "$",
      "details": "com.fasterxml.jackson.databind.exc.UnrecognizedPropertyException: Unrecognized field \"receiver\" (class com.xmatters.rest.createevent.CreateEventForm), not marked as ignorable (11 known properties: \"responses\", \"properties\", \"priority\", \"attachmentToken\", \"conferences\", \"recipients\", \"scenarioId\", \"filterGroups\", \"callbacks\", \"integrationUUID\", \"requestId\"])\n at [Source: (ByteArrayInputStream); line: 1, column: 14] (through reference chain: com.xmatters.rest.createevent.CreateEventForm[\"receiver\"])"
    }
  ]
}
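Reading the error: Jackson (com.fasterxml.jackson.databind) could not map the body onto com.xmatters.rest.createevent.CreateEventForm, the event format this /reapi/ trigger endpoint expects, so it stops at the first top-level field it does not recognize. That is why removing receiver only moved the complaint on to status. Rather than deleting fields one at a time, a short Node.js sketch can list every offending key at once by comparing against the 11 known properties quoted in the error. The helper name is hypothetical, and the property list is copied from the message above rather than from xMatters documentation.

```javascript
// The 11 properties named in the REAPI error message above.
const KNOWN_CREATE_EVENT_FIELDS = new Set([
  "responses", "properties", "priority", "attachmentToken", "conferences",
  "recipients", "scenarioId", "filterGroups", "callbacks",
  "integrationUUID", "requestId",
]);

// Hypothetical helper: list the top-level keys the REAPI form would reject,
// in the order they appear in the body.
function unknownTopLevelFields(body) {
  return Object.keys(body).filter((key) => !KNOWN_CREATE_EVENT_FIELDS.has(key));
}

// Top-level keys of the Alertmanager payload sent with curl above (values trimmed).
const alertmanagerBody = {
  receiver: "xmatters", status: "firing", alerts: [], groupLabels: {},
  commonLabels: {}, commonAnnotations: {}, externalURL: "http://fermi:9093",
  version: "4", groupKey: "localhost",
};

// Every Alertmanager key is unknown to the REAPI form, starting with "receiver".
console.log(unknownTopLevelFields(alertmanagerBody));
```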

xMTinkerer commented 5 years ago

OK, the groupKey value there looks fine. But the /reapi/2015-04-01 path indicates you're sending the request to the older REAPI, which still works but expects a different payload format than the XMAPI. The install steps here point you to the URL found in the inbound integration, on the Integration Builder tab:

[Screenshot: inbound integration URL on the Integration Builder tab]
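The two URLs in this thread can be told apart by path alone: the REAPI form trigger looks like /reapi/2015-04-01/forms/<form id>/triggers, while the inbound integration URL looks like /api/integration/1/functions/<uuid>/triggers?apiKey=<key>. A Node.js sketch of that check, assuming only those two path shapes (endpointKind is a hypothetical helper; URL is the WHATWG URL class built into Node.js):

```javascript
// Hypothetical check: which kind of xMatters trigger URL is this?
// Only the two path shapes seen in this thread are recognised.
function endpointKind(rawUrl) {
  const { pathname } = new URL(rawUrl);
  if (pathname.startsWith("/reapi/")) {
    return "reapi"; // expects the REAPI event format (CreateEventForm)
  }
  if (pathname.startsWith("/api/integration/1/functions/")) {
    return "inbound-integration"; // runs the inbound integration script
  }
  return "unknown";
}

console.log(endpointKind(
  "https://bheadlee.xmatters.com/reapi/2015-04-01/forms/89f69898-6bed-418a-b141-4c6592b41fc4/triggers"
)); // "reapi"
console.log(endpointKind(
  "https://mandalore.cs1.xmatters.com/api/integration/1/functions/UUIDHERE/triggers?apiKey=APIKEYHERE"
)); // "inbound-integration"
```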

Using the inbound integration URL sends the request through the Integration Builder, which triggers the inbound integration script. Your curl request should then return a requestId, like so:


travisdepuy@appletree:~/xCode/octoapp$ curl -1 -H "Content-Type: application/json" -X POST \
  -d '{"receiver":"xmatters","status":"firing","alerts":[{"status":"firing","labels":{"alertname":"InstanceUp","instance":"localhost:9090","job":"prometheus","service":"Check Prometheus"},"annotations":{"description":"The Node exporter service on the Prometheus server is running... ","summary":"The test node has come up"},"startsAt":"2019-09-12T17:07:45.277203128-05:00","endsAt":"0001-01-01T00:00:00Z","generatorURL":"http://fermi:9090/graph?g0.expr=up+%3D%3D+1\u0026g0.tab=1","fingerprint":"ad34903b6ade0da2"}],"groupLabels":{"instance":"localhost:9090"},"commonLabels":{"alertname":"InstanceUp","instance":"localhost:9090","job":"prometheus","service":"Check Prometheus"},"commonAnnotations":{"description":"The Node exporter service on the Prometheus server is running... ","summary":"The test node has come up"},"externalURL":"http://fermi:9093","version":"4","groupKey":"localhost"}' \
  "https://mandalore.cs1.xmatters.com/api/integration/1/functions/UUIDHERE/triggers?apiKey=APIKEYHERE"

{"requestId":"46a431bb-6487-4bb4-a7d9-422bb45cdd88"}
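The two responses in this thread also differ in shape: the inbound integration returned {"requestId": ...}, while the REAPI returned an error object with type and message. A Node.js sketch that tells them apart when scripting test requests; it assumes only the fields seen here, and interpretTriggerResponse is a hypothetical helper.

```javascript
// Hypothetical helper: interpret an xMatters trigger response body,
// using only the two shapes seen in this thread.
function interpretTriggerResponse(text) {
  const body = JSON.parse(text);
  if (typeof body.requestId === "string") {
    // Shape returned by the inbound integration URL.
    return { accepted: true, requestId: body.requestId };
  }
  // Shape returned by the REAPI on a rejected body.
  return { accepted: false, error: `${body.type}: ${body.message}` };
}

console.log(interpretTriggerResponse(
  '{"requestId":"46a431bb-6487-4bb4-a7d9-422bb45cdd88"}'
));
console.log(interpretTriggerResponse(
  '{"type":"DATA_FORMAT_ERROR","message":"Request data is malformed and cannot be parsed."}'
));
```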

I do realize those steps could use a little help in the screenshot department.

xMTinkerer commented 5 years ago

If you'd rather, we can jump on a Webex and work through it. I'd be interested in your feedback on everything, so we can smooth out this installation process. Drop me an email at tdepuy [a] xmatters.com if you're interested.

xMTinkerer commented 5 years ago

@bheadlee, did you ever get this sorted? Anything I can do to help? If you'd rather, we can get on a Webex and work through it. Happy Wednesday!