meln5674 / grafana-mongodb-community-plugin

Open Source Grafana Plugin for querying MongoDB
GNU Affero General Public License v3.0

Add alerting to the plugin #23

Open rajeshggwp opened 1 year ago

rajeshggwp commented 1 year ago

Hi @meln5674, I added this configuration and was able to test the alert configuration locally. Let me know your thoughts! Fixes: #14

meln5674 commented 1 year ago

I've not actually used the alerting feature before, would you mind adding something to the end-to-end tests based on your local testing?

rajeshggwp commented 1 year ago

For the integration test, I think I can create an alert rule - https://grafana.com/docs/grafana/latest/developers/http_api/alerting_provisioning/#route-post-alert-rule

Also, CI / ci (pull_request) had failed. Can you help me out with this?

meln5674 commented 1 year ago

Yes, I was looking at that this morning. I believe the cause is some flakiness in the different layouts the Grafana UI uses depending on the screen size. I've pushed a commit that pins a fixed window size and the job passed, so I would recommend merging master and seeing if the failure happens again.

rajeshggwp commented 1 year ago

Hey @meln5674, I tried creating integration tests for creating alerts using the Grafana APIs (https://grafana.com/docs/grafana/latest/developers/http_api/). I tried both the Alerting API (unstable) and the Alerting Provisioning API. I was able to create alerts both with and without my changes. The UI does not allow me to create alerts without my changes, though. What are your thoughts?

rajeshggwp commented 1 year ago

Sample curl:

curl --location 'http://localhost:3000/api/v1/provisioning/alert-rules' \
  --header 'X-Disable-Provenance: true' \
  --header 'Content-Type: application/json' \
  --data '{
    "condition": "C",
    "execErrState": "Error",
    "noDataState": "NoData",
    "folderUID": "testFolder",
    "for": "5m",
    "orgID": 1,
    "ruleGroup": "testRule",
    "title": "12test",
    "data": [
      {
        "refId": "A",
        "datasourceUid": "cc538a50-ec13-4684-9c1e-3c10f9783829",
        "queryType": "Timeseries",
        "relativeTimeRange": { "from": 21600, "to": 0 },
        "model": {
          "refId": "A",
          "database": "test",
          "collection": "weather",
          "queryType": "Timeseries",
          "timestampField": "timestamp",
          "timestampFormat": "",
          "labelFields": [ "sensorID" ],
          "legendFormat": "",
          "valueFields": [ "temperature" ],
          "valueFieldTypes": [ "int32" ],
          "aggregation": "[{ \"$project\": { \"timestamp\": 1, \"sensorID\": \"$metadata.sensorId\", \"temperature\": \"$temp\" }}]",
          "autoTimeBound": true,
          "autoTimeSort": true,
          "schemaInference": false,
          "schemaInferenceDepth": 20,
          "intervalMs": 1000
        }
      },
      {
        "refId": "B",
        "datasourceUid": "__expr__",
        "queryType": "",
        "model": {
          "refId": "B",
          "type": "reduce",
          "datasource": { "uid": "__expr__", "type": "__expr__" },
          "conditions": [
            {
              "type": "query",
              "evaluator": { "params": [], "type": "gt" },
              "operator": { "type": "and" },
              "query": { "params": [ "B" ] },
              "reducer": { "params": [], "type": "last" }
            }
          ],
          "reducer": "last",
          "expression": "A"
        },
        "relativeTimeRange": { "from": 21600, "to": 0 }
      },
      {
        "refId": "C",
        "datasourceUid": "__expr__",
        "queryType": "",
        "model": {
          "refId": "C",
          "type": "threshold",
          "datasource": { "uid": "__expr__", "type": "__expr__" },
          "conditions": [
            {
              "type": "query",
              "evaluator": { "params": [ 0 ], "type": "gt" },
              "operator": { "type": "and" },
              "query": { "params": [ "C" ] },
              "reducer": { "params": [], "type": "last" }
            }
          ],
          "expression": "B"
        },
        "relativeTimeRange": { "from": 21600, "to": 0 }
      }
    ]
  }'

This request works with the existing code as well.

meln5674 commented 1 year ago

Looks reasonable. Would it be possible to add a request as well that would demonstrate getting that alert's status (firing/not firing)? If you can add those two requests to the e2e test function I linked, I'll go ahead and merge.

rajeshggwp commented 1 year ago

Hey @meln5674, I have added the tests:

  1. Create a folder for storing alert rules
  2. Create an alert rule - I have added a rule on weather collection
  3. Check the evaluation status of the alert rules

I have used the APIs from https://editor.swagger.io/?url=https://raw.githubusercontent.com/grafana/grafana/main/pkg/services/ngalert/api/tooling/post.json (see the sketch below). Let me know what you think of this approach. Also, it looks like you made some changes to the integration tests; I need help updating my tests based on those changes.
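For illustration, here is a minimal sketch of what steps 1 and 3 might look like as curl requests. It assumes default admin:admin credentials and reuses the folder UID from the earlier example; the exact endpoints and response filtering used in the actual tests are not shown in this thread, so treat these as assumptions rather than the PR's implementation.

# Step 1: create the folder that will hold the alert rules (UID/title are placeholders).
curl -u admin:admin \
  --header 'Content-Type: application/json' \
  --data '{ "uid": "testFolder", "title": "testFolder" }' \
  'http://localhost:3000/api/folders'

# Step 3: read back the evaluation state of Grafana-managed rules via the
# Prometheus-compatible endpoint and pull out each rule's name and state
# (e.g. "firing", "pending", "inactive").
curl -s -u admin:admin 'http://localhost:3000/api/prometheus/grafana/api/v1/rules' \
  | jq '.data.groups[].rules[] | { name: .name, state: .state }'
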
meln5674 commented 11 months ago

Sorry for the delay. I like the approach, but I have a few comments/recommendations:

  1. I think it'd be preferable to use file-based provisioning rather than API provisioning for the alerts, while still using the API to check the alerts like you are now. One major reason for this is that the e2e tests also double as a dev/test environment, and running them a second time will fail because the alert and group already exist. Unfortunately, the chart I'm using to deploy Grafana doesn't support this, so on master I've switched it to a fork that should be able to handle it. Here's how I would go about this (a rough sketch of such a provisioning file follows after this list):
    • Start by exporting the alert rule to a file under integration-tests/alerts/
    • Add a new ConfigMap with the contents of the exported alert rule YAML, see here for how the dashboard and datasources are provisioned
    • Set that ConfigMap to be created here
    • Finally, have the grafana fork mount your provisioning file ConfigMap by setting alerting.configMapName here
  2. It looks like your example alert query is for the test.weather collection, but it seems to be querying test.tweets, which doesn't exist (twitter.tweets does, however).
  3. The tests should really check, from the response, that the alert is actually firing. To make this more consistent, I pushed a change to master where the test.weather collection's timestamps are relative to the start time of the database, so your tests can have a reliable "Last" value. Maybe also create a second alert that's expected to not be firing.
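As a rough illustration of what point 1 suggests, a file-provisioned alert rule might be written under integration-tests/alerts/ roughly as sketched below. The file name, rule UID, and evaluation interval here are placeholders I made up, and the data entries would mirror the refId A/B/C model objects from the curl example above; this is only a sketch of Grafana's file-provisioning format, not the PR's actual test fixture.

# Hypothetical exported alert rule file (name and UID are placeholders). The resulting
# YAML would then go into a ConfigMap and be mounted via alerting.configMapName.
cat > integration-tests/alerts/weather-alerts.yaml <<'EOF'
apiVersion: 1
groups:
  - orgId: 1
    name: testRule
    folder: testFolder
    interval: 1m
    rules:
      - uid: weather-high-temp        # placeholder UID
        title: 12test
        condition: C
        for: 5m
        noDataState: NoData
        execErrState: Error
        data: []                      # same refId A/B/C entries as in the API payload above
EOF
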
omri-shilton commented 8 months ago

@rajeshggwp any updates to this PR? We would love to use alerting with this plugin!

meln5674 commented 8 months ago

I haven't heard from the author in quite a while, and I unfortunately have not had time to work on this since.