hotosm / tasking-manager

Tasking Manager - The tool to team up for mapping in OpenStreetMap
https://wiki.openstreetmap.org/wiki/Tasking_Manager
BSD 2-Clause "Simplified" License

[BUG] Opsgenie webhook is failing to notify when builds fail #5492

Open dakotabenjamin opened 1 year ago

dakotabenjamin commented 1 year ago

Describe the bug: We use OpsGenie to message the devops team when things go wrong. Connecting CircleCI to OpsGenie would let us immediately see when builds fail and fix them quickly. At present, we have it set up with a webhook, but that webhook is not working correctly:

https://github.com/hotosm/tasking-manager/blob/develop/.circleci/config.yml#L347

```yaml
notify:
  webhooks:
    - url: https://api.opsgenie.com/v1/json/circleci?apiKey=$OPSGENIE_API
```
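For reference, the webhook can only authenticate if the actual key value ends up in the query string. Here is a minimal Python sketch of the URL the `notify` block is meant to produce. It assumes the key is stored in the `OPSGENIE_API` environment variable, as in the config above. `opsgenie_webhook_url` is a hypothetical helper, not part of CircleCI or Opsgenie.

```python
import os
from urllib.parse import urlencode

# Endpoint copied from the CircleCI config in this issue.
OPSGENIE_CIRCLECI_ENDPOINT = "https://api.opsgenie.com/v1/json/circleci"


def opsgenie_webhook_url(api_key: str) -> str:
    """Hypothetical helper: build the webhook URL with the key value substituted.

    urlencode percent-encodes the value, so a key containing reserved
    characters still yields a valid query string.
    """
    return f"{OPSGENIE_CIRCLECI_ENDPOINT}?{urlencode({'apiKey': api_key})}"


if __name__ == "__main__":
    # Read the real value from the environment instead of embedding "$OPSGENIE_API".
    key = os.environ.get("OPSGENIE_API", "example-key")
    print(opsgenie_webhook_url(key))
```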


The challenge is that OpsGenie's integration with CircleCI is not well supported. I filed a bug report a while back, but it was closed without resolution. Someone commented with a workaround that may work for us.

There may also be other ways of getting these notifications into Slack that we should explore.

eternaltyro commented 1 year ago

One issue I noticed is that the $ in $OPSGENIE_API is simply URL-encoded instead of the environment variable being replaced with its value. I am raising a PR to address this in the interim. It does not solve the underlying issue; it merely stops the error message (HTTP 402) from being displayed.

eternaltyro commented 1 year ago

This turned out to be unhelpful. I changed the environment variable reference to the form ${OPSGENIE_API}, and the whole string is still URL-encoded instead of being replaced with the actual value. I'm going to pause work on this to focus on other things.
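The symptom described in these comments can be reproduced outside CircleCI. The minimal Python sketch below assumes, as observed in this thread, that the URL is percent-encoded as a literal string with no environment-variable expansion. Under that assumption, both placeholder forms are encoded verbatim, whereas expanding first (here with `os.path.expandvars`) inserts the value. The helper names are hypothetical.

```python
import os
from urllib.parse import quote


def encode_literal(url_fragment: str) -> str:
    # Percent-encode the text as-is, without variable expansion (the observed behaviour).
    return quote(url_fragment, safe="")


def expand_then_encode(url_fragment: str) -> str:
    # The intended behaviour: substitute the environment variable first, then encode.
    return quote(os.path.expandvars(url_fragment), safe="")


if __name__ == "__main__":
    os.environ["OPSGENIE_API"] = "abc123"  # dummy value for the demo
    for form in ("$OPSGENIE_API", "${OPSGENIE_API}"):
        print(form, "->", encode_literal(form), "vs", expand_then_encode(form))
```

Both placeholder forms come out as percent-encoded literals (`%24OPSGENIE_API`, `%24%7BOPSGENIE_API%7D`), which is consistent with the switch to `${...}` having no effect.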

@AfiMaameDufie, can you please take over? My suggestion is to use an empty repository with simple jobs like sleep 5 to test the OpsGenie integration. You can continue to use this ticket to track your progress.

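```yaml
# Minimal CircleCI 2.1 test config (the cimg/base image is an assumption; any small image works)
version: 2.1

executors:
  base:
    docker:
      - image: cimg/base:stable

jobs:
  short_success:
    executor: base
    steps:
      - run: sleep 5

  long_success:
    executor: base
    steps:
      - run: sleep 50

  short_failure:
    executor: base
    steps:
      - run: |
          sleep 5
          exit 1

  long_failure:
    executor: base
    steps:
      - run: |
          sleep 50
          exit 1

workflows:
  wf-A:  # all jobs succeed
    jobs:
      - short_success
      - long_success

  wf-B:  # all jobs fail
    jobs:
      - short_failure
      - long_failure

  wf-C:  # mixed success and failure
    jobs:
      - short_success
      - long_failure
```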

You can use the above as a rough template to build a custom workflow and jobs without the complexity of the Tasking Manager build. Try inserting notification steps into the jobs, as recommended by the OpsGenie engineers in their response to our ticket.
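One way to add such a step without relying on the webhook is to have a failing job call Opsgenie's Alert API directly. For example, this could run from a `run` step with `when: on_fail`, where the shell has the environment variables available. The minimal Python sketch below makes several assumptions:

- The key in `OPSGENIE_API` is an API integration key accepted by `POST https://api.opsgenie.com/v2/alerts` with a `GenieKey` authorization header.
- CircleCI's built-in `CIRCLE_PROJECT_REPONAME`, `CIRCLE_JOB` and `CIRCLE_BUILD_URL` variables are set.

This is an alternative to the Opsgenie orb, not a description of how the orb works. `build_alert_request` and `send_alert` are hypothetical names.

```python
import json
import os
import urllib.request

# Opsgenie Alert API endpoint (assumption: a US-region Opsgenie account).
ALERTS_URL = "https://api.opsgenie.com/v2/alerts"


def build_alert_request(api_key: str, env: dict) -> urllib.request.Request:
    """Hypothetical helper: build a POST request that creates one Opsgenie alert."""
    repo = env.get("CIRCLE_PROJECT_REPONAME", "unknown-repo")
    job = env.get("CIRCLE_JOB", "unknown-job")
    payload = {
        "message": f"CircleCI job failed: {repo}/{job}",
        "description": env.get("CIRCLE_BUILD_URL", ""),
        "tags": ["circleci", repo],
    }
    return urllib.request.Request(
        ALERTS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # The key is read from the process environment, so no URL templating is involved.
            "Authorization": f"GenieKey {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send_alert(env: dict) -> int:
    """Send the alert and return the HTTP status code of the response."""
    request = build_alert_request(env["OPSGENIE_API"], env)
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


if __name__ == "__main__":
    print(send_alert(dict(os.environ)))
```

Keeping request construction separate from sending makes it possible to check the payload locally before wiring the script into a failing job.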

AfiMaameDufie commented 1 year ago

Okay @eternaltyro, will do that and let you know the outcome.

AfiMaameDufie commented 1 year ago

This has been tested with the Opsgenie orb for the different cases above: https://app.circleci.com/pipelines/github/hotosm/OpsGenie_Webhook

Alerts were created in Opsgenie for the failed workflows.
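Results like these can be checked against the test matrix. The minimal Python sketch below assumes that one alert is expected for each workflow containing at least one failing job. Under that assumption, it derives which of wf-A, wf-B and wf-C should have produced an alert. The job names and exit codes mirror the test workflows proposed earlier in this thread. `expected_alerts` is a hypothetical helper.

```python
# Exit status of each test job (0 = success), mirroring the test workflows in this thread.
JOB_EXIT_CODES = {
    "short_success": 0,
    "long_success": 0,
    "short_failure": 1,
    "long_failure": 1,
}

WORKFLOWS = {
    "wf-A": ["short_success", "long_success"],
    "wf-B": ["short_failure", "long_failure"],
    "wf-C": ["short_success", "long_failure"],
}


def expected_alerts(workflows: dict, exit_codes: dict) -> set:
    """Return the workflows that should raise an alert (any job exits non-zero)."""
    return {
        name
        for name, jobs in workflows.items()
        if any(exit_codes[job] != 0 for job in jobs)
    }


if __name__ == "__main__":
    print(sorted(expected_alerts(WORKFLOWS, JOB_EXIT_CODES)))
```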