During the operational awareness workshop, the case "3rd party endpoint down" was identified as a potential risk. Besides implementing a retry mechanism, we agreed to also set up lightweight monitoring for these endpoints.
The goal is to detect outages of 3rd party endpoints, as these also impact the availability of ACM. The 3rd party services are:
Compass Director
Compass Connector
We should verify the availability of these endpoints and report it as part of our metrics. Besides availability, the number of successful / failed HTTP requests to these endpoints should also become part of the ACM metrics.
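A minimal sketch of such an availability check follows. It assumes ACM can probe each endpoint over plain HTTP and publishes its metrics in the Prometheus text exposition format. The endpoint URLs, the `/healthz` path and the metric name `acm_thirdparty_endpoint_up` are hypothetical placeholders, not existing ACM configuration.

```python
import http.client
import urllib.request
from typing import Callable, Dict

# Hypothetical endpoint URLs: the real Compass Director / Connector addresses
# are deployment-specific and would come from ACM configuration.
ENDPOINTS: Dict[str, str] = {
    "compass_director": "https://director.example.com/healthz",
    "compass_connector": "https://connector.example.com/healthz",
}


def http_probe(url: str, timeout: float = 5.0) -> bool:
    """Return True if the endpoint answers with a 2xx status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (OSError, ValueError, http.client.HTTPException):
        # URLError, HTTPError (4xx/5xx), timeouts and refused connections are
        # OSError subclasses; ValueError covers malformed URLs; HTTPException
        # covers protocol errors such as a bad status line.
        return False


def probe_all(endpoints: Dict[str, str],
              probe: Callable[[str], bool] = http_probe) -> Dict[str, int]:
    """Probe every endpoint once and map its name to 1 (up) or 0 (down)."""
    return {name: int(probe(url)) for name, url in endpoints.items()}


def render_availability(status: Dict[str, int]) -> str:
    """Render probe results as a Prometheus gauge in text exposition format."""
    metric = "acm_thirdparty_endpoint_up"  # hypothetical metric name
    lines = [
        f"# HELP {metric} 1 if the 3rd party endpoint answered the last probe, else 0.",
        f"# TYPE {metric} gauge",
    ]
    for name in sorted(status):
        lines.append(f'{metric}{{endpoint="{name}"}} {status[name]}')
    return "\n".join(lines) + "\n"
```

Usage: call `render_availability(probe_all(ENDPOINTS))` on every probe interval and serve the result on the metrics endpoint. A gauge fits here because availability is a current state rather than an accumulating count, and the injectable `probe` parameter lets the check be tested without network access.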
AC:
[ ] Add a metric measuring the availability of the Compass Director endpoint
[ ] Add a metric measuring the availability of the Compass Connector endpoint
[ ] Ensure our metrics also expose the number of successful / failed calls to the Compass Director endpoint
[ ] Ensure our metrics also expose the number of successful / failed calls to the Compass Connector endpoint
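For the last two acceptance criteria, a minimal sketch of a per-endpoint success/failure counter is shown below, again assuming Prometheus text exposition format. The names `CallCounter`, `counted_call` and `acm_thirdparty_requests_total` are hypothetical. A call counts as failed if it raises, which matches `urllib.request.urlopen`, since that raises `HTTPError` for non-2xx responses.

```python
import threading
from collections import Counter
from typing import Callable, TypeVar

T = TypeVar("T")


class CallCounter:
    """Thread-safe count of successful / failed calls per 3rd party endpoint."""

    METRIC = "acm_thirdparty_requests_total"  # hypothetical metric name

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._counts: Counter = Counter()

    def record(self, endpoint: str, success: bool) -> None:
        result = "success" if success else "failure"
        with self._lock:
            self._counts[(endpoint, result)] += 1

    def render(self) -> str:
        """Render the counts as a Prometheus counter in text exposition format."""
        lines = [
            f"# HELP {self.METRIC} Calls from ACM to 3rd party endpoints by result.",
            f"# TYPE {self.METRIC} counter",
        ]
        with self._lock:
            items = sorted(self._counts.items())
        for (endpoint, result), value in items:
            lines.append(
                f'{self.METRIC}{{endpoint="{endpoint}",result="{result}"}} {value}'
            )
        return "\n".join(lines) + "\n"


def counted_call(counter: CallCounter, endpoint: str, call: Callable[[], T]) -> T:
    """Run call() and record it: an exception counts as a failure and is re-raised."""
    try:
        result = call()
    except Exception:
        counter.record(endpoint, success=False)
        raise
    counter.record(endpoint, success=True)
    return result
```

Where the wrapper sits relative to the existing retry mechanism matters. Wrapping each attempt counts every HTTP request, including retries, while wrapping the whole retried operation counts only logical calls. Counting every attempt seems closer to the intent here, since an endpoint that fails often but is masked by successful retries would otherwise go unnoticed.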
Reason
Improve operational readiness of ACM by enhancing our monitoring to detect impacts caused by 3rd party applications.