retaildevcrews / ngsa

Next Generation Symmetric Apps
MIT License

DESIGN: OMS agent for App Metrics #511

Status: Closed (closed by bartr 3 years ago)

bartr commented 3 years ago

Description

What:

Why:

When:

Where:

Tasks

Acceptance Criteria

Constraints

References:

jomalsan commented 3 years ago

@bartr I'm a bit confused about the vision for this task. Is it to build a custom pod that queries Prometheus and passes those metrics to FluentBit? By default, FluentBit only listens for JSON messages over TCP and has no way to reach out and query Prometheus.
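To make the question concrete, here is a minimal sketch of what such a custom pod might do. It assumes the Prometheus HTTP API is reachable at a hypothetical `http://prometheus:9090` and that Fluent Bit's `tcp` input listens on its default port 5170 behind a hypothetical `fluentbit` service. It runs an instant query against `/api/v1/query`, flattens each returned series into a JSON record, and writes the records as newline-delimited JSON over TCP. The service names, the `up` query, and all helper names are illustrative, not part of NGSA.

```python
import json
import socket
import time
import urllib.parse
import urllib.request

PROMETHEUS_URL = "http://prometheus:9090"  # hypothetical Prometheus service
FLUENTBIT_HOST = "fluentbit"               # hypothetical Fluent Bit service
FLUENTBIT_PORT = 5170                      # default port of Fluent Bit's tcp input


def query_prometheus(base_url: str, promql: str) -> dict:
    """Run an instant query through the Prometheus HTTP API (/api/v1/query)."""
    url = f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": promql})
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def to_json_lines(response: dict) -> list[str]:
    """Turn an instant-vector response into one JSON string per time series."""
    if response.get("status") != "success":
        raise ValueError(f"Prometheus query failed: {response.get('error')}")
    data = response["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector result, got {data['resultType']}")
    lines = []
    for series in data["result"]:
        timestamp, value = series["value"]  # [unix seconds, value as a string]
        record = {"timestamp": timestamp, "value": float(value), "labels": series["metric"]}
        lines.append(json.dumps(record))
    return lines


def send_to_fluentbit(lines: list[str], host: str, port: int) -> None:
    """Write the records as newline-delimited JSON to Fluent Bit's tcp input."""
    payload = "".join(line + "\n" for line in lines).encode("utf-8")
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(payload)


def main(promql: str = "up", interval_seconds: int = 60) -> None:
    """Poll forever: query Prometheus, forward any samples, then sleep."""
    while True:
        lines = to_json_lines(query_prometheus(PROMETHEUS_URL, promql))
        if lines:
            send_to_fluentbit(lines, FLUENTBIT_HOST, FLUENTBIT_PORT)
        time.sleep(interval_seconds)
```

A container entrypoint calling `main()` would poll once a minute. Retries, authentication, and backpressure are left out to keep the sketch short.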

jomalsan commented 3 years ago

Based on a discussion in standup, this story concerns an Azure-specific monitoring stack for the NGSA deployment.

Need to:

The end goal should be threefold:

cc: @jkeane, @atxryan, @gled4er

jomalsan commented 3 years ago

End goal:

jomalsan commented 3 years ago

OMS Agent is the backing technology for Azure Monitor for Containers, which is the recommended monitoring system in the AKS Secure Baseline documentation: https://docs.microsoft.com/en-us/azure/architecture/reference-architectures/containers/aks/secure-baseline-aks#monitor-and-collect-metrics

Azure Monitor for Containers can be turned on through the portal, the az cli, ARM templates, or Terraform. Used in this manner, it comes preconfigured with a host of out-of-the-box dashboards, workbooks, and alerts for infrastructure and cluster health monitoring.
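For the az cli route, the relevant command is `az aks enable-addons --addons monitoring`. The sketch below wraps it in a small Python helper of the kind a deployment script might use. The cluster, resource group, and workspace values are hypothetical, and the `dry_run` parameter (also hypothetical) only prints the command instead of running it.

```python
import shlex
import subprocess
from typing import Optional


def enable_monitoring_args(cluster: str, resource_group: str,
                           workspace_resource_id: Optional[str] = None) -> list[str]:
    """Assemble `az aks enable-addons` arguments for the monitoring add-on."""
    args = ["az", "aks", "enable-addons", "--addons", "monitoring",
            "--name", cluster, "--resource-group", resource_group]
    if workspace_resource_id:
        # Point the add-on at an existing Log Analytics workspace; without this
        # flag the CLI falls back to a default workspace.
        args += ["--workspace-resource-id", workspace_resource_id]
    return args


def enable_monitoring(cluster: str, resource_group: str,
                      workspace_resource_id: Optional[str] = None,
                      dry_run: bool = True) -> None:
    """Print the command (dry run) or execute it with the az cli."""
    args = enable_monitoring_args(cluster, resource_group, workspace_resource_id)
    if dry_run:
        print(shlex.join(args))
    else:
        subprocess.run(args, check=True)
```

Passing an explicit workspace keeps the cluster's logs in a workspace the team controls rather than in a CLI-created default.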

So we will take a dependency on AKS, and on Azure Monitor for Containers being enabled through this mechanism. To enable the collection of application logs and metrics, we will need to apply a container-azm-ms-agentconfig.yaml file to the cluster. For examples, see spikes/oms-log-analytics/3-container-azm-ms-agentconfig.yaml and the example in the CAF Terraform deployment of the secure baseline: https://github.com/Azure/caf-terraform-landingzones-starter/blob/starter/enterprise_scale/construction_sets/aks/online/aks_secure_baseline/cluster-baseline-settings/container-azm-ms-agentconfig.yaml
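The agent config is a Kubernetes ConfigMap named `container-azm-ms-agentconfig` in the `kube-system` namespace whose `data` keys hold TOML-style settings. The sketch below builds a minimal version as a Python dict and prints it as JSON, which `kubectl apply -f` also accepts. It turns on stdout/stderr log collection and Prometheus scraping of pods annotated with `prometheus.io/scrape: "true"`, restricted to a hypothetical list of namespaces. Treat the exact settings as an assumption and compare them against the spike and CAF examples referenced in this comment.

```python
import json


def build_agent_config(scrape_namespaces: list[str], config_version: str = "ver1") -> dict:
    """Return a minimal container-azm-ms-agentconfig ConfigMap as a dict."""
    if len(config_version) > 10:
        # The agent truncates config-version beyond 10 characters, so reject it here.
        raise ValueError("config-version must be at most 10 characters")
    log_settings = "\n".join([
        "[log_collection_settings]",
        "  [log_collection_settings.stdout]",
        "    enabled = true",
        '    exclude_namespaces = ["kube-system"]',
        "  [log_collection_settings.stderr]",
        "    enabled = true",
        '    exclude_namespaces = ["kube-system"]',
    ])
    prometheus_settings = "\n".join([
        "[prometheus_data_collection_settings.cluster]",
        '  interval = "1m"',
        # Scrape pods that carry the prometheus.io/scrape: "true" annotation...
        "  monitor_kubernetes_pods = true",
        # ...but only in these namespaces (json.dumps yields a TOML-compatible array).
        f"  monitor_kubernetes_pods_namespaces = {json.dumps(scrape_namespaces)}",
    ])
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": "container-azm-ms-agentconfig", "namespace": "kube-system"},
        "data": {
            "schema-version": "v1",
            "config-version": config_version,
            "log-data-collection-settings": log_settings,
            "prometheus-data-collection-settings": prometheus_settings,
        },
    }


if __name__ == "__main__":
    # "ngsa" is a hypothetical namespace for the application pods.
    print(json.dumps(build_agent_config(["ngsa"]), indent=2))
```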

We will use Flux to deploy the configuration files to the clusters. To make the configuration easy to track and update, we will keep it in two YAML files: one in gitops/deployments/dev and the other in gitops/deployments/preprod/common.
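A minimal sketch of keeping the two per-environment copies in step, assuming a script run from the repository root writes them before the commit that Flux picks up. The directory names come from this issue; the file name, the `dev-v1`/`preprod-v1` version tags, and the `write_per_environment` helper are hypothetical. Each copy gets its own `config-version`, a field the agent keeps only for source-control tracking, which makes it a convenient place to record which environment a file belongs to.

```python
import copy
import json
from pathlib import Path

# Directories named in this issue, mapped to hypothetical config-version tags
# (the agent keeps at most 10 characters of config-version).
ENVIRONMENTS = {
    "gitops/deployments/dev": "dev-v1",
    "gitops/deployments/preprod/common": "preprod-v1",
}


def write_per_environment(manifest: dict, repo_root: str,
                          filename: str = "container-azm-ms-agentconfig.yaml") -> list[Path]:
    """Write one copy of the ConfigMap manifest into each environment directory."""
    written = []
    for rel_dir, version in ENVIRONMENTS.items():
        env_manifest = copy.deepcopy(manifest)  # leave the caller's dict untouched
        env_manifest.setdefault("data", {})["config-version"] = version
        target = Path(repo_root) / rel_dir / filename
        target.parent.mkdir(parents=True, exist_ok=True)
        # YAML 1.2 is a superset of JSON, so JSON content is valid in a .yaml file.
        target.write_text(json.dumps(env_manifest, indent=2) + "\n", encoding="utf-8")
        written.append(target)
    return written
```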

jomalsan commented 3 years ago

Implementation based on these learnings is in story #628