bartr closed this issue 3 years ago.
@bartr I'm a bit confused about the vision for this task. Is it to build a custom pod that queries Prometheus and passes those metrics to FluentBit? By default, FluentBit only listens for JSON messages over TCP and has no way to reach out and query Prometheus.
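For context, a Fluent Bit pipeline set up this way is push-only: its tcp input accepts JSON records that something else sends to it, but it never polls an endpoint. A minimal sketch, assuming Fluent Bit's YAML configuration format (the deployment may well use the classic format instead), the tcp input's default port 5170, and a stdout output chosen purely for illustration:

```yaml
# Sketch only: Fluent Bit YAML config with a push-based TCP/JSON input.
# The tag and the stdout output are illustrative assumptions.
pipeline:
  inputs:
    # Receives JSON records pushed over TCP; it does not scrape Prometheus.
    - name: tcp
      listen: 0.0.0.0
      port: 5170
      format: json
      tag: app.json
  outputs:
    # Print every record, whatever its tag.
    - name: stdout
      match: '*'
```

With only this input, getting Prometheus metrics into Fluent Bit would require a separate component that queries Prometheus and writes JSON to port 5170, which is the custom pod the question asks about.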
Based on a discussion in standup, this story concerns an Azure-specific monitoring stack for the NGSA deployment.
See /spikes/oms-log-analytics for the work that has been done.
The end goal should be threefold:
cc: @jkeane, @atxryan, @gled4er
End goal:
OMS Agent is the backing technology for Azure Monitor for Containers, which is the recommended monitoring system in the AKS Secure Baseline documentation: https://docs.microsoft.com/en-us/azure/architecture/reference-architectures/containers/aks/secure-baseline-aks#monitor-and-collect-metrics
Azure Monitor for Containers can be turned on through the portal, the Azure CLI (az), ARM templates, or Terraform. Enabled this way, it comes preconfigured with a host of out-of-the-box dashboards, workbooks, and alerts for infrastructure and cluster health monitoring.
So we will take a dependency on AKS, with Azure Monitor for Containers installed through this mechanism. To enable the collection of application logs and metrics, we will need to apply a container-azm-ms-agentconfig.yaml file to the cluster. For examples, see spikes/oms-log-analytics/3-container-azm-ms-agentconfig.yaml and the example in the CAF Terraform deployment of the secure baseline: https://github.com/Azure/caf-terraform-landingzones-starter/blob/starter/enterprise_scale/construction_sets/aks/online/aks_secure_baseline/cluster-baseline-settings/container-azm-ms-agentconfig.yaml
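A trimmed sketch of what such a file contains, based on the public Azure Monitor for Containers ConfigMap template. The specific values (which streams are enabled, the excluded namespace, the scrape interval, and turning on pod scraping) are illustrative assumptions; the spike file and the CAF example above are the authoritative versions.

```yaml
# Sketch of container-azm-ms-agentconfig.yaml (values are assumptions).
kind: ConfigMap
apiVersion: v1
metadata:
  # The agent looks for a ConfigMap with this name in kube-system.
  name: container-azm-ms-agentconfig
  namespace: kube-system
data:
  # Config schema parsed by the agent; v1 is the supported value.
  schema-version: v1
  # Free-form version label for tracking this file in source control (max 10 chars).
  config-version: ver1
  log-data-collection-settings: |-
    [log_collection_settings]
       [log_collection_settings.stdout]
          # Collect container stdout from all namespaces except kube-system.
          enabled = true
          exclude_namespaces = ["kube-system"]
       [log_collection_settings.stderr]
          enabled = true
          exclude_namespaces = ["kube-system"]
       [log_collection_settings.env_var]
          enabled = true
  prometheus-data-collection-settings: |-
    [prometheus_data_collection_settings.cluster]
        # Scrape pods annotated with prometheus.io/scrape, once per minute.
        interval = "1m"
        monitor_kubernetes_pods = true
```

Flux would apply this from the gitops directories; for a one-off test, `kubectl apply -f container-azm-ms-agentconfig.yaml` applies it directly.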
We will use Flux to deploy the configuration files to the clusters. Keeping the configuration in two YAML files, one in gitops/deployments/dev and the other in gitops/deployments/preprod/common, makes it easy to track and update.
Implementation based on these learnings is in story #628.