Document how to use showcase with Azure Logging + Monitoring

christophe-f commented 1 year ago

What needs to be done?

A previous commit by Tomas added observability to the showcase app for OpenShift. We need to make sure that this also works on Azure.

Zaperex commented 1 year ago

@christophe-f @kadel can I just confirm that the requirements for this issue are:

1. Deploy the showcase onto Azure Kubernetes Service (AKS) (preferably using the helm chart with routes disabled?)
2. Test whether Azure Monitor Log Analytics works for the showcase deployed on AKS
3. See if the Prometheus operator ServiceMonitor is able to ingest data from the /metrics endpoint of the showcase
4. Document the process of setting all this up

The 3rd point is just my guess at what the "monitoring" part of this issue refers to, please correct me if my assumption is incorrect. Just need to get some confirmation before I start trying to deploy stuff.

christophe-f commented 1 year ago

Yes that's great

Zaperex commented 1 year ago

When deploying the helm chart onto AKS, the default configuration of the postgresql helm chart fails. The user needs to set the podSecurityContext to the user and group IDs of the postgres user in the postgres image.

postgresql:
  primary:
    podSecurityContext:
      enabled: true
      fsGroup: 26
      runAsUser: 26

Otherwise, the postgresql container will fail with a permission error: mkdir: cannot create directory '/var/lib/pgsql/data/userdata': Permission denied

Zaperex commented 1 year ago

The ServiceMonitor from the upstream backstage chart uses the /metrics endpoint for its metric collection.

However, I'm currently having difficulty accessing the /metrics endpoint in the showcase and RHDH images. This is probably because the frontend and backend share the same baseUrl in those images, so requests to /metrics end up being served by the frontend instead of the backend for some reason.

When exec'ing into the backstage-backend container and running curl on localhost:7007/metrics, you get the HTML of the frontend instead of the expected metrics output:

sh-5.1$ curl localhost:7007/metrics
<!doctype html><html lang="en"><head><meta charset="utf-8"/><meta name="viewport" content="width=device-width,initial-scale=1"/><meta name="theme-color" content="#000000"/><meta name="description" content="Janus Community Showcase"/><link rel="apple-touch-icon" href="/logo192.png"/><link rel="manifest" href="/manifest.json" crossorigin="use-credentials"/><link rel="icon" href="/favicon.ico"/><link rel="shortcut icon" href="/favicon.ico"/><link rel="apple-touch-icon" sizes="180x180" href="/apple-touch-icon.png"/><link rel="icon" type="image/png" sizes="32x32" href="/favicon-32x32.png"/><link rel="icon" type="image/png" sizes="16x16" href="/favicon-16x16.png"/><link rel="mask-icon" href="/safari-pinned-tab.svg" color="#5bbad5"/><link rel="icon" href=""/><title>Janus IDP Backstage Showcase</title><script defer="defer" src="/static/runtime.01b40e37.js"></script><script defer="defer" src="/static/module-material-ui.acd80c17.js"></script><script defer="defer" src="/static/module-patternfly.347a13d4.js"></script><script defer="defer" src="/static/module-lodash.a37c6806.js"></script><script defer="defer" src="/static/module-lodash-es.7a60cfaf.js"></script><script defer="defer" src="/static/module-backstage.c39bf639.js"></script><script defer="defer" src="/static/module-moment.daa9610c.js"></script><script defer="defer" src="/static/module-mui.a2cae32b.js"></script><script defer="defer" src="/static/module-mathjs.4eb795f8.js"></script><script defer="defer" src="/static/module-date-fns.7572de26.js"></script><script defer="defer" src="/static/module-d3-zoom.d2314a8f.js"></script><script defer="defer" src="/static/module-rjsf.f0fcaf11.js"></script><script defer="defer" src="/static/module-segment.3a8f78f7.js"></script><script defer="defer" src="/static/module-yaml.3442c513.js"></script><script defer="defer" src="/static/module-ajv.16c80e32.js"></script><script defer="defer" src="/static/module-material-table.a8bd62e9.js"></script><script defer="defer" 
src="/static/module-fp-ts.774d9856.js"></script><script defer="defer" src="/static/module-roadiehq.28dbfc56.js"></script><script defer="defer" src="/static/module-postcss.c44b0d3e.js"></script><script defer="defer" src="/static/module-octokit.3ce4aa6a.js"></script><script defer="defer" src="/static/module-luxon.918e84f5.js"></script><script defer="defer" src="/static/module-react-beautiful-dnd.fb6a161d.js"></script><script defer="defer" src="/static/module-htmlparser2.19123908.js"></script><script defer="defer" src="/static/module-micromark-core-commonmark.ca51b777.js"></script><script defer="defer" src="/static/module-webcola.0897bbb5.js"></script><script defer="defer" src="/static/module-zod.e9b85b5a.js"></script><script defer="defer" src="/static/module-janus-idp.a1549e89.js"></script><script defer="defer" src="/static/module-react-grid-layout.47bdcacc.js"></script><script defer="defer" src="/static/module-io-ts.1bd938b2.js"></script><script defer="defer" src="/static/module-photoswipe.c7e60322.js"></script><script defer="defer" src="/static/module-react-dom.cf4bcd24.js"></script><script defer="defer" src="/static/module-js-yaml.0f6f362e.js"></script><script defer="defer" src="/static/module-decimal.js.fc955b43.js"></script><script defer="defer" src="/static/module-mobx.4824ce59.js"></script><script defer="defer" src="/static/module-remix-run.3c4f92dc.js"></script><script defer="defer" src="/static/vendor.01b40e37.js"></script><script defer="defer" src="/static/main.01b40e37.js"></script><link href="/static/vendor.2001c720.css" rel="stylesheet"></head><body><noscript>You need to enable JavaScript to run this app.</noscript><div id="root"></div></body></html>
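To make this check scriptable, here is a minimal Python sketch that tells the frontend HTML fallback apart from a Prometheus text response. The function name and the sample metric are hypothetical, and the check is only a heuristic: it assumes the metrics output carries the usual "# HELP" / "# TYPE" comment lines, which the exposition format does not strictly require.

```python
def looks_like_prometheus_metrics(body: str) -> bool:
    """Heuristic: True if the body resembles Prometheus text exposition output."""
    text = body.lstrip()
    # The frontend fallback is an HTML document, so reject that first.
    if text.lower().startswith(("<!doctype html", "<html")):
        return False
    # Metrics output usually contains "# HELP" / "# TYPE" comment lines.
    return any(
        line.startswith(("# HELP ", "# TYPE "))
        for line in text.splitlines()
    )


# Shortened stand-in for the HTML returned by curl above.
html_body = '<!doctype html><html lang="en"><head></head><body></body></html>'

# Hypothetical sample of what a working /metrics endpoint might return.
metrics_body = (
    "# HELP process_cpu_seconds_total Total CPU time spent in seconds.\n"
    "# TYPE process_cpu_seconds_total counter\n"
    "process_cpu_seconds_total 0.12\n"
)

print(looks_like_prometheus_metrics(html_body))     # False
print(looks_like_prometheus_metrics(metrics_body))  # True
```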

Zaperex commented 1 year ago

As for logging, Azure Monitor Log Analytics works as intended and is mostly able to query the showcase logs.

However, it appears that backstage logs currently contain ANSI codes to provide colors in the terminal. The logs as viewed in the Azure container logs:

[2m2023-10-30T20:03:21.036Z[22m [34msearch[39m [32minfo[39m Collating documents for techdocs succeeded [36mtype[39m=plugin [36mdocumentType[39m=techdocs

[Screenshot: the same logs viewed in the terminal, with the ANSI codes rendered as colors]

This in turn interferes with KQL queries that use the has operator (which matches whole terms only), since the log level indicators like info, debug, warning, etc. are all wrapped in ANSI codes. For example, info shows up as [32minfo.

Example of a query that is affected:

ContainerLogV2
| where ContainerName == "backstage-backend"
| project TimeGenerated, LogMessage
| where LogMessage has "info"
| order by TimeGenerated desc 
| take 100

In the example query above, the expected output would be all the logs with the info log level. In practice, pretty much none of them appear.

The current workaround is to use contains instead of has. However, contains also returns logs that only partially match the search string, which might not be ideal.
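To illustrate why has misses these lines while contains does not, here is a rough Python sketch. It approximates term matching by splitting on non-alphanumeric characters, which is only a stand-in for how KQL tokenizes terms, not its actual implementation. It also assumes the raw log line carries an ESC character (\x1b) before each bracketed code, which the container log view above does not display. The helper names are made up for the example.

```python
import re

# Matches ANSI SGR (color/style) sequences such as ESC[32m or ESC[39m.
ANSI_SGR = re.compile(r"\x1b\[[0-9;]*m")


def strip_ansi(message: str) -> str:
    """Remove ANSI color/style sequences from a log line."""
    return ANSI_SGR.sub("", message)


def terms(message: str) -> set:
    """Approximate term tokenization: split on non-alphanumeric characters."""
    return {t for t in re.split(r"[^0-9A-Za-z]+", message) if t}


# Assumed raw form of the log line shown above, with ESC characters restored.
raw = (
    "\x1b[2m2023-10-30T20:03:21.036Z\x1b[22m \x1b[34msearch\x1b[39m "
    "\x1b[32minfo\x1b[39m Collating documents for techdocs succeeded"
)

print("info" in terms(raw))              # False: the term is "32minfo"
print("info" in raw)                     # True: substring match, like contains
print("info" in terms(strip_ansi(raw)))  # True once the codes are removed
```

Under these assumptions, the color code fuses with the log level into a single term (32minfo), so a whole-term match on info fails while a substring match still succeeds.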

Zaperex commented 1 year ago

~Currently blocked by issue discussed in the comment above: https://github.com/janus-idp/backstage-showcase/issues/615#issuecomment-1787753027~ (https://github.com/janus-idp/backstage-showcase/issues/690 has been opened to address the issue discussed)

Zaperex commented 1 year ago

It seems the issue might be due to the order in which the routers are registered. Currently the router for the app-backend-plugin (let's call it app) is added to the express app of the ServiceBuilder first.

Express.js middleware (including routers) is executed in the order it is added. Since the app router, the /metrics router and the /healthcheck router all use the same root of "", any request to /metrics or /healthcheck is handled by the app router before it gets a chance to reach the /metrics or /healthcheck router.
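The ordering effect can be modelled with a small Python sketch. This is not Express.js or the actual showcase code, just a simplified model in which handlers are tried in registration order and the first one that produces a response wins. The handler names and response strings are hypothetical.

```python
from typing import Callable, List, Optional

# A handler returns a response body, or None to pass the request on.
Handler = Callable[[str], Optional[str]]


def app_router(path: str) -> Optional[str]:
    # Catch-all: serves the frontend index.html for any path.
    return "<!doctype html>..."


def metrics_router(path: str) -> Optional[str]:
    # Only answers requests for /metrics.
    return "# metrics placeholder" if path == "/metrics" else None


def dispatch(handlers: List[Handler], path: str) -> Optional[str]:
    """Try handlers in registration order; the first response wins."""
    for handler in handlers:
        response = handler(path)
        if response is not None:
            return response
    return None


# app registered first: /metrics is swallowed by the catch-all.
print(dispatch([app_router, metrics_router], "/metrics"))  # <!doctype html>...

# metrics registered first: /metrics reaches the metrics handler.
print(dispatch([metrics_router, app_router], "/metrics"))  # # metrics placeholder
```

In this model, registering the more specific routers before the catch-all is enough to restore the expected behavior, while other paths still fall through to the frontend.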

https://github.com/janus-idp/backstage-showcase/issues/690 was opened to capture this issue.

Zaperex commented 1 year ago

Currently blocked by https://github.com/janus-idp/backstage-showcase/issues/690

Zaperex commented 1 year ago

https://github.com/janus-idp/backstage-showcase/issues/690 has been resolved.

Update on metrics: I was able to expose the /metrics endpoint to the Azure Managed Prometheus instance using pod annotations. In the helm chart, users would need to add the following annotations to the values.yaml:

upstream:
  backstage:
    ... # other configurations
    podAnnotations:
      prometheus.io/scrape: "true"
      prometheus.io/path: "/metrics"
      prometheus.io/port: "7007"
      prometheus.io/scheme: "http"

I then added a modified ama-metrics-settings-configmap.yaml ConfigMap to the kube-system namespace. After that, the metrics should appear in Prometheus, and if a Grafana instance is configured, they can be displayed there.

Tested with Container Insights, Managed Prometheus and Managed Grafana turned on for the AKS cluster. This method customizes the metrics scraping with the metrics add-on of Azure Monitor.

Zaperex commented 1 year ago

The other alternative is to collect the Prometheus metrics and send them to the Log Analytics workspace with Container Insights.

Similar to the metrics add-on method, there are multiple ways to query the /metrics endpoint (service url, url/endpoint, and pod annotations are all options), but I think the pod annotation method would be the better one to recommend, since the service method would require users to hardcode the service's url (or the ingress' url) in a ConfigMap.

The ConfigMap in question is a modified version of this ConfigMap. The new configurations are placed in the prometheus-data-collection-settings field, and the ConfigMap is then added to the cluster.

This method allows Log Analytics to capture the Prometheus metrics and lets users query them with KQL.

Need to investigate the difference between this method and the previously explored method.

janus-idp / backstage-showcase

Document how to use showcase with Azure Logging + Monitoring #615