@christophe-f @kadel can I just confirm that the requirements for this issue are that the `serviceMonitor` is able to ingest data from the `/metrics` endpoint for the showcase? The 3rd point is just my guess at what the "monitoring" part of this issue refers to, please correct me if my assumption is incorrect. Just need to get some confirmation before I start trying to deploy stuff.
Yes that's great
When deploying the Helm chart onto AKS, the default configuration for the postgresql Helm chart fails. The user needs to set the `podSecurityContext` to the postgres user and group used in the postgres image:
postgresql:
  primary:
    podSecurityContext:
      enabled: true
      fsGroup: 26
      runAsUser: 26
Otherwise, the postgresql image errors out with a permission error: `mkdir: cannot create directory '/var/lib/pgsql/data/userdata': Permission denied`
The `ServiceMonitor` from the upstream backstage chart utilizes the `/metrics` endpoint for its metric collection.
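For context, such a ServiceMonitor is a Prometheus Operator resource along these lines; the label and port name below are placeholders rather than the values from the upstream chart:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: backstage
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: backstage   # placeholder: must match the backstage Service labels
  endpoints:
    - port: http-backend                  # placeholder: name of the Service port to scrape
      path: /metrics
      interval: 30s
```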
However, I'm currently having difficulty accessing the `/metrics` endpoint in the showcase and RHDH image. This is probably because the frontend and backend share the same `baseUrl` in the RHDH and Showcase images, which causes requests to `/metrics` to be handled by the frontend instead of the backend.
When exec-ing into the `backstage-backend` container and running curl against `localhost:7007/metrics`, you get the HTML for the frontend instead of the expected metrics output:
sh-5.1$ curl localhost:7007/metrics
<!doctype html><html lang="en"><head><meta charset="utf-8"/><meta name="viewport" content="width=device-width,initial-scale=1"/><meta name="theme-color" content="#000000"/><meta name="description" content="Janus Community Showcase"/><link rel="apple-touch-icon" href="/logo192.png"/><link rel="manifest" href="/manifest.json" crossorigin="use-credentials"/><link rel="icon" href="/favicon.ico"/><link rel="shortcut icon" href="/favicon.ico"/><link rel="apple-touch-icon" sizes="180x180" href="/apple-touch-icon.png"/><link rel="icon" type="image/png" sizes="32x32" href="/favicon-32x32.png"/><link rel="icon" type="image/png" sizes="16x16" href="/favicon-16x16.png"/><link rel="mask-icon" href="/safari-pinned-tab.svg" color="#5bbad5"/><link rel="icon" href=""/><title>Janus IDP Backstage Showcase</title><script defer="defer" src="/static/runtime.01b40e37.js"></script><script defer="defer" src="/static/module-material-ui.acd80c17.js"></script><script defer="defer" src="/static/module-patternfly.347a13d4.js"></script><script defer="defer" src="/static/module-lodash.a37c6806.js"></script><script defer="defer" src="/static/module-lodash-es.7a60cfaf.js"></script><script defer="defer" src="/static/module-backstage.c39bf639.js"></script><script defer="defer" src="/static/module-moment.daa9610c.js"></script><script defer="defer" src="/static/module-mui.a2cae32b.js"></script><script defer="defer" src="/static/module-mathjs.4eb795f8.js"></script><script defer="defer" src="/static/module-date-fns.7572de26.js"></script><script defer="defer" src="/static/module-d3-zoom.d2314a8f.js"></script><script defer="defer" src="/static/module-rjsf.f0fcaf11.js"></script><script defer="defer" src="/static/module-segment.3a8f78f7.js"></script><script defer="defer" src="/static/module-yaml.3442c513.js"></script><script defer="defer" src="/static/module-ajv.16c80e32.js"></script><script defer="defer" src="/static/module-material-table.a8bd62e9.js"></script><script defer="defer" src="/static/module-fp-ts.774d9856.js"></script><script defer="defer" src="/static/module-roadiehq.28dbfc56.js"></script><script defer="defer" src="/static/module-postcss.c44b0d3e.js"></script><script defer="defer" src="/static/module-octokit.3ce4aa6a.js"></script><script defer="defer" src="/static/module-luxon.918e84f5.js"></script><script defer="defer" src="/static/module-react-beautiful-dnd.fb6a161d.js"></script><script defer="defer" src="/static/module-htmlparser2.19123908.js"></script><script defer="defer" src="/static/module-micromark-core-commonmark.ca51b777.js"></script><script defer="defer" src="/static/module-webcola.0897bbb5.js"></script><script defer="defer" src="/static/module-zod.e9b85b5a.js"></script><script defer="defer" src="/static/module-janus-idp.a1549e89.js"></script><script defer="defer" src="/static/module-react-grid-layout.47bdcacc.js"></script><script defer="defer" src="/static/module-io-ts.1bd938b2.js"></script><script defer="defer" src="/static/module-photoswipe.c7e60322.js"></script><script defer="defer" src="/static/module-react-dom.cf4bcd24.js"></script><script defer="defer" src="/static/module-js-yaml.0f6f362e.js"></script><script defer="defer" src="/static/module-decimal.js.fc955b43.js"></script><script defer="defer" src="/static/module-mobx.4824ce59.js"></script><script defer="defer" src="/static/module-remix-run.3c4f92dc.js"></script><script defer="defer" src="/static/vendor.01b40e37.js"></script><script defer="defer" src="/static/main.01b40e37.js"></script><link href="/static/vendor.2001c720.css" 
rel="stylesheet"></head><body><noscript>You need to enable JavaScript to run this app.</noscript><div id="root"></div></body></html>
As for logging, Azure Monitor Log Analytics works as intended and is mostly able to query the showcase logs.
However, the Backstage logs currently contain ANSI escape codes that provide colors in the terminal. The logs as viewed in the Azure container logs:
[2m2023-10-30T20:03:21.036Z[22m [34msearch[39m [32minfo[39m Collating documents for techdocs succeeded [36mtype[39m=plugin [36mdocumentType[39m=techdocs
The same logs viewed in a terminal render these codes as colors.
This in turn interferes with KQL queries that use the `has` keyword (which searches for exact matches), since the log level indicators like `info`, `debug`, `warning`, etc. all carry ANSI codes, e.g. `info` shows up as `[32minfo`.
Example of a query that is affected:
ContainerLogV2
| where ContainerName == "backstage-backend"
| project TimeGenerated, LogMessage
| where LogMessage has "info"
| order by TimeGenerated desc
| take 100
In the example query above, the expected output would have been all the logs with the `info` log level; however, practically none actually appear.
The current workaround is to use `contains` instead of `has`, although that may also capture logs that only partially match, which is not ideal.
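As a sketch, the workaround version of the query above would be:

```kusto
ContainerLogV2
| where ContainerName == "backstage-backend"
| project TimeGenerated, LogMessage
// contains does a substring match, so it still finds "info" inside "[32minfo",
// but it can also match unrelated text such as "information"
| where LogMessage contains "info"
| order by TimeGenerated desc
| take 100
```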
~Currently blocked by issue discussed in the comment above: https://github.com/janus-idp/backstage-showcase/issues/615#issuecomment-1787753027~ (https://github.com/janus-idp/backstage-showcase/issues/690 has been opened to address the issue discussed)
It seems the issue might be due to the order in which the routers are being registered. Currently the router for the `app-backend-plugin` (let's call it `app`) is added to the Express app of the `ServiceBuilder` first.
Express.js middleware (including routers) executes in the order it is added. Since the `app` router, the `/metrics` router, and the `/healthcheck` router are all mounted at the same root of `""`, any request to `/metrics` or `/healthcheck` is handled by the `app` router before it gets a chance to reach the `/metrics` or `/healthcheck` router.
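As a minimal, standalone sketch of that behavior (plain Express, not the actual `ServiceBuilder` code; the handlers and port are placeholders):

```typescript
import express, { Request, Response } from 'express';

const service = express();

// Stand-in for the app-backend router: it serves the frontend bundle for any
// GET path, so it also answers GET /metrics with index.html.
const appRouter = express.Router();
appRouter.get('*', (_req: Request, res: Response) => {
  res.send('<!doctype html>... frontend bundle ...');
});

// Stand-in for the metrics router.
const metricsRouter = express.Router();
metricsRouter.get('/metrics', (_req: Request, res: Response) => {
  res.type('text/plain').send('# prometheus metrics would go here');
});

// Express runs middleware in registration order. Registered like this, a request
// to /metrics is matched and answered by appRouter before metricsRouter is reached.
service.use(appRouter);
service.use(metricsRouter);

// Swapping the two service.use() calls above lets /metrics reach metricsRouter.
service.listen(7007);
```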
https://github.com/janus-idp/backstage-showcase/issues/690 was opened to capture this issue.
Currently blocked by https://github.com/janus-idp/backstage-showcase/issues/690
https://github.com/janus-idp/backstage-showcase/issues/690 has been resolved.
Update on metrics: I was able to expose the `/metrics` endpoint to the Azure Managed Prometheus instance using pod annotations.
In the Helm chart, users would need to add the following annotations to their `values.yaml`:
upstream:
  backstage:
    ... # other configurations
    podAnnotations:
      prometheus.io/scrape: "true"
      prometheus.io/path: "/metrics"
      prometheus.io/port: "7007"
      prometheus.io/scheme: "http"
Then I added a modified `ama-metrics-settings-configmap.yaml` ConfigMap to the `kube-system` namespace.
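As a sketch, the relevant change is the pod-annotation-based scraping setting; the field names follow the upstream `ama-metrics-settings-configmap.yaml` as I understand it, and the namespace regex below is a placeholder for wherever the showcase is deployed:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: ama-metrics-settings-configmap
  namespace: kube-system
data:
  # Only pods in namespaces matching this regex, and carrying the
  # prometheus.io/* annotations above, are scraped by the metrics add-on.
  pod-annotation-based-scraping: |-
    podannotationnamespaceregex = "backstage.*"
```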
Then the metrics should appear in Prometheus, and if a Grafana instance is configured, they can be displayed there.
Tested with Container Insights, Managed Prometheus, and Managed Grafana turned on for the AKS cluster. This method customizes metrics scraping with the metrics add-on of Azure Monitor.
The other alternative is to collect the Prometheus metrics and send them to the Log Analytics workspace with Container Insights.
Similar to the metrics add-on method, there are multiple ways to query the `/metrics` endpoint (service URL, URL/endpoint, and pod annotations are all options), but I think the pod annotation method is the better one to recommend, since the service method would require hardcoding the service's URL (or the ingress's URL) into a ConfigMap.
The ConfigMap in question would be a modified version of this ConfigMap, with the new configuration placed in the `prometheus-data-collection-settings` field, and the modified ConfigMap is then added to the cluster.
This method allows Log Analytics to capture the Prometheus metrics and lets users query them with KQL.
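A minimal sketch of what that could look like, assuming the standard `container-azm-ms-agentconfig` ConfigMap in `kube-system` (the namespace value is a placeholder):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: container-azm-ms-agentconfig
  namespace: kube-system
data:
  prometheus-data-collection-settings: |-
    [prometheus_data_collection_settings.cluster]
        interval = "1m"
        # Scrape pods carrying the prometheus.io/scrape annotations shown earlier
        monitor_kubernetes_pods = true
        # Placeholder: restrict scraping to the namespace(s) the showcase runs in
        monitor_kubernetes_pods_namespaces = ["backstage"]
```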
Need to investigate the difference between this method and the previously explored method.
What needs to be done?
In a previous commit by Tomas, we added observability to the showcase app for OpenShift. We need to make sure that this also works for Azure.