Open dtp263 opened 2 years ago
It's definitely possible to integrate Prometheus into Sidecar, as the Helm setup is already part of the repo. I am not totally sure what the workload for the integration would look like, though - @ArshamTeymouri any thoughts on this?
Hi Tarik!
That's a great idea to have the metrics exposed by the application. For this, we need a new endpoint on the application, say `/metrics`, that exposes application details in a Prometheus-compatible way. Please check the Prometheus docs for more details about how to integrate those metrics into your application.
Besides, two additional endpoints should also be introduced, `/readiness` and `/liveness`. Those endpoints will expose the general health of the application. For example, if the app is crashing, `/liveness` should return a non-200 HTTP response, and if the application is working fine but is just waiting for its backing database to sync, `/readiness` should return non-200. This way Kubernetes knows whether it needs to restart the pod (application) because of a crash, or wait until the application becomes ready to serve requests.
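A minimal sketch of what those three endpoints could look like with Express and the `prom-client` library (this is only an illustration, not Sidecar's actual implementation; the `isAlive`/`isReady` flags stand in for whatever real health checks the application performs):

```ts
import express from 'express';
import { collectDefaultMetrics, register } from 'prom-client';

// Collect the default Node.js process metrics (CPU, memory, event loop, ...).
collectDefaultMetrics();

const app = express();

// Placeholder flags; a real implementation would derive these from actual
// application state (e.g. whether the backing database/node is synced).
let isAlive = true;
let isReady = false;

// Prometheus scrapes this endpoint and parses the text exposition format.
app.get('/metrics', async (_req, res) => {
  res.set('Content-Type', register.contentType);
  res.send(await register.metrics());
});

// Liveness: a non-200 response tells Kubernetes the pod has crashed and should be restarted.
app.get('/liveness', (_req, res) => {
  res.status(isAlive ? 200 : 503).send(isAlive ? 'OK' : 'NOT OK');
});

// Readiness: a non-200 response tells Kubernetes to keep the pod but withhold traffic,
// e.g. while the application is still waiting for its backing database to sync.
app.get('/readiness', (_req, res) => {
  res.status(isReady ? 200 : 503).send(isReady ? 'READY' : 'NOT READY');
});

app.listen(8080);
```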
We have already done the same thing for one of the internal tools, Gitspiegel. You can reference its corresponding issue and PR for more details: https://github.com/paritytech/gitspiegel/issues/84
Once those endpoints are in place, I can proceed with configuring Prometheus to scrape the metrics and store them in its database. Later, you would be able to create your own dashboards on Grafana and visualize your application performance with fancy panels.
Feel free to reach me on Matrix if you need anything.
@ArshamTeymouri I love the idea of adding the `readiness` and `liveness` endpoints, but I wonder if it would make more sense to do that in a separate effort.
Implementing and exposing the basic metrics at `/metrics` seems like a nicely scoped, objective block of work, whereas determining `liveness`/`readiness` could be a touch more subjective, which opens up the possibility of one blocking the other.
@ArshamTeymouri I was looking into the issue and I was wondering whether it is recommended to run the `/metrics` endpoint in a separate Express instance (different from the one the API is running on) and on a different port.
Maybe this would be a good idea so that the metrics are not exposed to the internet? I am thinking of cases like the public instances of Sidecar, or if another team uses Sidecar and also has public instances; maybe they would not want their metrics to be exposed.
Another advantage might be that if Sidecar has issues on its own server instance/port, the `/metrics` endpoint (if it is on a different port) will still be running, right? So the fact that Sidecar is experiencing issues will be reflected in the corresponding metrics.
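A minimal sketch of that idea, assuming both servers live in the same Node.js process (which keeps the metrics registry shared); the ports and the example route are illustrative, not what Sidecar actually uses:

```ts
import express from 'express';
import { collectDefaultMetrics, register } from 'prom-client';

collectDefaultMetrics();

// Public API server, the one exposed to the internet.
const api = express();
api.get('/blocks/head', (_req, res) => {
  res.json({ number: '0x0' }); // illustrative payload only
});
api.listen(8080);

// Separate Express instance for metrics on its own port. This port is simply
// not exposed publicly (e.g. not routed through the public service/ingress),
// so public Sidecar instances do not leak their metrics to the internet.
const metricsApp = express();
metricsApp.get('/metrics', async (_req, res) => {
  res.set('Content-Type', register.contentType);
  res.send(await register.metrics());
});
metricsApp.listen(9090);
```

Note that this only helps with problems scoped to the API server itself; if the whole process goes down, both ports stop responding, which is where the liveness probe comes in.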
@Imod7 sorry for the late reply, I was on a long vacation. I'm not sure it's possible to run the metrics endpoint in a different instance of the API; I don't think it would be possible to gather metrics from another process via a sidecar. Most of the applications I know expose their metrics on the same API/port, and there are always ways for sysadmins/devops to secure the `/metrics` endpoint with ACLs at the webserver level, so no worries there. Still, I'd suggest checking enterprise applications and their implementations, as this topic sounds more developer-side than infrastructure-side.
Current Status of this Issue
- `metrics` endpoint is available in a dedicated server & port [ status : DONE with PR #1232 ]
- `liveness` endpoint [ status : PENDING ] DONE with [PR #1465]

I integrated metrics support into our substrate node helm-chart (https://github.com/paritytech/helm-charts/pull/284). However, I don't see a lot of useful metrics exposed beyond the default nodejs cpu/mem. For example, it would be useful to have a histogram of response times for each HTTP endpoint.
@PierreBesson thank you so much for the feedback! 💯 🙏 Yes, indeed, it does not have a lot of useful metrics right now, since my goal for the first iteration was to build the base on which we could later add any metrics we think are useful. Also, I think it needs proper research on which metrics would be useful in a REST API, and I have not done this (yet!), so your suggestion is a huge help. Please share if you have any other ideas/suggestions/recommendations/links on what I should check. For now, I will add the histogram metric to the pending list (my comment above).
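For reference, a per-route response-time histogram could be sketched roughly as below with `prom-client` and an Express middleware; the metric name, label set and bucket boundaries are placeholders, not a decided design:

```ts
import express, { NextFunction, Request, Response } from 'express';
import { Histogram, register } from 'prom-client';

// Histogram of HTTP response times, labelled by method, route and status code.
// Name and buckets are illustrative only.
const httpRequestDuration = new Histogram({
  name: 'http_request_duration_seconds',
  help: 'Duration of HTTP requests in seconds',
  labelNames: ['method', 'route', 'status_code'],
  buckets: [0.05, 0.1, 0.25, 0.5, 1, 2.5, 5],
});

// Middleware that starts a timer per request and records it when the response finishes.
function responseTimeMetrics(req: Request, res: Response, next: NextFunction): void {
  const endTimer = httpRequestDuration.startTimer();
  res.on('finish', () => {
    // req.route is only populated once a route has matched; fall back to the raw path.
    const route = req.route?.path ?? req.path;
    endTimer({ method: req.method, route, status_code: String(res.statusCode) });
  });
  next();
}

const app = express();
app.use(responseTimeMetrics);

app.get('/metrics', async (_req, res) => {
  res.set('Content-Type', register.contentType);
  res.send(await register.metrics());
});

app.listen(8080);
```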
what is the status on this one? @Imod7
The status is reflected in this comment, which shows the `metrics` endpoint (with which we get the possibility to add any metrics we want) and one error counter metric. Also, this comment shows that further research can be done to come up with other metrics (beyond the two suggested above) that might be useful.
This PR integrated HTTP metrics for all routes: [PR #1465]
Proposed Change or Idea
Add Prometheus metrics for monitoring the health of a Sidecar instance.
Steps:
First Metrics
Additional Information
In order to provide a clean, consistent service, you need to be able to easily monitor the health and status of each service.