Add basic prometheus exporter

lmendes86 commented 1 year ago

We recently started running Resgate on our platform; thank you for this repo, by the way. However, we still needed basic metrics from it, so we added an optional Prometheus exporter to Resgate to get insight into how it is operating. We also upgraded to the latest Golang version, but if required, we can remove those changes from this pull request. I hope this helps!

Atomzwieback commented 10 months ago

@jirenius will this not be merged? Would be cool to have some basic metrics

jirenius commented 2 months ago

Sorry for the radio silence. The resgate project has been on ice for a while, even though the gateway has been actively used in other projects led by me. Now I am working on a new release with some further improvements, bugfixes, and updated dependencies. I wish to include this PR in the release, but due to other changes made, I will merge it into a side branch and handle the conflicts there.

Also, thanks for this PR inspiring other improvements!

jirenius commented 2 months ago

Testing it, I see one issue with the number of dependencies that come with the prometheus package. The compiled file size increased by about 60% (6 MB), and the larger dependency tree makes the server more vulnerable to dependency chain abuse.

One option would be to use a dependency-free package (e.g. github.com/bsm/openmetrics) to expose the desired metrics. The ones I would add would probably be:

process_start_time_seconds
go_memstats_* (using runtime.MemStats)
go_info
  - version
resgate_info - Gauge (set to 1)
  - version=<resgate version>,protocol=`
resgate_ws_current_connections - Gauge
resgate_ws_connections_total - Counter
resgate_ws_subscriptions - Gauge
  - type=direct
  - type=indirect
resgate_ws_requests_total - Counter
  - method=get
  - method=subscribe
  - method=call
  - method=auth
resgate_cached_resources - Gauge
resgate_http_requests_total - Counter
  - method=POST
  - method=GET

I think I'll skip:

resgate_nats_connected - Resgate stops if the NATS connection is closed, so this would always be 1 on a successful scrape.
resgate_subscriptions - (or rather, per-resource subscription labels) Some solutions may have many thousands of different subscriptions, causing the metrics response to be huge. Possibly it could be made an opt-in feature through configuration.
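
To illustrate how a few of the metrics above could be exposed without the Prometheus client package, here is a minimal sketch that writes the Prometheus text exposition format using only the Go standard library. It does not use the github.com/bsm/openmetrics API, and it is not Resgate's actual implementation. The `Metrics` type and its methods are hypothetical names, only a subset of the listed metrics is covered, and the `atomic.Int64` and `atomic.Uint64` types assume Go 1.19 or later.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"runtime"
	"strings"
	"sync/atomic"
	"time"
)

// Metrics is a hypothetical holder for a few of the metrics listed above.
type Metrics struct {
	start          time.Time
	wsCurrentConns atomic.Int64  // resgate_ws_current_connections (gauge)
	wsConnsTotal   atomic.Uint64 // resgate_ws_connections_total (counter)
}

// NewMetrics records the process start time at creation.
func NewMetrics() *Metrics { return &Metrics{start: time.Now()} }

// ConnOpened is meant to be called when a WebSocket connection is accepted.
func (m *Metrics) ConnOpened() {
	m.wsCurrentConns.Add(1)
	m.wsConnsTotal.Add(1)
}

// ConnClosed is meant to be called when a WebSocket connection is closed.
func (m *Metrics) ConnClosed() { m.wsCurrentConns.Add(-1) }

// Render writes the metrics in the Prometheus text exposition format.
func (m *Metrics) Render(w io.Writer) {
	var ms runtime.MemStats
	runtime.ReadMemStats(&ms)
	fmt.Fprintf(w, "# TYPE process_start_time_seconds gauge\nprocess_start_time_seconds %d\n", m.start.Unix())
	fmt.Fprintf(w, "# TYPE go_memstats_alloc_bytes gauge\ngo_memstats_alloc_bytes %d\n", ms.Alloc)
	// %q is used as a simple way to quote the label value; it is adequate
	// for plain version strings such as "go1.22.0".
	fmt.Fprintf(w, "# TYPE go_info gauge\ngo_info{version=%q} 1\n", runtime.Version())
	fmt.Fprintf(w, "# TYPE resgate_ws_current_connections gauge\nresgate_ws_current_connections %d\n", m.wsCurrentConns.Load())
	fmt.Fprintf(w, "# TYPE resgate_ws_connections_total counter\nresgate_ws_connections_total %d\n", m.wsConnsTotal.Load())
}

// Text returns the rendered metrics as a string.
func (m *Metrics) Text() string {
	var sb strings.Builder
	m.Render(&sb)
	return sb.String()
}

// ServeHTTP lets *Metrics be registered directly as an http.Handler.
func (m *Metrics) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
	m.Render(w)
}
```

A server could register this with something like `http.Handle("/metrics", NewMetrics())`; the path is an assumption, not something decided in this thread. Labeled metrics such as resgate_ws_requests_total would need one counter per label value, which is where a small helper package starts to pay off.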

lmendes86 commented 2 months ago

It's nice to hear that this is being taken care of! Those metrics look good! We are using resgate_subscriptions, but I understand that it could produce a very large number of metrics if there are a lot of subscription topics. For us, it is quite insightful to have, so it would be useful to keep it as an opt-in if you think that is a possibility. Here is an example of a Grafana visualization of our current implementation. Thanks in advance for the work!

jirenius commented 2 months ago

Ah, that is nice!

For the grouping of resource IDs, Resgate would need some sort of knowledge of patterns. In your branch, you've solved it by detecting {id} and {uuid} parts. But I will try to see if I can come up with a more generic way to solve it. One way would be to tell Resgate, through configuration, which resource patterns to track metrics for:

{
   "metrics": {
      "resourcePatterns": [
         "availability.client.*",
         "availability.client.*.user.*",
         "availability.client.*.user.*.device",
         "availability.client.*.user.*.device.*",
         "dashboard.client.*",
         "dashboard.queue.*",
         "usertoken"
      ]
   }
}

It would require you to manually update resgate's configuration with the resource patterns. So, it might work for some use cases.
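
As a rough sketch of how such configured patterns might be used to group resource IDs, the Go code below matches an ID against a pattern list and counts subscriptions per pattern. It assumes that `*` matches exactly one dot-separated part (the thread does not define the wildcard semantics) and that any query part of the resource ID has already been stripped. The names `matchPattern`, `patternFor`, and `PatternCounter` are hypothetical and not part of Resgate.

```go
package main

import "strings"

// matchPattern reports whether the resource ID rid matches pattern.
// Assumption: "*" matches exactly one dot-separated part.
func matchPattern(pattern, rid string) bool {
	pt := strings.Split(pattern, ".")
	rt := strings.Split(rid, ".")
	if len(pt) != len(rt) {
		return false
	}
	for i, p := range pt {
		if p != "*" && p != rt[i] {
			return false
		}
	}
	return true
}

// patternFor returns the first configured pattern that matches rid.
func patternFor(patterns []string, rid string) (string, bool) {
	for _, p := range patterns {
		if matchPattern(p, rid) {
			return p, true
		}
	}
	return "", false
}

// PatternCounter counts subscriptions per configured pattern, so the
// metrics output grows with the number of patterns rather than with the
// number of distinct resource IDs. It is not safe for concurrent use.
type PatternCounter struct {
	patterns []string
	counts   map[string]int
}

// NewPatternCounter creates a counter for the given patterns.
func NewPatternCounter(patterns []string) *PatternCounter {
	return &PatternCounter{patterns: patterns, counts: make(map[string]int)}
}

// Add adjusts the count for the pattern matching rid by delta.
// Resource IDs that match no pattern are ignored.
func (c *PatternCounter) Add(rid string, delta int) {
	if p, ok := patternFor(c.patterns, rid); ok {
		c.counts[p] += delta
	}
}

// Count returns the current count for a pattern.
func (c *PatternCounter) Count(pattern string) int { return c.counts[pattern] }
```

With the patterns from the configuration above, `availability.client.42.user.7` would be counted under `availability.client.*.user.*`, while an ID matching none of the patterns would simply not be tracked.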

Anyway, while I did not merge your PR into develop, since I chose to solve it differently and with a different package, it was still a great inspiration in many ways! Big thanks for it!

resgateio / resgate

Add basic prometheus exporter #238