resgateio / resgate

A Realtime API Gateway used with NATS to build REST, real time, and RPC APIs, where all your clients are synchronized seamlessly.
https://resgate.io
MIT License
685 stars, 67 forks

Add basic prometheus exporter #238

Closed: lmendes86 closed this pull request 2 months ago

lmendes86 commented 1 year ago

We recently started running Resgate on our platform (thank you for this repo, by the way!). However, we still needed basic metrics from it, so we added the option of running a Prometheus exporter in Resgate to get insight into how it is operating. We also upgraded to the latest Go version, but if required, we can remove those changes from this pull request. I hope this helps!

Atomzwieback commented 10 months ago

@jirenius, will this not be merged? It would be cool to have some basic metrics.

jirenius commented 2 months ago

Sorry for the radio silence. The resgate project has been on ice for a while, even though the gateway has been actively used in other projects led by me. Now I am working on a new release with further improvements, bug fixes, and updated dependencies. I wish to include this PR in the release, but due to other changes, I will merge it into a side branch and resolve the conflicts there.

Also, thanks for this PR inspiring other improvements!

jirenius commented 2 months ago

Testing it, I see one issue: the number of dependencies that come with the Prometheus package. The compiled binary grew by about 60% (6 MB), which also widens the server's exposure to supply-chain abuse through its dependencies.

One option would be to use a dependency-free package (e.g. github.com/bsm/openmetrics) to expose the desired metrics. The ones I would add would probably be:

I think I'll skip:

lmendes86 commented 2 months ago

It's nice to hear that this is being taken care of! Those metrics look good! We are using resgate_subscriptions, and I understand that it could produce a large number of metric series if there are many subscription topics. For us, though, it is quite insightful, so it could be useful to keep it as an opt-in if you think that is possible. Here is an example of a Grafana visualization of our current implementation:

[image: Grafana visualization]

Thanks in advance for the work!
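One way to make a per-resource subscription metric opt-in is sketched below. It assumes that a total is always exported, while the per-resource breakdown is only kept when a hypothetical perResource flag is enabled. The type SubscriptionMetrics and the metric names resgate_subscriptions (used here for the total) and resgate_subscriptions_by_resource are illustrative; they are not taken from lmendes86's branch or from Resgate.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"sync"
)

// labelEscaper escapes label values as the Prometheus text format requires.
var labelEscaper = strings.NewReplacer(`\`, `\\`, `"`, `\"`, "\n", `\n`)

// SubscriptionMetrics (hypothetical) always tracks a total and, only when
// perResource is enabled (the opt-in), a count per resource ID.
type SubscriptionMetrics struct {
	mu          sync.Mutex
	perResource bool
	total       int
	byResource  map[string]int
}

// NewSubscriptionMetrics creates a tracker; perResource is the opt-in switch.
func NewSubscriptionMetrics(perResource bool) *SubscriptionMetrics {
	return &SubscriptionMetrics{perResource: perResource, byResource: map[string]int{}}
}

// Subscribe records one new subscription to rid.
func (m *SubscriptionMetrics) Subscribe(rid string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.total++
	if m.perResource {
		m.byResource[rid]++
	}
}

// Unsubscribe removes one subscription to rid.
func (m *SubscriptionMetrics) Unsubscribe(rid string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.total--
	if m.perResource {
		m.byResource[rid]--
		if m.byResource[rid] <= 0 {
			delete(m.byResource, rid) // drop idle series to limit cardinality
		}
	}
}

// Render returns the samples in Prometheus text format, sorted for stable output.
func (m *SubscriptionMetrics) Render() string {
	m.mu.Lock()
	defer m.mu.Unlock()
	var b strings.Builder
	fmt.Fprintf(&b, "# TYPE resgate_subscriptions gauge\nresgate_subscriptions %d\n", m.total)
	if !m.perResource {
		return b.String()
	}
	rids := make([]string, 0, len(m.byResource))
	for rid := range m.byResource {
		rids = append(rids, rid)
	}
	sort.Strings(rids)
	b.WriteString("# TYPE resgate_subscriptions_by_resource gauge\n")
	for _, rid := range rids {
		fmt.Fprintf(&b, "resgate_subscriptions_by_resource{resource=\"%s\"} %d\n",
			labelEscaper.Replace(rid), m.byResource[rid])
	}
	return b.String()
}

func main() {
	m := NewSubscriptionMetrics(true) // opt in to the per-resource breakdown
	m.Subscribe("dashboard.queue.42")
	m.Subscribe("dashboard.queue.42")
	m.Subscribe("usertoken")
	m.Unsubscribe("usertoken")
	fmt.Print(m.Render())
}
```

Dropping series whose count falls to zero keeps the output limited to resources that currently have subscribers, but the number of series still grows with the number of live resources, which is the cardinality concern mentioned in this comment.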

jirenius commented 2 months ago

Ah, that is nice!

For grouping resource IDs, Resgate would need some knowledge of patterns. In your branch, you've solved it by detecting {id} and {uuid} parts, but I will see if I can come up with a more generic solution. One way would be to provide Resgate with resource patterns to track metrics through configuration:

{
   "metrics": {
      "resourcePatterns": [
         "availability.client.*",
         "availability.client.*.user.*",
         "availability.client.*.user.*.device",
         "availability.client.*.user.*.device.*",
         "dashboard.client.*",
         "dashboard.queue.*",
         "usertoken"
      ]
   }
}

It would require you to manually update Resgate's configuration with the resource patterns, so it might only work for some use cases.

Anyway, while I didn't merge your PR into develop, since I chose to solve it differently and with a different package, it was still a great inspiration in many ways! Big thanks for it!