tgstation / tgstation-server

A production scale tool for DreamMaker server management
https://tgstation.github.io/tgstation-server/
GNU Affero General Public License v3.0

Expose prometheus metrics #2001

Open scriptis opened 2 days ago

scriptis commented 2 days ago

Is your feature request related to a problem? Please describe.

I like graphs. We all like graphs. And alerts, too! I use Grafana to make me graphs and scream at me when something is wrong. I use Prometheus to feed Grafana most of my crap and also maybe use it to scream at me when something is wrong. I'd love TGS to expose a metrics reporting endpoint so I can collect metrics and get yelled at on a regular basis when Dream Daemon runs out of memory.

Describe the solution you'd like

Allow TGS to bind to a second port/address that exposes a /metrics endpoint in line with the Prometheus documentation (https://prometheus.io/docs/instrumenting/exposition_formats/#text-format-example). Reporting memory usage is a must, but there are a lot of metrics that could get crammed in here so my ops can get alerted when it's a good time to figure out who poured gasoline all over the server.
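For anyone unfamiliar with the format, here is a rough sketch of what a /metrics response body could look like, rendered with a few lines of Python. The metric name `tgs_dreamdaemon_memory_bytes` and the `instance` label are made up for illustration; nothing here reflects names TGS actually uses or plans to use.

```python
def escape_label(value):
    """Escape a label value as the text exposition format requires."""
    return (
        str(value)
        .replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )


def render_gauge(name, help_text, samples):
    """Render one gauge in the Prometheus text exposition format.

    samples maps a tuple of (label_name, label_value) pairs to a number.
    An empty tuple means the sample has no labels.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} gauge"]
    for labels, value in samples.items():
        if labels:
            body = ",".join(f'{k}="{escape_label(v)}"' for k, v in labels)
            lines.append(f"{name}{{{body}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical metric name and label, for illustration only.
example = render_gauge(
    "tgs_dreamdaemon_memory_bytes",
    "Memory used by Dream Daemon (hypothetical metric).",
    {(("instance", "main"),): 1048576},
)
```

Each metric gets a `# HELP` and `# TYPE` line followed by one line per sample, which is all Prometheus needs to scrape it.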

Describe alternatives you've considered

I could write a shim on my side to curl TGS and emit these metrics for me, but it makes more sense to build this endpoint directly into TGS, since so many other routing/deployment/infrastructure daemons provide a similar feature.
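To make the shim alternative concrete, a minimal sketch using only the Python standard library might look like the following. The `collect` callable is a placeholder for however you would query TGS (no TGS API call is shown or assumed here), and the metric name in the comments is hypothetical.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def render_metrics(metrics):
    """Turn a dict of metric name -> number into exposition-format lines."""
    return "".join(f"{name} {value}\n" for name, value in metrics.items())


def make_handler(collect):
    """Build a request handler that serves collect() output at /metrics.

    collect is a placeholder: it stands in for whatever code polls TGS and
    returns a dict such as {"tgs_dreamdaemon_memory_bytes": 123}.
    """

    class MetricsHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != "/metrics":
                self.send_error(404)
                return
            # Collect fresh values on every scrape.
            body = render_metrics(collect()).encode("utf-8")
            self.send_response(200)
            self.send_header(
                "Content-Type", "text/plain; version=0.0.4; charset=utf-8"
            )
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format, *args):
            pass  # keep periodic scrapes out of the log

    return MetricsHandler


def make_server(collect, host="127.0.0.1", port=0):
    """Create (but do not start) the shim server; port=0 picks a free port."""
    return HTTPServer((host, port), make_handler(collect))
```

Calling `make_server(collect, port=...).serve_forever()` would start it. This works, but it is one more process to deploy and keep alive next to TGS, which is the argument for building the endpoint in.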

Additional context

n/a

Cyberboss commented 2 days ago

Needs a config to either be accessible publicly or via an X-Auth-Token header

Cyberboss commented 2 days ago

Bearer token*
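A rough sketch of the check that config would drive, assuming a single optional token setting where "unset" means publicly accessible. The function name and the `expected_token` parameter are hypothetical, not existing TGS configuration.

```python
import hmac


def is_scrape_authorized(authorization_header, expected_token):
    """Decide whether a /metrics request may proceed.

    expected_token=None models the "accessible publicly" setting.
    Otherwise the request must carry "Authorization: Bearer <token>".
    """
    if expected_token is None:
        return True
    if not authorization_header:
        return False
    scheme, _, token = authorization_header.partition(" ")
    if scheme.lower() != "bearer":
        return False
    # Constant-time comparison so the token cannot be guessed by timing.
    return hmac.compare_digest(
        token.strip().encode("utf-8"), expected_token.encode("utf-8")
    )
```

On the scraping side, Prometheus can send a bearer token through the `authorization` block of a scrape config, so no extra tooling should be needed there.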

(Sent by email in reply to scriptis's original post of Sun, Nov 10, 2024, 8:12 p.m.)