hasura / graphql-engine

Blazing fast, instant realtime GraphQL APIs on your DB with fine grained access control, also trigger webhooks on database events.
https://hasura.io
Apache License 2.0
31.08k stars 2.77k forks

Feature request: Prometheus metrics #6336

Closed maxpain closed 1 year ago

maxpain commented 3 years ago

Will it be implemented?

atbe commented 3 years ago

Yes, please!!!

maxpain commented 3 years ago

Any updates?

Tchoupinax commented 3 years ago

I am looking for the same thing. Any updates? :)

Faridalim commented 3 years ago

Up this..

nvcnvn commented 3 years ago

If Hasura were based on, I don't know, PHP/Java/Node/Go, I would probably contribute this monitoring to them. But Haskell... I would need to learn this strange language hahaha

maiquanghiep commented 2 years ago

It would be great if Hasura could support Prometheus

zolamk commented 2 years ago

It would be great if Hasura published Prometheus metrics, but in the meantime I've created a Hasura exporter for Prometheus. Currently it exports metrics about

afitzek commented 2 years ago

Nice @zolamk, I also created a Hasura exporter based on the JSON logs: https://github.com/afitzek/hasura-metric-adapter.

It currently supports:

zolamk commented 2 years ago

> Nice @zolamk, I also created a Hasura exporter based on the JSON logs: https://github.com/afitzek/hasura-metric-adapter.
>
> It currently supports:
>
> * request metrics
> * query execution times
> * active websockets
> * websocket operations

Very nice. It would be great if Hasura could expose Prometheus metrics itself, or expose the metrics through the metadata API. Perhaps we could consider pooling our efforts into a community project that exposes all the important metrics.

rsd1122 commented 2 years ago

Prometheus metrics are now available on Hasura Cloud Standard and the Hasura Enterprise editions. At the moment, we don’t plan to include support for them in Hasura CE (Community Edition). However, please do feel free to share any how-tos and exporters that you’re using along with Hasura CE on this thread!

charklewis commented 2 years ago

@rishi-div is there a reason for not including it on Hasura CE?

soubinan commented 2 years ago

Sadly observing yet another case of "how a great open source oriented project becomes a profit oriented project".

Tchoupinax commented 2 years ago

@soubinan I understand your point of view because I'm a big lover of open source and free usage of community projects. However, not everything can be free. If you take Hasura as an example, the main part of the project is open and free. I use it to start every project and it saves me a lot of time. I'm really sad to discover that some features are not available in the community version, but on the other hand I think it is a good thing to see the project evolving. There have been different business models for software, and last year the "free, and pay for more" model exploded in video games and open source projects. I think it is fair (maybe there are exceptions). If Hasura is profitable, it will continue. If Hasura reduces or closes its sources, the community can fork. But would the community have the time to continue this project?

It is just my opinion: premium features should exist for a project that has no other source of revenue. However, I'm just a lover of the project, not inside the Hasura organization. I do not know what happens there or how they think about it.

Note: as you can see, "sponsoring" is gaining popularity on GitHub, and a real question is becoming bigger: how can open source workers get paid time for their work?

soubinan commented 2 years ago

@Tchoupinax Yes, I totally agree with you. It is absolutely normal to make some profit from the fantastic work the Hasura team has done. However, as a SWE and SRE, observability is something so essential that it is kind of nonsense to me not to have it as part of the open source edition, especially for Prometheus, another open source project. I would agree if other integrations were not free (Splunk, Dynatrace, New Relic, Datadog...). But Prometheus... Sorry, I still cannot understand where the profitability is for Hasura, since this was a good opportunity for guys like me to convince their teams, who want something more observable in the stack, to put Hasura on the roadmap...

TheEdgeOfRage commented 2 years ago

And besides, Prometheus integration is in the free tier for the cloud version, which makes even less sense then. They aren't using it to drive profit, but just to have feature separation between the self-hosted and the managed version.

We currently use logs from Loki to get graphs in Grafana, which is slow and cannot trigger alerts in Alertmanager, so we don't have any observability beyond checking whether the Hasura container is running or not. It's annoying that Hasura is the only service that we cannot properly monitor.

Faridalim commented 2 years ago

I agree. Prometheus is vital for monitoring, but convincing the team to open it up in Hasura Core may not be easy since they have their own business considerations.

beingtmk commented 2 years ago

+1

hongbo-miao commented 2 years ago

Hasura is amazing! Really love it!

Just to provide another workaround: use Traefik as a reverse proxy in a sidecar container and expose its metrics.

We need something like a reverse proxy anyway to do rate limiting; I posted that solution at

https://github.com/hasura/graphql-engine/issues/2151#issuecomment-1118264484

Metrics from Traefik

```py
# HELP go_gc_duration_seconds A summary of the pause duration of garbage collection cycles.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 0.0001409
go_gc_duration_seconds{quantile="0.25"} 0.000382
go_gc_duration_seconds{quantile="0.5"} 0.0004428
go_gc_duration_seconds{quantile="0.75"} 0.0005555
go_gc_duration_seconds{quantile="1"} 0.0019218
go_gc_duration_seconds_sum 0.0082743
go_gc_duration_seconds_count 16
# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 36
# HELP go_info Information about the Go environment.
# TYPE go_info gauge
go_info{version="go1.17.9"} 1
# HELP go_memstats_alloc_bytes Number of bytes allocated and still in use.
# TYPE go_memstats_alloc_bytes gauge
go_memstats_alloc_bytes 6.212e+06
# HELP go_memstats_alloc_bytes_total Total number of bytes allocated, even if freed.
# TYPE go_memstats_alloc_bytes_total counter
go_memstats_alloc_bytes_total 3.7896552e+07
# HELP go_memstats_buck_hash_sys_bytes Number of bytes used by the profiling bucket hash table.
# TYPE go_memstats_buck_hash_sys_bytes gauge
go_memstats_buck_hash_sys_bytes 1.476338e+06
# HELP go_memstats_frees_total Total number of frees.
# TYPE go_memstats_frees_total counter
go_memstats_frees_total 232596
# HELP go_memstats_gc_cpu_fraction The fraction of this program's available CPU time used by the GC since the program started.
# TYPE go_memstats_gc_cpu_fraction gauge
go_memstats_gc_cpu_fraction 1.7349255237305537e-05
# HELP go_memstats_gc_sys_bytes Number of bytes used for garbage collection system metadata.
# TYPE go_memstats_gc_sys_bytes gauge
go_memstats_gc_sys_bytes 5.795224e+06
# HELP go_memstats_heap_alloc_bytes Number of heap bytes allocated and still in use.
# TYPE go_memstats_heap_alloc_bytes gauge
go_memstats_heap_alloc_bytes 6.212e+06
# HELP go_memstats_heap_idle_bytes Number of heap bytes waiting to be used.
# TYPE go_memstats_heap_idle_bytes gauge
go_memstats_heap_idle_bytes 6.914048e+06
# HELP go_memstats_heap_inuse_bytes Number of heap bytes that are in use.
# TYPE go_memstats_heap_inuse_bytes gauge
go_memstats_heap_inuse_bytes 8.84736e+06
# HELP go_memstats_heap_objects Number of allocated objects.
# TYPE go_memstats_heap_objects gauge
go_memstats_heap_objects 37129
# HELP go_memstats_heap_released_bytes Number of heap bytes released to OS.
# TYPE go_memstats_heap_released_bytes gauge
go_memstats_heap_released_bytes 6.135808e+06
# HELP go_memstats_heap_sys_bytes Number of heap bytes obtained from system.
# TYPE go_memstats_heap_sys_bytes gauge
go_memstats_heap_sys_bytes 1.5761408e+07
# HELP go_memstats_last_gc_time_seconds Number of seconds since 1970 of last garbage collection.
# TYPE go_memstats_last_gc_time_seconds gauge
go_memstats_last_gc_time_seconds 1.6517368844985666e+09
# HELP go_memstats_lookups_total Total number of pointer lookups.
# TYPE go_memstats_lookups_total counter
go_memstats_lookups_total 0
# HELP go_memstats_mallocs_total Total number of mallocs.
# TYPE go_memstats_mallocs_total counter
go_memstats_mallocs_total 269725
# HELP go_memstats_mcache_inuse_bytes Number of bytes in use by mcache structures.
# TYPE go_memstats_mcache_inuse_bytes gauge
go_memstats_mcache_inuse_bytes 7200
# HELP go_memstats_mcache_sys_bytes Number of bytes used for mcache structures obtained from system.
# TYPE go_memstats_mcache_sys_bytes gauge
go_memstats_mcache_sys_bytes 16384
# HELP go_memstats_mspan_inuse_bytes Number of bytes in use by mspan structures.
# TYPE go_memstats_mspan_inuse_bytes gauge
go_memstats_mspan_inuse_bytes 165240
# HELP go_memstats_mspan_sys_bytes Number of bytes used for mspan structures obtained from system.
# TYPE go_memstats_mspan_sys_bytes gauge
go_memstats_mspan_sys_bytes 212992
# HELP go_memstats_next_gc_bytes Number of heap bytes when next garbage collection will take place.
# TYPE go_memstats_next_gc_bytes gauge
go_memstats_next_gc_bytes 1.175176e+07
# HELP go_memstats_other_sys_bytes Number of bytes used for other system allocations.
# TYPE go_memstats_other_sys_bytes gauge
go_memstats_other_sys_bytes 1.232766e+06
# HELP go_memstats_stack_inuse_bytes Number of bytes in use by the stack allocator.
# TYPE go_memstats_stack_inuse_bytes gauge
go_memstats_stack_inuse_bytes 1.015808e+06
# HELP go_memstats_stack_sys_bytes Number of bytes obtained from system for stack allocator.
# TYPE go_memstats_stack_sys_bytes gauge
go_memstats_stack_sys_bytes 1.015808e+06
# HELP go_memstats_sys_bytes Number of bytes obtained from system.
# TYPE go_memstats_sys_bytes gauge
go_memstats_sys_bytes 2.551092e+07
# HELP go_threads Number of OS threads created.
# TYPE go_threads gauge
go_threads 13
# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 3
# HELP process_max_fds Maximum number of open file descriptors.
# TYPE process_max_fds gauge
process_max_fds 1.048576e+06
# HELP process_open_fds Number of open file descriptors.
# TYPE process_open_fds gauge
process_open_fds 15
# HELP process_resident_memory_bytes Resident memory size in bytes.
# TYPE process_resident_memory_bytes gauge
process_resident_memory_bytes 6.985728e+07
# HELP process_start_time_seconds Start time of the process since unix epoch in seconds.
# TYPE process_start_time_seconds gauge
process_start_time_seconds 1.65173579631e+09
# HELP process_virtual_memory_bytes Virtual memory size in bytes.
# TYPE process_virtual_memory_bytes gauge
process_virtual_memory_bytes 8.17897472e+08
# HELP process_virtual_memory_max_bytes Maximum amount of virtual memory available in bytes.
# TYPE process_virtual_memory_max_bytes gauge
process_virtual_memory_max_bytes 1.8446744073709552e+19
# HELP traefik_config_last_reload_failure Last config reload failure
# TYPE traefik_config_last_reload_failure gauge
traefik_config_last_reload_failure 0
# HELP traefik_config_last_reload_success Last config reload success
# TYPE traefik_config_last_reload_success gauge
traefik_config_last_reload_success 1.651735799e+09
# HELP traefik_config_reloads_failure_total Config failure reloads
# TYPE traefik_config_reloads_failure_total counter
traefik_config_reloads_failure_total 0
# HELP traefik_config_reloads_total Config reloads
# TYPE traefik_config_reloads_total counter
traefik_config_reloads_total 1
# HELP traefik_entrypoint_open_connections How many open connections exist on an entrypoint, partitioned by method and protocol.
# TYPE traefik_entrypoint_open_connections gauge
traefik_entrypoint_open_connections{entrypoint="hasura-graphql-engine-entrypoint",method="OPTIONS",protocol="http"} 0
traefik_entrypoint_open_connections{entrypoint="hasura-graphql-engine-entrypoint",method="POST",protocol="http"} 0
traefik_entrypoint_open_connections{entrypoint="traefik",method="GET",protocol="http"} 1
traefik_entrypoint_open_connections{entrypoint="traefik",method="OPTIONS",protocol="http"} 0
# HELP traefik_entrypoint_request_duration_seconds How long it took to process the request on an entrypoint, partitioned by status code, protocol, and method.
# TYPE traefik_entrypoint_request_duration_seconds histogram
traefik_entrypoint_request_duration_seconds_bucket{code="200",entrypoint="hasura-graphql-engine-entrypoint",method="POST",protocol="http",le="0.1"} 1
traefik_entrypoint_request_duration_seconds_bucket{code="200",entrypoint="hasura-graphql-engine-entrypoint",method="POST",protocol="http",le="0.3"} 4
traefik_entrypoint_request_duration_seconds_bucket{code="200",entrypoint="hasura-graphql-engine-entrypoint",method="POST",protocol="http",le="1.2"} 4
traefik_entrypoint_request_duration_seconds_bucket{code="200",entrypoint="hasura-graphql-engine-entrypoint",method="POST",protocol="http",le="5"} 4
traefik_entrypoint_request_duration_seconds_bucket{code="200",entrypoint="hasura-graphql-engine-entrypoint",method="POST",protocol="http",le="+Inf"} 4
traefik_entrypoint_request_duration_seconds_sum{code="200",entrypoint="hasura-graphql-engine-entrypoint",method="POST",protocol="http"} 0.4204136
traefik_entrypoint_request_duration_seconds_count{code="200",entrypoint="hasura-graphql-engine-entrypoint",method="POST",protocol="http"} 4
traefik_entrypoint_request_duration_seconds_bucket{code="200",entrypoint="traefik",method="GET",protocol="http",le="0.1"} 392
traefik_entrypoint_request_duration_seconds_bucket{code="200",entrypoint="traefik",method="GET",protocol="http",le="0.3"} 392
traefik_entrypoint_request_duration_seconds_bucket{code="200",entrypoint="traefik",method="GET",protocol="http",le="1.2"} 392
traefik_entrypoint_request_duration_seconds_bucket{code="200",entrypoint="traefik",method="GET",protocol="http",le="5"} 392
traefik_entrypoint_request_duration_seconds_bucket{code="200",entrypoint="traefik",method="GET",protocol="http",le="+Inf"} 392
traefik_entrypoint_request_duration_seconds_sum{code="200",entrypoint="traefik",method="GET",protocol="http"} 0.06679269999999998
traefik_entrypoint_request_duration_seconds_count{code="200",entrypoint="traefik",method="GET",protocol="http"} 392
traefik_entrypoint_request_duration_seconds_bucket{code="200",entrypoint="traefik",method="OPTIONS",protocol="http",le="0.1"} 1
traefik_entrypoint_request_duration_seconds_bucket{code="200",entrypoint="traefik",method="OPTIONS",protocol="http",le="0.3"} 1
traefik_entrypoint_request_duration_seconds_bucket{code="200",entrypoint="traefik",method="OPTIONS",protocol="http",le="1.2"} 1
traefik_entrypoint_request_duration_seconds_bucket{code="200",entrypoint="traefik",method="OPTIONS",protocol="http",le="5"} 1
traefik_entrypoint_request_duration_seconds_bucket{code="200",entrypoint="traefik",method="OPTIONS",protocol="http",le="+Inf"} 1
traefik_entrypoint_request_duration_seconds_sum{code="200",entrypoint="traefik",method="OPTIONS",protocol="http"} 0.0036476
traefik_entrypoint_request_duration_seconds_count{code="200",entrypoint="traefik",method="OPTIONS",protocol="http"} 1
traefik_entrypoint_request_duration_seconds_bucket{code="204",entrypoint="hasura-graphql-engine-entrypoint",method="OPTIONS",protocol="http",le="0.1"} 1
traefik_entrypoint_request_duration_seconds_bucket{code="204",entrypoint="hasura-graphql-engine-entrypoint",method="OPTIONS",protocol="http",le="0.3"} 1
traefik_entrypoint_request_duration_seconds_bucket{code="204",entrypoint="hasura-graphql-engine-entrypoint",method="OPTIONS",protocol="http",le="1.2"} 1
traefik_entrypoint_request_duration_seconds_bucket{code="204",entrypoint="hasura-graphql-engine-entrypoint",method="OPTIONS",protocol="http",le="5"} 1
traefik_entrypoint_request_duration_seconds_bucket{code="204",entrypoint="hasura-graphql-engine-entrypoint",method="OPTIONS",protocol="http",le="+Inf"} 1
traefik_entrypoint_request_duration_seconds_sum{code="204",entrypoint="hasura-graphql-engine-entrypoint",method="OPTIONS",protocol="http"} 0.0073515
traefik_entrypoint_request_duration_seconds_count{code="204",entrypoint="hasura-graphql-engine-entrypoint",method="OPTIONS",protocol="http"} 1
traefik_entrypoint_request_duration_seconds_bucket{code="308",entrypoint="traefik",method="OPTIONS",protocol="http",le="0.1"} 1
traefik_entrypoint_request_duration_seconds_bucket{code="308",entrypoint="traefik",method="OPTIONS",protocol="http",le="0.3"} 1
traefik_entrypoint_request_duration_seconds_bucket{code="308",entrypoint="traefik",method="OPTIONS",protocol="http",le="1.2"} 1
traefik_entrypoint_request_duration_seconds_bucket{code="308",entrypoint="traefik",method="OPTIONS",protocol="http",le="5"} 1
traefik_entrypoint_request_duration_seconds_bucket{code="308",entrypoint="traefik",method="OPTIONS",protocol="http",le="+Inf"} 1
traefik_entrypoint_request_duration_seconds_sum{code="308",entrypoint="traefik",method="OPTIONS",protocol="http"} 0.0001051
traefik_entrypoint_request_duration_seconds_count{code="308",entrypoint="traefik",method="OPTIONS",protocol="http"} 1
traefik_entrypoint_request_duration_seconds_bucket{code="400",entrypoint="traefik",method="GET",protocol="http",le="0.1"} 1
traefik_entrypoint_request_duration_seconds_bucket{code="400",entrypoint="traefik",method="GET",protocol="http",le="0.3"} 1
traefik_entrypoint_request_duration_seconds_bucket{code="400",entrypoint="traefik",method="GET",protocol="http",le="1.2"} 1
traefik_entrypoint_request_duration_seconds_bucket{code="400",entrypoint="traefik",method="GET",protocol="http",le="5"} 1
traefik_entrypoint_request_duration_seconds_bucket{code="400",entrypoint="traefik",method="GET",protocol="http",le="+Inf"} 1
traefik_entrypoint_request_duration_seconds_sum{code="400",entrypoint="traefik",method="GET",protocol="http"} 8.92e-05
traefik_entrypoint_request_duration_seconds_count{code="400",entrypoint="traefik",method="GET",protocol="http"} 1
traefik_entrypoint_request_duration_seconds_bucket{code="429",entrypoint="hasura-graphql-engine-entrypoint",method="POST",protocol="http",le="0.1"} 3
traefik_entrypoint_request_duration_seconds_bucket{code="429",entrypoint="hasura-graphql-engine-entrypoint",method="POST",protocol="http",le="0.3"} 3
traefik_entrypoint_request_duration_seconds_bucket{code="429",entrypoint="hasura-graphql-engine-entrypoint",method="POST",protocol="http",le="1.2"} 3
traefik_entrypoint_request_duration_seconds_bucket{code="429",entrypoint="hasura-graphql-engine-entrypoint",method="POST",protocol="http",le="5"} 3
traefik_entrypoint_request_duration_seconds_bucket{code="429",entrypoint="hasura-graphql-engine-entrypoint",method="POST",protocol="http",le="+Inf"} 3
traefik_entrypoint_request_duration_seconds_sum{code="429",entrypoint="hasura-graphql-engine-entrypoint",method="POST",protocol="http"} 0.00039190000000000004
traefik_entrypoint_request_duration_seconds_count{code="429",entrypoint="hasura-graphql-engine-entrypoint",method="POST",protocol="http"} 3
# HELP traefik_entrypoint_requests_total How many HTTP requests processed on an entrypoint, partitioned by status code, protocol, and method.
# TYPE traefik_entrypoint_requests_total counter
traefik_entrypoint_requests_total{code="200",entrypoint="hasura-graphql-engine-entrypoint",method="POST",protocol="http"} 4
traefik_entrypoint_requests_total{code="200",entrypoint="traefik",method="GET",protocol="http"} 392
traefik_entrypoint_requests_total{code="200",entrypoint="traefik",method="OPTIONS",protocol="http"} 1
traefik_entrypoint_requests_total{code="204",entrypoint="hasura-graphql-engine-entrypoint",method="OPTIONS",protocol="http"} 1
traefik_entrypoint_requests_total{code="308",entrypoint="traefik",method="OPTIONS",protocol="http"} 1
traefik_entrypoint_requests_total{code="400",entrypoint="traefik",method="GET",protocol="http"} 1
traefik_entrypoint_requests_total{code="429",entrypoint="hasura-graphql-engine-entrypoint",method="POST",protocol="http"} 3
# HELP traefik_service_open_connections How many open connections exist on a service, partitioned by method and protocol.
# TYPE traefik_service_open_connections gauge
traefik_service_open_connections{method="OPTIONS",protocol="http",service="hasura-graphql-engine-service@file"} 0
traefik_service_open_connections{method="POST",protocol="http",service="hasura-graphql-engine-service@file"} 0
# HELP traefik_service_request_duration_seconds How long it took to process the request on a service, partitioned by status code, protocol, and method.
# TYPE traefik_service_request_duration_seconds histogram
traefik_service_request_duration_seconds_bucket{code="200",method="POST",protocol="http",service="hasura-graphql-engine-service@file",le="0.1"} 3
traefik_service_request_duration_seconds_bucket{code="200",method="POST",protocol="http",service="hasura-graphql-engine-service@file",le="0.3"} 4
traefik_service_request_duration_seconds_bucket{code="200",method="POST",protocol="http",service="hasura-graphql-engine-service@file",le="1.2"} 4
traefik_service_request_duration_seconds_bucket{code="200",method="POST",protocol="http",service="hasura-graphql-engine-service@file",le="5"} 4
traefik_service_request_duration_seconds_bucket{code="200",method="POST",protocol="http",service="hasura-graphql-engine-service@file",le="+Inf"} 4
traefik_service_request_duration_seconds_sum{code="200",method="POST",protocol="http",service="hasura-graphql-engine-service@file"} 0.11438649999999999
traefik_service_request_duration_seconds_count{code="200",method="POST",protocol="http",service="hasura-graphql-engine-service@file"} 4
traefik_service_request_duration_seconds_bucket{code="204",method="OPTIONS",protocol="http",service="hasura-graphql-engine-service@file",le="0.1"} 1
traefik_service_request_duration_seconds_bucket{code="204",method="OPTIONS",protocol="http",service="hasura-graphql-engine-service@file",le="0.3"} 1
traefik_service_request_duration_seconds_bucket{code="204",method="OPTIONS",protocol="http",service="hasura-graphql-engine-service@file",le="1.2"} 1
traefik_service_request_duration_seconds_bucket{code="204",method="OPTIONS",protocol="http",service="hasura-graphql-engine-service@file",le="5"} 1
traefik_service_request_duration_seconds_bucket{code="204",method="OPTIONS",protocol="http",service="hasura-graphql-engine-service@file",le="+Inf"} 1
traefik_service_request_duration_seconds_sum{code="204",method="OPTIONS",protocol="http",service="hasura-graphql-engine-service@file"} 0.004503
traefik_service_request_duration_seconds_count{code="204",method="OPTIONS",protocol="http",service="hasura-graphql-engine-service@file"} 1
# HELP traefik_service_requests_total How many HTTP requests processed on a service, partitioned by status code, protocol, and method.
# TYPE traefik_service_requests_total counter
traefik_service_requests_total{code="200",method="POST",protocol="http",service="hasura-graphql-engine-service@file"} 4
traefik_service_requests_total{code="204",method="OPTIONS",protocol="http",service="hasura-graphql-engine-service@file"} 1
```
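To make the Traefik output above easier to read at a glance, here is a minimal Python sketch (not part of the original workaround) that parses Prometheus text-format samples and derives a mean request duration from a histogram's `_sum` and `_count` series. The function names `parse_samples` and `mean_durations` are hypothetical, the `SAMPLE` text is just four lines copied from the output above, and the regex-based parser is a simplification that ignores optional timestamps and unusual label escaping.

```python
import re

# One sample line: metric name, optional {labels}, value.
# Simplification: lines carrying a trailing timestamp are skipped.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')

# Four lines copied from the Traefik output in this thread.
SAMPLE = '''
# TYPE traefik_service_request_duration_seconds histogram
traefik_service_request_duration_seconds_sum{code="200",method="POST",protocol="http",service="hasura-graphql-engine-service@file"} 0.11438649999999999
traefik_service_request_duration_seconds_count{code="200",method="POST",protocol="http",service="hasura-graphql-engine-service@file"} 4
traefik_service_request_duration_seconds_sum{code="204",method="OPTIONS",protocol="http",service="hasura-graphql-engine-service@file"} 0.004503
traefik_service_request_duration_seconds_count{code="204",method="OPTIONS",protocol="http",service="hasura-graphql-engine-service@file"} 1
'''


def parse_samples(text):
    """Yield (name, labels, value) per sample line; skip comments and blanks."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        yield name, labels, float(value)


def mean_durations(text, metric):
    """Return {sorted label tuple: sum / count} for one histogram metric."""
    sums, counts = {}, {}
    for name, labels, value in parse_samples(text):
        key = tuple(sorted(labels.items()))
        if name == metric + "_sum":
            sums[key] = value
        elif name == metric + "_count":
            counts[key] = value
    # Skip label sets with a missing or zero count to avoid division by zero.
    return {key: sums[key] / counts[key] for key in sums if counts.get(key)}


if __name__ == "__main__":
    means = mean_durations(SAMPLE, "traefik_service_request_duration_seconds")
    for key, mean in means.items():
        print(dict(key), f"{mean:.4f}s")
```

In a real setup Prometheus would scrape the endpoint and the same ratio would normally be computed in a query over time windows; the sketch only shows how the `_sum` and `_count` series relate to each other.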

Here are the metrics from @afitzek's https://github.com/afitzek/hasura-metric-adapter, which is also a sidecar solution, so you can compare the difference.

Metrics from hasura-metric-adapter (having GraphQL query name)

```py
# HELP hasura_errors_total the total number of errors per collector
# TYPE hasura_errors_total counter
hasura_errors_total{collector="cron"} 97
hasura_errors_total{collector="event"} 97
hasura_errors_total{collector="health"} 1
hasura_errors_total{collector="metadata"} 1
hasura_errors_total{collector="scheduled"} 97
# HELP hasura_healthy If 1 hasura graphql server is healthy, 0 otherwise
# TYPE hasura_healthy gauge
hasura_healthy 1
# HELP hasura_log_lines_counter Number of log lines processed
# TYPE hasura_log_lines_counter counter
hasura_log_lines_counter{logtype="http-log"} 487
hasura_log_lines_counter{logtype="query-log"} 4
hasura_log_lines_counter{logtype="startup"} 9
hasura_log_lines_counter{logtype="unstructured"} 9
# HELP hasura_log_lines_counter_total Total Number of log lines processed
# TYPE hasura_log_lines_counter_total counter
hasura_log_lines_counter_total 509
# HELP hasura_metadata_consistency_status If 1 metadata is consistent, 0 otherwise
# TYPE hasura_metadata_consistency_status gauge
hasura_metadata_consistency_status 1
# HELP hasura_query_execution_seconds Query execution Times (on success error is '' other its the error code) (unnnamed operations are '')
# TYPE hasura_query_execution_seconds histogram
hasura_query_execution_seconds_bucket{error="",operation="MyQuery",le="0.005"} 3
hasura_query_execution_seconds_bucket{error="",operation="MyQuery",le="0.01"} 3
hasura_query_execution_seconds_bucket{error="",operation="MyQuery",le="0.025"} 3
hasura_query_execution_seconds_bucket{error="",operation="MyQuery",le="0.05"} 3
hasura_query_execution_seconds_bucket{error="",operation="MyQuery",le="0.1"} 3
hasura_query_execution_seconds_bucket{error="",operation="MyQuery",le="0.25"} 4
hasura_query_execution_seconds_bucket{error="",operation="MyQuery",le="0.5"} 4
hasura_query_execution_seconds_bucket{error="",operation="MyQuery",le="1"} 4
hasura_query_execution_seconds_bucket{error="",operation="MyQuery",le="2.5"} 4
hasura_query_execution_seconds_bucket{error="",operation="MyQuery",le="5"} 4
hasura_query_execution_seconds_bucket{error="",operation="MyQuery",le="10"} 4
hasura_query_execution_seconds_bucket{error="",operation="MyQuery",le="+Inf"} 4
hasura_query_execution_seconds_sum{error="",operation="MyQuery"} 0.1068245
hasura_query_execution_seconds_count{error="",operation="MyQuery"} 4
# HELP hasura_request_counter Number requests
# TYPE hasura_request_counter counter
hasura_request_counter{status="200",url="/healthz"} 96
hasura_request_counter{status="200",url="/v1/graphql"} 4
hasura_request_counter{status="200",url="/v1/metadata"} 96
hasura_request_counter{status="400",url="/v2/query"} 288
hasura_request_counter{status="404",url="/favicon.ico"} 1
hasura_request_counter{status="404",url="/metrics"} 2
# HELP hasura_request_query_counter Number query requests (on success error is '' other its the error code) (unnnamed operations are '')
# TYPE hasura_request_query_counter counter
hasura_request_query_counter{error="",operation="MyQuery"} 4
hasura_request_query_counter{error="not-exists",operation=""} 288
```
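For readers wondering how a log-based exporter can arrive at a series like `hasura_request_counter`, here is a rough Python sketch of the general idea: read JSON log lines, count requests per status and URL, and render the counts in Prometheus text format. This is not how hasura-metric-adapter is actually implemented. The function names `count_requests` and `render` are hypothetical, and the log field names used here (`type`, `detail.http_info.status`, `detail.http_info.url`) are assumptions about the shape of Hasura's `http-log` entries that should be checked against your own logs.

```python
import json
from collections import Counter


def count_requests(log_lines):
    """Count http-log entries per (status, url).

    Assumed (unverified) log shape:
    {"type": "http-log", "detail": {"http_info": {"status": 200, "url": "/v1/graphql"}}}
    """
    counts = Counter()
    for line in log_lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # unstructured (non-JSON) log line
        if not isinstance(entry, dict) or entry.get("type") != "http-log":
            continue
        detail = entry.get("detail")
        if not isinstance(detail, dict):
            continue
        http_info = detail.get("http_info")
        if not isinstance(http_info, dict):
            continue
        status, url = http_info.get("status"), http_info.get("url")
        if status is None or url is None:
            continue
        counts[(str(status), url)] += 1
    return counts


def render(counts):
    """Render the counts in Prometheus text format.

    Simplification: label values are not escaped.
    """
    lines = [
        "# HELP hasura_request_counter Number requests",
        "# TYPE hasura_request_counter counter",
    ]
    for (status, url), n in sorted(counts.items()):
        lines.append(f'hasura_request_counter{{status="{status}",url="{url}"}} {n}')
    return "\n".join(lines) + "\n"
```

A sidecar built along these lines would tail the Hasura container's log output, keep the counters in memory, and serve the rendered text on its own /metrics endpoint for Prometheus to scrape.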

Bonus: since Traefik is a reverse proxy and I am also using hasura-metric-adapter, I can use Traefik to serve hasura-metric-adapter's /metrics endpoint alongside Hasura GraphQL Engine's endpoints. 😊

In my case, I have three containers in one pod


All Kubernetes YAML files are at https://github.com/Hongbo-Miao/hongbomiao.com/tree/main/kubernetes/manifests/hasura

Hopefully this gives some ideas to people who run into the same issue in the future!

rsd1122 commented 1 year ago

Hey everyone, please also check out Andreas' presentation on hasura-metric-adapter at one of our community calls: https://youtu.be/lxgcjOUAbjE?t=2294

rsd1122 commented 1 year ago

Thank you everyone for your comments. Closing this issue as we have released Prometheus metrics as noted above: (Cloud) https://hasura.io/docs/latest/observability/integrations/prometheus/
(Enterprise) https://hasura.io/docs/latest/enterprise/metrics/

There will be an easy way to try the feature in Hasura Enterprise trials soon.

Please use the other exporters as discussed above for Hasura CE.