thanos-io / thanos

Highly available Prometheus setup with long term storage capabilities. A CNCF Incubating project.
https://thanos.io
Apache License 2.0

Document metrics exported by Thanos' components #5758

Open douglascamata opened 2 years ago

douglascamata commented 2 years ago

Is your proposal related to a problem?

Yes. Figuring out which metrics are exported by the various Thanos components, and what they mean, often requires diving deep into the source code. This consumes a lot of time and can be challenging for people who aren't used to the codebase, Prometheus, and/or Thanos.

This could be further broken down into various issues for each component.

Describe the solution you'd like

Add to the docs of each component a list of the metrics they export. The list should contain the metric's name, type, dimensions, and description. Ideally all this information should match the metric definition in the source code.
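A minimal sketch of how such a list could be generated, assuming a component's `/metrics` output in the Prometheus text exposition format is available as a string. The metric names are taken from this thread, but the label names and sample values are illustrative assumptions, and `parse_metrics` and `to_markdown` are hypothetical helpers that are not part of Thanos:

```python
import re
from dataclasses import dataclass, field

# Illustrative scrape text; labels and values are assumptions, not real output.
SAMPLE = """\
# HELP thanos_replicate_replication_runs_total The number of replication runs split by success and error.
# TYPE thanos_replicate_replication_runs_total counter
thanos_replicate_replication_runs_total{result="success"} 3
thanos_replicate_replication_runs_total{result="error"} 1
# HELP thanos_store_index_cache_items Current number of items in the index cache.
# TYPE thanos_store_index_cache_items gauge
thanos_store_index_cache_items{item_type="Postings"} 42
"""

@dataclass
class MetricDoc:
    name: str
    kind: str = "untyped"
    description: str = ""
    labels: set = field(default_factory=set)

SAMPLE_RE = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s")
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="')

def family_of(sample_name, docs):
    """Map a sample name (e.g. foo_bucket) back to its metric family, if known."""
    if sample_name in docs:
        return sample_name
    for suffix in ("_bucket", "_sum", "_count"):
        if sample_name.endswith(suffix) and sample_name[: -len(suffix)] in docs:
            return sample_name[: -len(suffix)]
    return None

def parse_metrics(text):
    """Collect name, type, help text, and observed label names per metric."""
    docs = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# HELP "):
            parts = line.split(" ", 3)
            if len(parts) >= 3:
                doc = docs.setdefault(parts[2], MetricDoc(parts[2]))
                doc.description = parts[3] if len(parts) == 4 else ""
        elif line.startswith("# TYPE "):
            parts = line.split(" ", 3)
            if len(parts) == 4:
                docs.setdefault(parts[2], MetricDoc(parts[2])).kind = parts[3]
        elif line and not line.startswith("#"):
            match = SAMPLE_RE.match(line)
            family = family_of(match.group(1), docs) if match else None
            if family:
                names = set(LABEL_RE.findall(match.group(2) or ""))
                # "le" and "quantile" are bucket/quantile labels, not dimensions.
                docs[family].labels |= names - {"le", "quantile"}
    return docs

def to_markdown(docs):
    """Render the collected metadata as a markdown table."""
    rows = ["| Name | Type | Dimensions | Description |", "|---|---|---|---|"]
    for doc in sorted(docs.values(), key=lambda d: d.name):
        dims = ", ".join(sorted(doc.labels)) or "-"
        rows.append(f"| {doc.name} | {doc.kind} | {dims} | {doc.description} |")
    return "\n".join(rows)

print(to_markdown(parse_metrics(SAMPLE)))
```

A scrape only shows label names for series that currently exist, so a list built this way would still need to be checked against the metric definitions in the source code.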

Describe alternatives you've considered

Probably adding a description of the dimensions when applicable is also a good idea. It could be part of the generic description.

Additional context

This idea was initially proposed by @fpetkovski at #5741.

krishnaindani commented 2 years ago

Are there any plans on how to get started on this? Does this issue need to be broken down by component? We are actively using Thanos and would find documentation of the metrics beneficial as well. I would like to work on this.

douglascamata commented 2 years ago

@krishnaindani to get started, I would suggest opening a separate issue for each component and commenting on it if you plan to tackle it. After that, it will take some code reading and understanding, followed by markdown table skills.

Some of the metrics already have a description that we could move to the markdown docs, but I am not sure whether all of them have one or whether the existing ones are correct. This is a great time to review them and add a description where one is missing.

matej-g commented 2 years ago

@krishnaindani alternatively, feel free to pick a component and start documenting. I believe we can iterate on this and document the components one by one!

Abhishek-90 commented 2 years ago

Hello @douglascamata @matej-g, I am new to Thanos and would like to understand how it works, the codebase, and anything else I need so that I can contribute. Can you help with that?

philipgough commented 2 years ago

FWIW, in case it is helpful, I took a shot at extracting this programmatically:

| METRIC TYPE |                                NAME                                |              HELP              |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | cortex_bucket_index_loads_total                                    | Total number of bucket index   |
|             |                                                                    | loading attempts.              |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | cortex_bucket_index_load_failures_total                            | Total number of bucket index   |
|             |                                                                    | loading failures.              |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Histogram   | cortex_bucket_index_load_duration_seconds                          | Duration of the a single       |
|             |                                                                    | bucket index loading operation |
|             |                                                                    | in seconds.                    |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Gauge       | cortex_bucket_index_loaded                                         | Number of bucket indexes       |
|             |                                                                    | currently loaded in-memory.    |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | thanos_replicate_replication_runs_total                            | The number of replication runs |
|             |                                                                    | split by success and error.    |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Histogram   | thanos_replicate_replication_run_duration_seconds                  | The Duration of replication    |
|             |                                                                    | runs split by success and      |
|             |                                                                    | error.                         |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | thanos_replicate_blocks_already_replicated_total                   | Total number of blocks         |
|             |                                                                    | skipped due to already being   |
|             |                                                                    | replicated.                    |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | thanos_replicate_blocks_replicated_total                           | Total number of blocks         |
|             |                                                                    | replicated.                    |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | thanos_replicate_objects_replicated_total                          | Total number of objects        |
|             |                                                                    | replicated.                    |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | thanos_store_index_cache_requests_total                            | Total number of items requests |
|             |                                                                    | to the cache.                  |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | thanos_store_index_cache_hits_total                                | Total number of items requests |
|             |                                                                    | to the cache that were a hit.  |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | thanos_store_bucket_cache_getrange_requested_bytes_total           | Total number of bytes          |
|             |                                                                    | requested via GetRange.        |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | thanos_store_bucket_cache_getrange_fetched_bytes_total             | Total number of bytes fetched  |
|             |                                                                    | because of GetRange operation. |
|             |                                                                    | Data from bucket is then       |
|             |                                                                    | stored to cache.               |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | thanos_store_bucket_cache_getrange_refetched_bytes_total           | Total number of bytes          |
|             |                                                                    | re-fetched from storage        |
|             |                                                                    | because of GetRange operation, |
|             |                                                                    | despite being in cache         |
|             |                                                                    | already.                       |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | thanos_store_bucket_cache_operation_requests_total                 | Number of requested operations |
|             |                                                                    | matching given config.         |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | thanos_store_bucket_cache_operation_hits_total                     | Number of operations served    |
|             |                                                                    | from cache for given config.   |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | thanos_store_index_cache_items_evicted_total                       | Total number of items that     |
|             |                                                                    | were evicted from the index    |
|             |                                                                    | cache.                         |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | thanos_store_index_cache_items_added_total                         | Total number of items that     |
|             |                                                                    | were added to the index cache. |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | thanos_store_index_cache_requests_total                            | Total number of requests to    |
|             |                                                                    | the cache.                     |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | thanos_store_index_cache_items_overflowed_total                    | Total number of items that     |
|             |                                                                    | could not be added to the      |
|             |                                                                    | cache due to being too big.    |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | thanos_store_index_cache_hits_total                                | Total number of requests to    |
|             |                                                                    | the cache that were a hit.     |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Gauge       | thanos_store_index_cache_items                                     | Current number of items in the |
|             |                                                                    | index cache.                   |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Gauge       | thanos_store_index_cache_items_size_bytes                          | Current byte size of items in  |
|             |                                                                    | the index cache.               |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Gauge       | thanos_store_index_cache_total_size_bytes                          | Current byte size of items     |
|             |                                                                    | (both value and key) in the    |
|             |                                                                    | index cache.                   |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Gauge       | thanos_store_index_cache_max_size_bytes                            | Maximum number of bytes to be  |
|             |                                                                    | held in the index cache.       |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Gauge       | thanos_store_index_cache_max_item_size_bytes                       | Maximum number of bytes for    |
|             |                                                                    | single entry to be held in the |
|             |                                                                    | index cache.                   |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | cortex_query_seconds_total                                         | Total amount of wall clock     |
|             |                                                                    | time spend processing queries. |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | cortex_query_fetched_series_total                                  | Number of series fetched to    |
|             |                                                                    | execute a query.               |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | cortex_query_fetched_chunks_bytes_total                            | Size of all chunks fetched to  |
|             |                                                                    | execute a query in bytes.      |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | chunk_ops_total                                                    | The total number of chunk      |
|             |                                                                    | operations by their type.      |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | chunkdesc_ops_total                                                | The total number of chunk      |
|             |                                                                    | descriptor operations by their |
|             |                                                                    | type.                          |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Gauge       | memory_chunkdescs                                                  | The current number of chunk    |
|             |                                                                    | descriptors in memory.         |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Histogram   | http_request_duration_seconds                                      | Tracks the latencies for HTTP  |
|             |                                                                    | requests.                      |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Summary     | http_request_size_bytes                                            | Tracks the size of HTTP        |
|             |                                                                    | requests.                      |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | http_requests_total                                                | Tracks the number of HTTP      |
|             |                                                                    | requests.                      |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Summary     | http_response_size_bytes                                           | Tracks the size of HTTP        |
|             |                                                                    | responses.                     |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Gauge       | http_inflight_requests                                             | Current number of HTTP         |
|             |                                                                    | requests the handler is        |
|             |                                                                    | responding to.                 |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Gauge       | http_client_in_flight_requests                                     | A gauge of in-flight requests. |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | http_client_request_total                                          | Total http client request by   |
|             |                                                                    | code and method.               |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Histogram   | http_client_dns_duration_seconds                                   | Trace dns latency histogram.   |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Histogram   | http_client_tls_duration_seconds                                   | Trace tls latency histogram.   |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Histogram   | http_client_request_duration_seconds                               | A histogram of request         |
|             |                                                                    | latencies.                     |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Histogram   | cortex_storegateway_client_request_duration_seconds                | Time spent executing requests  |
|             |                                                                    | to the store-gateway.          |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Gauge       | cortex_storegateway_clients                                        | The current number of          |
|             |                                                                    | store-gateway clients in the   |
|             |                                                                    | pool.                          |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | cortex_querier_blocks_consistency_checks_total                     | Total number of consistency    |
|             |                                                                    | checks run on queried blocks.  |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | cortex_querier_blocks_consistency_checks_failed_total              | Total number of consistency    |
|             |                                                                    | checks failed on queried       |
|             |                                                                    | blocks.                        |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Histogram   | cortex_querier_blocks_scan_duration_seconds                        | The total time it takes to run |
|             |                                                                    | a full blocks scan across the  |
|             |                                                                    | storage.                       |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Gauge       | cortex_querier_blocks_last_successful_scan_timestamp_seconds       | Unix timestamp of the last     |
|             |                                                                    | successful blocks scan.        |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Histogram   | cortex_querier_storegateway_instances_hit_per_query                | Number of store-gateway        |
|             |                                                                    | instances hit for a single     |
|             |                                                                    | query.                         |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Histogram   | cortex_querier_storegateway_refetches_per_query                    | Number of re-fetches attempted |
|             |                                                                    | while querying store-gateway   |
|             |                                                                    | instances due to missing       |
|             |                                                                    | blocks.                        |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | cortex_frontend_mapped_asts_total                                  | Total number of queries that   |
|             |                                                                    | have undergone AST mapping     |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | cortex_frontend_sharded_queries_total                              | Total number of sharded        |
|             |                                                                    | queries                        |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Histogram   | cortex_query_frontend_retries                                      | Number of times a request is   |
|             |                                                                    | retried.                       |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | cortex_frontend_split_queries_total                                | Total number of underlying     |
|             |                                                                    | query requests after the split |
|             |                                                                    | by interval is applied         |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Histogram   | cortex_frontend_query_range_duration_seconds                       | Total time spent in seconds    |
|             |                                                                    | doing query range requests.    |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | cortex_query_frontend_queries_total                                | Total queries sent per tenant. |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Gauge       | gate_queries_max                                                   | Maximum number of concurrent   |
|             |                                                                    | queries.                       |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Gauge       | gate_queries_in_flight                                             | Number of queries that are     |
|             |                                                                    | currently in flight.           |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Histogram   | gate_duration_seconds                                              | How many seconds it took for   |
|             |                                                                    | queries to wait at the gate.   |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | thanos_frontend_sharding_middleware_queries_total                  | Total number of queries        |
|             |                                                                    | analyzed by the sharding       |
|             |                                                                    | middleware                     |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | thanos_frontend_split_queries_total                                | Total number of underlying     |
|             |                                                                    | query requests after the split |
|             |                                                                    | by interval is applied         |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | thanos_query_frontend_queries_total                                | Total queries passing through  |
|             |                                                                    | query frontend                 |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | thanos_frontend_downsampled_extra_queries_total                    | Total number of additional     |
|             |                                                                    | queries for downsampled data   |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Gauge       | cortex_memcache_client_servers                                     | The number of memcache servers |
|             |                                                                    | discovered.                    |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | cortex_memcache_client_set_skip_total                              | Total number of skipped set    |
|             |                                                                    | operations because of the      |
|             |                                                                    | value is larger than the       |
|             |                                                                    | max-item-size.                 |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | cortex_cache_dropped_background_writes_total                       | Total count of dropped write   |
|             |                                                                    | backs to cache.                |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Gauge       | cortex_cache_background_queue_length                               | Length of the cache background |
|             |                                                                    | write queue.                   |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Histogram   | cortex_memcache_request_duration_seconds                           | Total time spent in seconds    |
|             |                                                                    | doing memcache requests.       |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | querier_cache_added_total                                          | The total number of Put calls  |
|             |                                                                    | on the cache                   |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | querier_cache_added_new_total                                      | The total number of new        |
|             |                                                                    | entries added to the cache     |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | querier_cache_evicted_total                                        | The total number of evicted    |
|             |                                                                    | entries                        |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Gauge       | querier_cache_entries                                              | The total number of entries    |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | querier_cache_gets_total                                           | The total number of Get calls  |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | querier_cache_misses_total                                         | The total number of Get calls  |
|             |                                                                    | that had no valid entry        |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | querier_cache_stale_gets_total                                     | The total number of Get        |
|             |                                                                    | calls that had an entry which  |
|             |                                                                    | expired                        |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Gauge       | querier_cache_memory_bytes                                         | The current cache size in      |
|             |                                                                    | bytes                          |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Histogram   | cortex_cache_value_size_bytes                                      | Size of values in the cache.   |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Histogram   | cortex_cache_request_duration_seconds                              | Total time spent in seconds    |
|             |                                                                    | doing cache requests.          |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | cortex_cache_fetched_keys_total                                    | Total count of keys requested  |
|             |                                                                    | from cache.                    |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | cortex_cache_hits_total                                            | Total count of keys found in   |
|             |                                                                    | cache.                         |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Histogram   | cortex_rediscache_request_duration_seconds                         | Total time spent in seconds    |
|             |                                                                    | doing Redis requests.          |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | thanos_receive_forward_requests_total                              | The number of forward          |
|             |                                                                    | requests.                      |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | thanos_receive_replications_total                                  | The number of replication      |
|             |                                                                    | operations done by the         |
|             |                                                                    | receiver. Replication succeeds |
|             |                                                                    | when a quorum is met.          |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Gauge       | thanos_receive_replication_factor                                  | The number of times to         |
|             |                                                                    | replicate incoming write       |
|             |                                                                    | requests.                      |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Histogram   | thanos_receive_write_timeseries                                    | The number of timeseries       |
|             |                                                                    | received in the incoming write |
|             |                                                                    | requests.                      |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Histogram   | thanos_receive_write_samples                                       | The number of samples          |
|             |                                                                    | received in the incoming write |
|             |                                                                    | requests.                      |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Gauge       | thanos_receive_head_series_limit                                   | The configured limit for       |
|             |                                                                    | active (head) series of        |
|             |                                                                    | tenants.                       |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | thanos_receive_head_series_limited_requests_total                  | The total number of remote     |
|             |                                                                    | write requests that have been  |
|             |                                                                    | dropped due to active series   |
|             |                                                                    | limiting.                      |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | thanos_receive_metamonitoring_failed_queries_total                 | The total number of            |
|             |                                                                    | meta-monitoring queries that   |
|             |                                                                    | failed while limiting.         |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Summary     | thanos_receive_write_limits_hit                                    | Summary of how much beyond the |
|             |                                                                    | limit a refused remote write   |
|             |                                                                    | request was.                   |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Gauge       | thanos_receive_write_limits                                        | The configured write limits.   |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Gauge       | thanos_receive_config_hash                                         | Hash of the currently loaded   |
|             |                                                                    | hashring configuration file.   |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Gauge       | thanos_receive_config_last_reload_successful                       | Whether the last hashring      |
|             |                                                                    | configuration file reload      |
|             |                                                                    | attempt was successful.        |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Gauge       | thanos_receive_config_last_reload_success_timestamp_seconds        | Timestamp of the last          |
|             |                                                                    | successful hashring            |
|             |                                                                    | configuration file reload.     |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | thanos_receive_hashrings_file_changes_total                        | The number of times the        |
|             |                                                                    | hashrings configuration file   |
|             |                                                                    | has changed.                   |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | thanos_receive_hashrings_file_errors_total                         | The number of errors watching  |
|             |                                                                    | the hashrings configuration    |
|             |                                                                    | file.                          |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | thanos_receive_hashrings_file_refreshes_total                      | The number of refreshes of the |
|             |                                                                    | hashrings configuration file.  |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Gauge       | thanos_receive_hashring_nodes                                      | The number of nodes per        |
|             |                                                                    | hashring.                      |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Gauge       | thanos_receive_hashring_tenants                                    | The number of tenants per      |
|             |                                                                    | hashring.                      |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Gauge       | status                                                             | Represents status (0 indicates |
|             |                                                                    | failure, 1 indicates success)  |
|             |                                                                    | of the component.              |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | syncs_total                                                        | Total blocks metadata          |
|             |                                                                    | synchronization attempts       |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | sync_failures_total                                                | Total blocks metadata          |
|             |                                                                    | synchronization failures       |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Histogram   | sync_duration_seconds                                              | Duration of the blocks         |
|             |                                                                    | metadata synchronization in    |
|             |                                                                    | seconds                        |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Gauge       | synced                                                             | Number of block metadata       |
|             |                                                                    | synced                         |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Gauge       | modified                                                           | Number of blocks whose         |
|             |                                                                    | metadata changed               |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | base_syncs_total                                                   | Total blocks metadata          |
|             |                                                                    | synchronization attempts by    |
|             |                                                                    | base Fetcher                   |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Gauge       | consistency_delay_seconds                                          | Configured consistency delay   |
|             |                                                                    | in seconds.                    |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | grpc_req_panics_recovered_total                                    | Total number of gRPC requests  |
|             |                                                                    | recovered from internal panic. |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Gauge       | thanos_rule_config_last_reload_successful                          | Whether the last configuration |
|             |                                                                    | reload attempt was successful. |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Gauge       | thanos_rule_config_last_reload_success_timestamp_seconds           | Timestamp of the last          |
|             |                                                                    | successful configuration       |
|             |                                                                    | reload.                        |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | thanos_rule_duplicated_query_addresses_total                       | The number of times a          |
|             |                                                                    | duplicated query address is    |
|             |                                                                    | detected from the different    |
|             |                                                                    | configs in rule.               |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Gauge       | thanos_rule_loaded_rules                                           | Loaded rules partitioned by    |
|             |                                                                    | file and group.                |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | thanos_rule_evaluation_with_warnings_total                         | The total number of rule       |
|             |                                                                    | evaluations that were          |
|             |                                                                    | successful but had warnings    |
|             |                                                                    | which can indicate partial     |
|             |                                                                    | error.                         |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Gauge       | thanos_delete_delay_seconds                                        | Configured delete delay in     |
|             |                                                                    | seconds.                       |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Gauge       | thanos_compact_halted                                              | Set to 1 if the compactor      |
|             |                                                                    | halted due to an unexpected    |
|             |                                                                    | error.                         |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | thanos_compact_retries_total                                       | Total number of retries after  |
|             |                                                                    | retriable compactor error.     |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | thanos_compact_iterations_total                                    | Total number of iterations     |
|             |                                                                    | that were executed             |
|             |                                                                    | successfully.                  |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | thanos_compact_block_cleanup_loops_total                           | Total number of concurrent     |
|             |                                                                    | cleanup loops of partially     |
|             |                                                                    | uploaded blocks and marked     |
|             |                                                                    | blocks that were executed      |
|             |                                                                    | successfully.                  |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | thanos_compact_aborted_partial_uploads_deletion_attempts_total     | Total number of started        |
|             |                                                                    | deletions of blocks that       |
|             |                                                                    | are assumed aborted and only   |
|             |                                                                    | partially uploaded.            |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | thanos_compact_blocks_cleaned_total                                | Total number of blocks deleted |
|             |                                                                    | in compactor.                  |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | thanos_compact_block_cleanup_failures_total                        | Failures encountered while     |
|             |                                                                    | deleting blocks in compactor.  |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | thanos_compact_blocks_marked_total                                 | Total number of blocks marked  |
|             |                                                                    | in compactor.                  |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | thanos_compact_garbage_collected_blocks_total                      | Total number of blocks marked  |
|             |                                                                    | for deletion by compactor.     |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | thanos_query_duplicated_store_addresses_total                      | The number of times a          |
|             |                                                                    | duplicated store address is    |
|             |                                                                    | detected from the different    |
|             |                                                                    | configs in query               |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | thanos_receive_multi_db_updates_attempted_total                    | Number of attempted Multi DB   |
|             |                                                                    | reloads with flush and         |
|             |                                                                    | potential upload due to        |
|             |                                                                    | hashring changes               |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | thanos_receive_multi_db_updates_completed_total                    | Number of completed Multi DB   |
|             |                                                                    | reloads with flush and         |
|             |                                                                    | potential upload due to        |
|             |                                                                    | hashring changes               |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Gauge       | thanos_sidecar_prometheus_up                                       | Boolean indicator whether      |
|             |                                                                    | the sidecar can reach its      |
|             |                                                                    | Prometheus peer.               |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | thanos_compact_downsample_total                                    | Total number of downsampling   |
|             |                                                                    | attempts.                      |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | thanos_compact_downsample_failures_total                           | Total number of failed         |
|             |                                                                    | downsampling attempts.         |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Histogram   | thanos_compact_downsample_duration_seconds                         | Duration of downsample runs    |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Gauge       | thanos_memcached_client_info                                       | A metric with a constant '1'   |
|             |                                                                    | value labeled by configuration |
|             |                                                                    | options from which memcached   |
|             |                                                                    | client was configured.         |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | thanos_memcached_operations_total                                  | Total number of operations     |
|             |                                                                    | against memcached.             |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | thanos_memcached_operation_failures_total                          | Total number of operations     |
|             |                                                                    | against memcached that failed. |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | thanos_memcached_operation_skipped_total                           | Total number of operations     |
|             |                                                                    | against memcached that have    |
|             |                                                                    | been skipped.                  |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Histogram   | thanos_memcached_operation_duration_seconds                        | Duration of operations against |
|             |                                                                    | memcached.                     |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Histogram   | thanos_memcached_operation_data_size_bytes                         | Tracks the size of the data    |
|             |                                                                    | stored in and fetched from     |
|             |                                                                    | memcached.                     |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Histogram   | thanos_redis_operation_duration_seconds                            | Duration of operations against |
|             |                                                                    | redis.                         |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | cortex_experimental_features_in_use_total                          | The number of experimental     |
|             |                                                                    | features in use.               |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | log_messages_total                                                 | Total number of log messages.  |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Histogram   | kv_request_duration_seconds                                        | Time spent on kv store         |
|             |                                                                    | requests.                      |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Gauge       | multikv_primary_store                                              | Selected primary KV store      |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Gauge       | multikv_mirror_enabled                                             | Is mirroring to secondary      |
|             |                                                                    | store enabled                  |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | multikv_mirror_writes_total                                        | Number of mirror-writes to     |
|             |                                                                    | secondary store                |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | multikv_mirror_write_errors_total                                  | Number of failures to          |
|             |                                                                    | mirror-write to secondary      |
|             |                                                                    | store                          |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | thanos_compact_garbage_collection_total                            | Total number of garbage        |
|             |                                                                    | collection operations.         |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | thanos_compact_garbage_collection_failures_total                   | Total number of failed garbage |
|             |                                                                    | collection operations.         |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Histogram   | thanos_compact_garbage_collection_duration_seconds                 | Time it took to perform        |
|             |                                                                    | garbage collection iteration.  |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | thanos_compact_group_compactions_total                             | Total number of group          |
|             |                                                                    | compaction attempts that       |
|             |                                                                    | resulted in a new block.       |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | thanos_compact_group_compaction_runs_started_total                 | Total number of group          |
|             |                                                                    | compaction attempts.           |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | thanos_compact_group_compaction_runs_completed_total               | Total number of completed      |
|             |                                                                    | group compaction runs.         |
|             |                                                                    | This also includes compactor   |
|             |                                                                    | group runs that resulted in    |
|             |                                                                    | no compaction.                 |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | thanos_compact_group_compactions_failures_total                    | Total number of failed group   |
|             |                                                                    | compactions.                   |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | thanos_compact_group_vertical_compactions_total                    | Total number of group          |
|             |                                                                    | compaction attempts that       |
|             |                                                                    | resulted in a new block based  |
|             |                                                                    | on overlapping blocks.         |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Gauge       | thanos_compact_todo_compactions                                    | Number of compactions to be    |
|             |                                                                    | done                           |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Gauge       | thanos_compact_todo_compaction_blocks                              | Number of blocks planned to be |
|             |                                                                    | compacted                      |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Gauge       | thanos_compact_todo_downsample_blocks                              | Number of blocks to be         |
|             |                                                                    | downsampled                    |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Gauge       | thanos_compact_todo_deletion_blocks                                | Number of blocks that have     |
|             |                                                                    | crossed their retention period |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Gauge       | cortex_query_frontend_queue_length                                 | Number of queries in the       |
|             |                                                                    | queue.                         |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | cortex_query_frontend_discarded_requests_total                     | Total number of query requests |
|             |                                                                    | discarded.                     |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Histogram   | cortex_query_frontend_queue_duration_seconds                       | Time spent by requests queued. |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Gauge       | cortex_query_frontend_connected_clients                            | Number of worker clients       |
|             |                                                                    | currently connected to the     |
|             |                                                                    | frontend.                      |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Histogram   | thanos_query_range_requested_timespan_duration_seconds             | A histogram of the query range |
|             |                                                                    | window in seconds              |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Histogram   | cortex_table_manager_sync_duration_seconds                         | Time spent synching tables.    |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Gauge       | cortex_table_capacity_units                                        | Per-table capacity, measured   |
|             |                                                                    | in DynamoDB capacity units.    |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Gauge       | cortex_table_manager_create_failures                               | Number of table creation       |
|             |                                                                    | failures during the last       |
|             |                                                                    | table-manager reconciliation   |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Gauge       | cortex_table_manager_delete_failures                               | Number of table deletion       |
|             |                                                                    | failures during the last       |
|             |                                                                    | table-manager reconciliation   |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Gauge       | cortex_table_manager_sync_success_timestamp_seconds                | Timestamp of the last          |
|             |                                                                    | successful table manager sync. |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Histogram   | cortex_chunk_store_index_entries_per_chunk                         | Number of entries written to   |
|             |                                                                    | storage per chunk.             |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | cortex_cache_corrupt_chunks_total                                  | Total count of corrupt chunks  |
|             |                                                                    | found in cache.                |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Histogram   | cortex_chunk_store_index_lookups_per_query                         | Distribution of #index lookups |
|             |                                                                    | per query.                     |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Histogram   | cortex_chunk_store_series_pre_intersection_per_query               | Distribution of #series (pre   |
|             |                                                                    | intersection) per query.       |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Histogram   | cortex_chunk_store_series_post_intersection_per_query              | Distribution of #series (post  |
|             |                                                                    | intersection) per query.       |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Histogram   | cortex_chunk_store_chunks_per_query                                | Distribution of #chunks per    |
|             |                                                                    | query.                         |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | cortex_chunk_store_deduped_chunks_total                            | Count of chunks which were     |
|             |                                                                    | not stored because they have   |
|             |                                                                    | already been stored by another |
|             |                                                                    | replica.                       |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | thanos_shipper_dir_syncs_total                                     | Total number of dir syncs      |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | thanos_shipper_dir_sync_failures_total                             | Total number of failed dir     |
|             |                                                                    | syncs                          |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | thanos_shipper_uploads_total                                       | Total number of uploaded       |
|             |                                                                    | blocks                         |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | thanos_shipper_upload_failures_total                               | Total number of block upload   |
|             |                                                                    | failures                       |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Gauge       | thanos_shipper_upload_compacted_done                               | If 1 it means shipper uploaded |
|             |                                                                    | all compacted blocks from the  |
|             |                                                                    | filesystem.                    |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Gauge       | cortex_query_scheduler_queue_length                                | Number of queries in the       |
|             |                                                                    | queue.                         |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | cortex_query_scheduler_discarded_requests_total                    | Total number of query requests |
|             |                                                                    | discarded.                     |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Histogram   | cortex_query_scheduler_queue_duration_seconds                      | Time spent by requests in      |
|             |                                                                    | queue before getting picked up |
|             |                                                                    | by a querier.                  |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Gauge       | cortex_query_scheduler_connected_querier_clients                   | Number of querier worker       |
|             |                                                                    | clients currently connected to |
|             |                                                                    | the query-scheduler.           |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Gauge       | cortex_query_scheduler_connected_frontend_clients                  | Number of query-frontend       |
|             |                                                                    | worker clients currently       |
|             |                                                                    | connected to the               |
|             |                                                                    | query-scheduler.               |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | cortex_purger_delete_requests_received_total                       | Number of delete requests      |
|             |                                                                    | received per user              |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | cortex_tombstones_loader_cache_gen_load_failures_total             | Total number of failures       |
|             |                                                                    | while loading cache generation |
|             |                                                                    | number using tombstones loader |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | cortex_tombstones_loader_cache_delete_requests_load_failures_total | Total number of failures while |
|             |                                                                    | loading delete requests using  |
|             |                                                                    | tombstones loader              |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | cortex_purger_delete_requests_processed_total                      | Number of delete requests      |
|             |                                                                    | processed per user             |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | cortex_purger_delete_requests_chunks_selected_total                | Number of chunks selected      |
|             |                                                                    | while building delete plans    |
|             |                                                                    | per user                       |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | cortex_purger_delete_requests_processing_failures_total            | Number of delete requests      |
|             |                                                                    | processing failures per user   |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | cortex_purger_load_pending_requests_attempts_total                 | Number of attempts that were   |
|             |                                                                    | made to load pending requests  |
|             |                                                                    | with status                    |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Gauge       | cortex_purger_oldest_pending_delete_request_age_seconds            | Age of oldest pending delete   |
|             |                                                                    | request in seconds, since they |
|             |                                                                    | are over their cancellation    |
|             |                                                                    | period                         |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Gauge       | cortex_purger_pending_delete_requests_count                        | Count of delete requests which |
|             |                                                                    | are over their cancellation    |
|             |                                                                    | period and have not finished   |
|             |                                                                    | processing yet                 |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Gauge       | auto_discovery_config_version                                      | The current auto discovery     |
|             |                                                                    | config version                 |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Gauge       | auto_discovery_resolved_addresses                                  | The number of memcached nodes  |
|             |                                                                    | found via auto discovery       |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | auto_discovery_total                                               | The number of memcache auto    |
|             |                                                                    | discovery attempts             |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | auto_discovery_failures_total                                      | The number of memcache auto    |
|             |                                                                    | discovery failures             |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | ring_member_heartbeats_total                                       | The total number of heartbeats |
|             |                                                                    | sent.                          |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Gauge       | ring_member_tokens_owned                                           | The number of tokens owned in  |
|             |                                                                    | the ring.                      |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Gauge       | ring_member_tokens_to_own                                          | The number of tokens to own in |
|             |                                                                    | the ring.                      |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | member_consul_heartbeats_total                                     | The total number of heartbeats |
|             |                                                                    | sent to consul.                |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Gauge       | member_ring_tokens_owned                                           | The number of tokens owned in  |
|             |                                                                    | the ring.                      |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Gauge       | member_ring_tokens_to_own                                          | The number of tokens to own in |
|             |                                                                    | the ring.                      |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Histogram   | shutdown_duration_seconds                                          | Duration (in seconds) of       |
|             |                                                                    | shutdown procedure (i.e.       |
|             |                                                                    | transfer or flush).            |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Gauge       | ring_member_ownership_percent                                      | The percent ownership of the   |
|             |                                                                    | ring by member                 |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Gauge       | ring_members                                                       | Number of members in the ring  |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Gauge       | ring_tokens_total                                                  | Number of tokens in the ring   |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Gauge       | ring_tokens_owned                                                  | The number of tokens in the    |
|             |                                                                    | ring owned by the member       |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Gauge       | ring_oldest_member_timestamp                                       | Timestamp of the oldest member |
|             |                                                                    | in the ring.                   |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | thanos_verify_blocks_marked_for_deletion_total                     | Total number of blocks marked  |
|             |                                                                    | for deletion by verify.        |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | thanos_cache_memcached_requests_total                              | Total number of item requests  |
|             |                                                                    | to memcached.                  |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | thanos_cache_memcached_hits_total                                  | Total number of item requests  |
|             |                                                                    | to the cache that were a hit.  |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | thanos_cache_redis_requests_total                                  | Total number of item requests  |
|             |                                                                    | to redis.                      |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | thanos_cache_redis_hits_total                                      | Total number of item requests  |
|             |                                                                    | to the cache that were a hit.  |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | thanos_cache_inmemory_items_evicted_total                          | Total number of items that     |
|             |                                                                    | were evicted from the inmemory |
|             |                                                                    | cache.                         |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | thanos_cache_inmemory_items_added_total                            | Total number of items that     |
|             |                                                                    | were added to the inmemory     |
|             |                                                                    | cache.                         |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | thanos_cache_inmemory_requests_total                               | Total number of requests to    |
|             |                                                                    | the inmemory cache.            |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | thanos_cache_inmemory_hits_on_expired_data_total                   | Total number of requests to    |
|             |                                                                    | the inmemory cache that were   |
|             |                                                                    | a hit but needed to be evicted |
|             |                                                                    | due to TTL.                    |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | thanos_cache_inmemory_items_overflowed_total                       | Total number of items that     |
|             |                                                                    | could not be added to the      |
|             |                                                                    | inmemory cache due to being    |
|             |                                                                    | too big.                       |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | thanos_cache_inmemory_hits_total                                   | Total number of requests to    |
|             |                                                                    | the inmemory cache that were a |
|             |                                                                    | hit.                           |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Gauge       | thanos_cache_inmemory_items                                        | Current number of items in the |
|             |                                                                    | inmemory cache.                |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Gauge       | thanos_cache_inmemory_items_size_bytes                             | Current byte size of items in  |
|             |                                                                    | the inmemory cache.            |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Gauge       | thanos_cache_inmemory_total_size_bytes                             | Current byte size of items     |
|             |                                                                    | (both value and key) in the    |
|             |                                                                    | inmemory cache.                |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Gauge       | thanos_cache_inmemory_max_size_bytes                               | Maximum number of bytes to be  |
|             |                                                                    | held in the inmemory cache.    |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Gauge       | thanos_cache_inmemory_max_item_size_bytes                          | Maximum number of bytes for    |
|             |                                                                    | single entry to be held in the |
|             |                                                                    | inmemory cache.                |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Gauge       | cortex_query_frontend_queries_in_progress                          | Number of queries in progress  |
|             |                                                                    | handled by this frontend.      |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Gauge       | cortex_query_frontend_connected_schedulers                         | Number of schedulers this      |
|             |                                                                    | frontend is connected to.      |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | received_broadcasts_total                                          | Number of received broadcast   |
|             |                                                                    | user messages                  |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | received_broadcasts_bytes_total                                    | Total size of received         |
|             |                                                                    | broadcast user messages        |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | received_broadcasts_invalid_total                                  | Number of received broadcast   |
|             |                                                                    | user messages that were        |
|             |                                                                    | invalid. Hopefully 0.          |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | state_pushes_total                                                 | How many times did this node   |
|             |                                                                    | push its full state to another |
|             |                                                                    | node                           |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | state_pushes_bytes_total                                           | Total size of pushed state     |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | state_pulls_total                                                  | How many times did this node   |
|             |                                                                    | pull full state from another   |
|             |                                                                    | node                           |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | state_pulls_bytes_total                                            | Total size of pulled state     |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Gauge       | messages_in_broadcast_queue                                        | Number of user messages in the |
|             |                                                                    | broadcast queue                |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Gauge       | messages_in_broadcast_queue_bytes                                  | Total size of messages waiting |
|             |                                                                    | in the broadcast queue         |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | messages_to_broadcast_dropped_total                                | Number of broadcast messages   |
|             |                                                                    | intended to be sent but were   |
|             |                                                                    | dropped due to encoding errors |
|             |                                                                    | or for being too big           |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | cas_attempt_total                                                  | Attempted CAS operations       |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | cas_success_total                                                  | Successful CAS operations      |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | cas_failure_total                                                  | Failed CAS operations          |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Gauge       | kv_store_value_tombstones                                          | Number of tombstones currently |
|             |                                                                    | present in KV store values     |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | kv_store_value_tombstones_removed_total                            | Total number of tombstones     |
|             |                                                                    | which have been removed from   |
|             |                                                                    | KV store values                |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Gauge       | cluster_members_count                                              | Number of members in           |
|             |                                                                    | memberlist cluster             |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Gauge       | cluster_node_health_score                                          | Health score of this cluster.  |
|             |                                                                    | Lower value is better. 0 =     |
|             |                                                                    | healthy                        |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | watch_prefix_dropped_notifications                                 | Number of dropped              |
|             |                                                                    | notifications in WatchPrefix   |
|             |                                                                    | function                       |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | incoming_streams_total                                             | Number of incoming memberlist  |
|             |                                                                    | streams                        |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | outgoing_streams_total                                             | Number of outgoing streams     |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | outgoing_stream_errors_total                                       | Number of errors when opening  |
|             |                                                                    | memberlist stream to another   |
|             |                                                                    | node                           |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | packets_received_total                                             | Number of received memberlist  |
|             |                                                                    | packets                        |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | packets_received_bytes_total                                       | Total bytes received as        |
|             |                                                                    | packets                        |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | packets_received_errors_total                                      | Number of errors when          |
|             |                                                                    | receiving memberlist packets   |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | packets_sent_total                                                 | Number of memberlist packets   |
|             |                                                                    | sent                           |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | packets_sent_bytes_total                                           | Total bytes sent as packets    |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | packets_sent_errors_total                                          | Number of errors when sending  |
|             |                                                                    | memberlist packets             |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | unknown_connections_total                                          | Number of unknown TCP          |
|             |                                                                    | connections (not a packet or   |
|             |                                                                    | stream)                        |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Histogram   | prometheus_store_received_frames                                   | Number of frames received per  |
|             |                                                                    | streamed response.             |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | thanos_proxy_store_empty_stream_responses_total                    | Total number of empty          |
|             |                                                                    | responses received.            |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | thanos_bucket_store_block_loads_total                              | Total number of remote block   |
|             |                                                                    | loading attempts.              |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | thanos_bucket_store_block_load_failures_total                      | Total number of failed remote  |
|             |                                                                    | block loading attempts.        |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | thanos_bucket_store_block_drops_total                              | Total number of local blocks   |
|             |                                                                    | that were dropped.             |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | thanos_bucket_store_block_drop_failures_total                      | Total number of local blocks   |
|             |                                                                    | that failed to be dropped.     |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Gauge       | thanos_bucket_store_blocks_loaded                                  | Number of currently loaded     |
|             |                                                                    | blocks.                        |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Gauge       | thanos_bucket_store_blocks_last_loaded_timestamp_seconds           | Timestamp when last block got  |
|             |                                                                    | loaded.                        |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Summary     | thanos_bucket_store_series_data_touched                            | How many items of a data type  |
|             |                                                                    | in a block were touched for a  |
|             |                                                                    | single series request.         |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Summary     | thanos_bucket_store_series_data_fetched                            | How many items of a data type  |
|             |                                                                    | in a block were fetched for a  |
|             |                                                                    | single series request.         |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Summary     | thanos_bucket_store_series_data_size_touched_bytes                 | Size of all items of a data    |
|             |                                                                    | type in a block that were      |
|             |                                                                    | touched for a single series    |
|             |                                                                    | request.                       |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Summary     | thanos_bucket_store_series_data_size_fetched_bytes                 | Size of all items of a data    |
|             |                                                                    | type in a block that were      |
|             |                                                                    | fetched for a single series    |
|             |                                                                    | request.                       |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Summary     | thanos_bucket_store_series_blocks_queried                          | Number of blocks in a bucket   |
|             |                                                                    | store that were touched to     |
|             |                                                                    | satisfy a query.               |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Histogram   | thanos_bucket_store_series_get_all_duration_seconds                | Time it takes until all        |
|             |                                                                    | per-block prepares and loads   |
|             |                                                                    | for a query are finished.      |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Histogram   | thanos_bucket_store_series_merge_duration_seconds                  | Time it takes to merge         |
|             |                                                                    | sub-results from all queried   |
|             |                                                                    | blocks into a single result.   |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Summary     | thanos_bucket_store_series_result_series                           | Number of series observed in   |
|             |                                                                    | the final result of a query.   |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Histogram   | thanos_bucket_store_sent_chunk_size_bytes                          | Size in bytes of the chunks    |
|             |                                                                    | for the single series, which   |
|             |                                                                    | corresponds to the gRPC        |
|             |                                                                    | message size sent to querier.  |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | thanos_bucket_store_queries_dropped_total                          | Number of queries that were    |
|             |                                                                    | dropped due to the limit.      |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | thanos_bucket_store_series_refetches_total                         |                                |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | thanos_bucket_store_cached_postings_compressions_total             | Number of postings             |
|             |                                                                    | compressions before storing to |
|             |                                                                    | index cache.                   |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | thanos_bucket_store_cached_postings_compression_errors_total       | Number of postings compression |
|             |                                                                    | errors.                        |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | thanos_bucket_store_cached_postings_compression_time_seconds_total | Time spent compressing         |
|             |                                                                    | postings before storing them   |
|             |                                                                    | into postings cache.           |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | thanos_bucket_store_cached_postings_original_size_bytes_total      | Original size of postings      |
|             |                                                                    | stored into cache.             |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | thanos_bucket_store_cached_postings_compressed_size_bytes_total    | Compressed size of postings    |
|             |                                                                    | stored into cache.             |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Histogram   | thanos_bucket_store_cached_series_fetch_duration_seconds           | The time it takes to fetch     |
|             |                                                                    | series to respond to a request |
|             |                                                                    | sent to a store gateway. It    |
|             |                                                                    | includes both the time to      |
|             |                                                                    | fetch it from the cache and    |
|             |                                                                    | from storage in case of cache  |
|             |                                                                    | misses.                        |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Histogram   | thanos_bucket_store_cached_postings_fetch_duration_seconds         | The time it takes to fetch     |
|             |                                                                    | postings to respond to a       |
|             |                                                                    | request sent to a store        |
|             |                                                                    | gateway. It includes both      |
|             |                                                                    | the time to fetch it from the  |
|             |                                                                    | cache and from storage in case |
|             |                                                                    | of cache misses.               |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | thanos_bucket_store_empty_postings_total                           | Total number of empty postings |
|             |                                                                    | when fetching block series.    |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Gauge       | dns_provider_results                                               | The number of resolved         |
|             |                                                                    | endpoints for each configured  |
|             |                                                                    | address                        |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | dns_lookups_total                                                  | The number of DNS lookup       |
|             |                                                                    | resolution attempts            |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | dns_failures_total                                                 | The number of DNS lookup       |
|             |                                                                    | failures                       |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | reloader_reloads_total                                             | Total number of reload         |
|             |                                                                    | requests.                      |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | reloader_reloads_failed_total                                      | Total number of reload         |
|             |                                                                    | requests that failed.          |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Gauge       | reloader_last_reload_successful                                    | Whether the last reload        |
|             |                                                                    | attempt was successful         |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Gauge       | reloader_last_reload_success_timestamp_seconds                     | Timestamp of the last          |
|             |                                                                    | successful reload              |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | reloader_config_apply_operations_total                             | Total number of config apply   |
|             |                                                                    | operations.                    |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | reloader_config_apply_operations_failed_total                      | Total number of config apply   |
|             |                                                                    | operations that failed.        |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Gauge       | reloader_watches                                                   | Number of resources watched by |
|             |                                                                    | the reloader.                  |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | reloader_watch_events_total                                        | Total number of events         |
|             |                                                                    | received by the reloader from  |
|             |                                                                    | the watcher.                   |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | reloader_watch_errors_total                                        | Total number of errors         |
|             |                                                                    | received by the reloader from  |
|             |                                                                    | the watcher.                   |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | indexheader_lazy_load_total                                        | Total number of index-header   |
|             |                                                                    | lazy load operations.          |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | indexheader_lazy_load_failed_total                                 | Total number of failed         |
|             |                                                                    | index-header lazy load         |
|             |                                                                    | operations.                    |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | indexheader_lazy_unload_total                                      | Total number of index-header   |
|             |                                                                    | lazy unload operations.        |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | indexheader_lazy_unload_failed_total                               | Total number of failed         |
|             |                                                                    | index-header lazy unload       |
|             |                                                                    | operations.                    |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Histogram   | indexheader_lazy_load_duration_seconds                             | Duration of the index-header   |
|             |                                                                    | lazy loading in seconds.       |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Histogram   | cortex_querier_query_frontend_request_duration_seconds             | Time spent doing requests to   |
|             |                                                                    | frontend.                      |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Gauge       | cortex_querier_query_frontend_clients                              | The current number of clients  |
|             |                                                                    | connected to query-frontend.   |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Histogram   | cortex_ingester_client_request_duration_seconds                    | Time spent doing Ingester      |
|             |                                                                    | requests.                      |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | deprecated_flags_inuse_total                                       | The number of deprecated flags |
|             |                                                                    | currently set.                 |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | cortex_bucket_store_partitioner_requested_bytes_total              | Total size of byte ranges      |
|             |                                                                    | required to fetch from the     |
|             |                                                                    | storage before they are passed |
|             |                                                                    | to the partitioner.            |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | cortex_bucket_store_partitioner_expanded_bytes_total               | Total size of byte ranges      |
|             |                                                                    | returned by the partitioner    |
|             |                                                                    | after they've been combined    |
|             |                                                                    | together to reduce the number  |
|             |                                                                    | of bucket API calls.           |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | cortex_bucket_store_partitioner_requested_ranges_total             | Total number of byte ranges    |
|             |                                                                    | required to fetch from the     |
|             |                                                                    | storage before they are passed |
|             |                                                                    | to the partitioner.            |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | cortex_bucket_store_partitioner_expanded_ranges_total              | Total number of byte ranges    |
|             |                                                                    | returned by the partitioner    |
|             |                                                                    | after they've been combined    |
|             |                                                                    | together to reduce the number  |
|             |                                                                    | of bucket API calls.           |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Gauge       | cortex_bucket_stores_gate_queries_concurrent_max                   | Maximum number of concurrent   |
|             |                                                                    | queries allowed.               |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Histogram   | cortex_bucket_stores_blocks_sync_seconds                           | The total time it takes to     |
|             |                                                                    | perform a sync of the stores   |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Gauge       | cortex_bucket_stores_blocks_last_successful_sync_timestamp_seconds | Unix timestamp of the last     |
|             |                                                                    | successful blocks sync.        |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Gauge       | cortex_bucket_stores_tenants_discovered                            | Number of tenants discovered   |
|             |                                                                    | in the bucket.                 |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Gauge       | cortex_bucket_stores_tenants_synced                                | Number of tenants synced.      |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | cortex_bucket_store_chunk_pool_requested_bytes_total               | Total bytes requested from the |
|             |                                                                    | chunk bytes pool.              |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | cortex_bucket_store_chunk_pool_returned_bytes_total                | Total bytes returned by the    |
|             |                                                                    | chunk bytes pool.              |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | cortex_storegateway_bucket_sync_total                              | Total number of times          |
|             |                                                                    | the bucket sync operation      |
|             |                                                                    | triggered.                     |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Histogram   | consul_request_duration_seconds                                    | Time spent on consul requests. |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | cortex_discarded_samples_total                                     | The total number of samples    |
|             |                                                                    | that were discarded.           |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | cortex_discarded_exemplars_total                                   | The total number of exemplars  |
|             |                                                                    | that were discarded.           |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | cortex_discarded_metadata_total                                    | The total number of metadata   |
|             |                                                                    | that were discarded.           |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | thanos_alert_queue_alerts_dropped_total                            | Total number of alerts that    |
|             |                                                                    | were dropped from the queue.   |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | thanos_alert_queue_alerts_pushed_total                             | Total number of alerts pushed  |
|             |                                                                    | to the queue.                  |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | thanos_alert_queue_alerts_popped_total                             | Total number of alerts popped  |
|             |                                                                    | from the queue.                |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Gauge       | thanos_alert_queue_capacity                                        | Capacity of the alert queue.   |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Gauge       | thanos_alert_queue_length                                          | Length of the alert queue.     |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | thanos_alert_sender_alerts_sent_total                              | Total number of alerts sent by |
|             |                                                                    | alertmanager.                  |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | thanos_alert_sender_errors_total                                   | Total number of errors         |
|             |                                                                    | while sending alerts to        |
|             |                                                                    | alertmanager.                  |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Counter     | thanos_alert_sender_alerts_dropped_total                           | Total number of alerts dropped |
|             |                                                                    | in case of all sends to        |
|             |                                                                    | alertmanagers failed.          |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Histogram   | thanos_alert_sender_latency_seconds                                | Latency for sending alert      |
|             |                                                                    | notifications (not including   |
|             |                                                                    | dropped notifications).        |
|-------------|--------------------------------------------------------------------|--------------------------------|
| Gauge       | thanos_store_nodes_grpc_connections                                | Number indicating current      |
|             |                                                                    | number of gRPC connection to   |
|             |                                                                    | store nodes. This indicates    |
|             |                                                                    | also to how many stores query  |
|             |                                                                    | node have access to.           |
|-------------|--------------------------------------------------------------------|--------------------------------|
philipgough commented 2 years ago

Just a follow-up after chatting with @saswatamcode: I took the static analysis approach but was unaware of the existing effort in #4273, which uses the existing promlinter tool.

The issues are the same as those described in the linked PR: it is hard to catch all cases.

douglascamata commented 2 years ago

@PhilipGough thanks for the extra info, I didn't know about the other initiatives and I bet many others also didn't.

Indeed, I don't think we will be able to catch everything at once with a tool. We should agree on a cutoff date after which new exported metrics are only accepted together with markdown documentation for them.

Little by little (or batch by batch thanks to the help of different tools) we can add everything.
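To illustrate how such a rule could be checked automatically, here is a minimal sketch. It assumes the component's `/metrics` output (Prometheus text exposition format) and the markdown docs are both available as strings. The function names are hypothetical and nothing like this exists in the Thanos repository as far as this thread shows.

```python
import re

# Matches the "# HELP <metric_name> ..." lines of the Prometheus text exposition format.
HELP_RE = re.compile(r"^# HELP (\S+)")


def exported_metric_names(exposition: str) -> set:
    """Collect metric names from the HELP lines of a /metrics payload."""
    names = set()
    for line in exposition.splitlines():
        match = HELP_RE.match(line)
        if match:
            names.add(match.group(1))
    return names


def undocumented_metrics(exposition: str, docs_markdown: str) -> list:
    """Return exported metric names that never appear as a whole word in the docs."""
    missing = []
    for name in exported_metric_names(exposition):
        # \b keeps "foo" from being treated as documented by "foo_total".
        if not re.search(rf"\b{re.escape(name)}\b", docs_markdown):
            missing.append(name)
    return sorted(missing)


exposition = """\
# HELP thanos_alert_queue_length Length of the alert queue.
# TYPE thanos_alert_queue_length gauge
thanos_alert_queue_length 0
# HELP thanos_alert_queue_capacity Capacity of the alert queue.
# TYPE thanos_alert_queue_capacity gauge
thanos_alert_queue_capacity 10000
"""
docs = "| Gauge | thanos_alert_queue_capacity | Capacity of the alert queue. |"
print(undocumented_metrics(exposition, docs))
```

A check like this only sees metrics that the running process has actually registered, so it has the same blind spots as the other approaches discussed above.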

heliapb commented 1 year ago

Hi @douglascamata, in my team we created a Grafana dashboard that lists all the available metrics using the https://grafana.com/grafana/plugins/marcusolsson-json-datasource/ plugin. I don't know if it is useful for this case, but it lets us show all metrics for all components, not just Thanos. It could be a way for people who use Grafana to visualize all the Thanos metrics. Here is a preview to show what I mean: image

douglascamata commented 1 year ago

@heliapb Wow, it looks neat! Might be a good way for us to fill in this documentation with looots of metrics. But what is this data source querying? It seems to be loading JSON APIs 🤔

heliapb commented 1 year ago

Hi @douglascamata, we get all the metrics from our Thanos Query via /api/v1/metadata and create a data source with that JSON plugin, pointing it at Thanos Query. An example: image
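For anyone who wants to turn the same endpoint into documentation rows, here is a rough sketch. It assumes the standard Prometheus response shape for `/api/v1/metadata` (a `data` object mapping each metric name to a list of entries with `type`, `help` and `unit`). The function names and the base URL argument are hypothetical, and the sample payload is hand-written from metrics listed earlier in this thread rather than captured from a real server.

```python
import json
import urllib.request


def fetch_metadata(base_url: str) -> dict:
    """Fetch metric metadata from a Prometheus-compatible HTTP API (base_url is an assumption)."""
    with urllib.request.urlopen(f"{base_url}/api/v1/metadata") as resp:
        return json.load(resp)


def metadata_to_markdown(payload: dict) -> str:
    """Render a /api/v1/metadata response as a markdown table, one row per metadata entry."""
    lines = ["| Name | Type | Description |", "|------|------|-------------|"]
    data = payload.get("data", {})
    for name in sorted(data):
        for entry in data[name]:
            # Escape pipes so a help string cannot break the table layout.
            help_text = entry.get("help", "").replace("|", "\\|")
            lines.append(f"| {name} | {entry.get('type', '')} | {help_text} |")
    return "\n".join(lines)


# Hand-written sample; with a live querier one would call fetch_metadata(<your query URL>).
sample = {
    "status": "success",
    "data": {
        "thanos_alert_queue_length": [
            {"type": "gauge", "help": "Length of the alert queue.", "unit": ""}
        ],
        "thanos_alert_queue_capacity": [
            {"type": "gauge", "help": "Capacity of the alert queue.", "unit": ""}
        ],
    },
}
print(metadata_to_markdown(sample))
```

The output contains only what the server reports, so dimensions (labels) would still have to be added by hand or from another source.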

TomiwaAribisala-git commented 1 year ago

Hello @douglascamata is this issue still open?

douglascamata commented 1 year ago

@TomiwaAribisala-git yep, it is still open.

TomiwaAribisala-git commented 5 months ago

> Hi @douglascamata we get all the metrics from our thanos query via /api/v1/metadata, and create a datasource with that JSON plugin, using thanos query as datasource, an example image

Hello @heliapb, I am working on this issue. I am running a Thanos Query Frontend locally (http://localhost:10901/) and I want to add its URL to the data source plugin, but it returns "JSON API: Bad Gateway". How do I resolve this so that I can add the localhost metrics URL (http://localhost:10901/metrics) to the data source plugin and have it work? I have attached images of the data source plugin and the Thanos URL. My local environment is set up based on this repo: https://github.com/dbluxo/quickstart-thanos (images: datasource1, datasource0, thanos)