Open willowmck opened 2 years ago
Use controller runtime graph as a performance metric?
I grabbed all the metrics exposed by the management plane in 2.0.9 and grepped for the HELP lines:
# HELP cluster_manager_active_clusters
# HELP cluster_manager_cds_update_attempt
# HELP cluster_manager_cds_update_duration
# HELP cluster_manager_cds_update_failure
# HELP cluster_manager_cds_update_success
# HELP cluster_manager_cds_update_time
# HELP cluster_manager_cds_version
# HELP cluster_manager_cluster_added
# HELP cluster_manager_cluster_modified
# HELP cluster_manager_cluster_removed
# HELP cluster_manager_cluster_updated
# HELP cluster_manager_update_out_of_merge_window
# HELP cluster_manager_warming_clusters
# HELP cluster_xds_grpc_circuit_breakers_default_cx_open
# HELP cluster_xds_grpc_circuit_breakers_default_cx_pool_open
# HELP cluster_xds_grpc_circuit_breakers_default_rq_open
# HELP cluster_xds_grpc_circuit_breakers_default_rq_pending_open
# HELP cluster_xds_grpc_circuit_breakers_high_cx_pool_open
# HELP cluster_xds_grpc_default_total_match_count
# HELP cluster_xds_grpc_http2_pending_send_bytes
# HELP cluster_xds_grpc_http2_streams_active
# HELP cluster_xds_grpc_internal_upstream_rq_200
# HELP cluster_xds_grpc_internal_upstream_rq_2xx
# HELP cluster_xds_grpc_internal_upstream_rq_completed
# HELP cluster_xds_grpc_membership_change
# HELP cluster_xds_grpc_membership_degraded
# HELP cluster_xds_grpc_membership_excluded
# HELP cluster_xds_grpc_membership_healthy
# HELP cluster_xds_grpc_membership_total
# HELP cluster_xds_grpc_upstream_cx_active
# HELP cluster_xds_grpc_upstream_cx_connect_ms
# HELP cluster_xds_grpc_upstream_cx_destroy
# HELP cluster_xds_grpc_upstream_cx_destroy_local
# HELP cluster_xds_grpc_upstream_cx_http2_total
# HELP cluster_xds_grpc_upstream_cx_length_ms
# HELP cluster_xds_grpc_upstream_cx_max_requests
# HELP cluster_xds_grpc_upstream_cx_protocol_error
# HELP cluster_xds_grpc_upstream_cx_rx_bytes_buffered
# HELP cluster_xds_grpc_upstream_cx_rx_bytes_total
# HELP cluster_xds_grpc_upstream_cx_total
# HELP cluster_xds_grpc_upstream_cx_tx_bytes_total
# HELP cluster_xds_grpc_upstream_rq_200
# HELP cluster_xds_grpc_upstream_rq_2xx
# HELP cluster_xds_grpc_upstream_rq_active
# HELP cluster_xds_grpc_upstream_rq_completed
# HELP cluster_xds_grpc_upstream_rq_pending_active
# HELP cluster_xds_grpc_upstream_rq_pending_total
# HELP cluster_xds_grpc_upstream_rq_total
# HELP component_proxy_tag_1_13_4_solo__istio_build
# HELP controller_runtime_active_workers Number of currently used workers per controller
# HELP controller_runtime_max_concurrent_reconciles Maximum number of concurrent reconciles per controller
# HELP controller_runtime_reconcile_errors_total Total number of reconciliation errors per controller
# HELP controller_runtime_reconcile_time_seconds Length of time per reconciliation per controller
# HELP controller_runtime_reconcile_total Total number of reconciliations per controller
# HELP gloo_mesh_reconciler_time_sec how long the reconciler takes in seconds
# HELP gloo_mesh_redis_sync_err Number of times redis has failed to read
# HELP gloo_mesh_snapshot_upserter_op_time_sec how long a snapshot upserter operation takes to upsert in seconds
# HELP gloo_mesh_translation_time_sec how long a context translation takes in seconds
# HELP gloo_mesh_translator_concurrency The number of concurrent translations being run by Gloo Mesh
# HELP go_gc_duration_seconds A summary of the pause duration of garbage collection cycles.
# HELP go_goroutines Number of goroutines that currently exist.
# HELP go_info Information about the Go environment.
# HELP go_memstats_alloc_bytes Number of bytes allocated and still in use.
# HELP go_memstats_alloc_bytes_total Total number of bytes allocated, even if freed.
# HELP go_memstats_buck_hash_sys_bytes Number of bytes used by the profiling bucket hash table.
# HELP go_memstats_frees_total Total number of frees.
# HELP go_memstats_gc_cpu_fraction The fraction of this program's available CPU time used by the GC since the program started.
# HELP go_memstats_gc_sys_bytes Number of bytes used for garbage collection system metadata.
# HELP go_memstats_heap_alloc_bytes Number of heap bytes allocated and still in use.
# HELP go_memstats_heap_idle_bytes Number of heap bytes waiting to be used.
# HELP go_memstats_heap_inuse_bytes Number of heap bytes that are in use.
# HELP go_memstats_heap_objects Number of allocated objects.
# HELP go_memstats_heap_released_bytes Number of heap bytes released to OS.
# HELP go_memstats_heap_sys_bytes Number of heap bytes obtained from system.
# HELP go_memstats_last_gc_time_seconds Number of seconds since 1970 of last garbage collection.
# HELP go_memstats_lookups_total Total number of pointer lookups.
# HELP go_memstats_mallocs_total Total number of mallocs.
# HELP go_memstats_mcache_inuse_bytes Number of bytes in use by mcache structures.
# HELP go_memstats_mcache_sys_bytes Number of bytes used for mcache structures obtained from system.
# HELP go_memstats_mspan_inuse_bytes Number of bytes in use by mspan structures.
# HELP go_memstats_mspan_sys_bytes Number of bytes used for mspan structures obtained from system.
# HELP go_memstats_next_gc_bytes Number of heap bytes when next garbage collection will take place.
# HELP go_memstats_other_sys_bytes Number of bytes used for other system allocations.
# HELP go_memstats_stack_inuse_bytes Number of bytes in use by the stack allocator.
# HELP go_memstats_stack_sys_bytes Number of bytes obtained from system for stack allocator.
# HELP go_memstats_sys_bytes Number of bytes obtained from system.
# HELP go_threads Number of OS threads created.
# HELP http_inbound_0_0_0_0_9080_rbac_allowed
# HELP http_inbound_0_0_0_0_9080_rbac_denied
# HELP istio_request_bytes
# HELP istio_request_duration_milliseconds
# HELP istio_requests_total
# HELP istio_response_bytes
# HELP istio_tcp_connections_closed_total
# HELP istio_tcp_connections_opened_total
# HELP istio_tcp_received_bytes_total
# HELP istio_tcp_sent_bytes_total
# HELP listener_manager_lds_update_attempt
# HELP listener_manager_lds_update_duration
# HELP listener_manager_lds_update_failure
# HELP listener_manager_lds_update_success
# HELP listener_manager_lds_update_time
# HELP listener_manager_lds_version
# HELP listener_manager_listener_added
# HELP listener_manager_listener_create_success
# HELP listener_manager_listener_in_place_updated
# HELP listener_manager_listener_modified
# HELP listener_manager_listener_removed
# HELP listener_manager_total_filter_chains_draining
# HELP listener_manager_total_listeners_active
# HELP listener_manager_total_listeners_draining
# HELP listener_manager_total_listeners_warming
# HELP listener_manager_workers_started
# HELP objects_synced_total Total number of successful object writes to storage. result indicates the result of the write, i.e. created, updated, unchanged
# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# HELP process_max_fds Maximum number of open file descriptors.
# HELP process_open_fds Number of open file descriptors.
# HELP process_resident_memory_bytes Resident memory size in bytes.
# HELP process_start_time_seconds Start time of the process since unix epoch in seconds.
# HELP process_virtual_memory_bytes Virtual memory size in bytes.
# HELP process_virtual_memory_max_bytes Maximum amount of virtual memory available in bytes.
# HELP relay_pull_clients_connected Current number of connected Relay pull clients (Relay Agents).
# HELP relay_push_clients_connected Current number of connected Relay push clients (Relay Agents).
# HELP relay_push_clients_warmed Current number of warmed Relay push clients (Relay Agents).
# HELP rest_client_requests_total Number of HTTP requests, partitioned by status code, method, and host.
# HELP server_concurrency
# HELP server_days_until_first_cert_expiring
# HELP server_dynamic_unknown_fields
# HELP server_hot_restart_epoch
# HELP server_initialization_time_ms
# HELP server_live
# HELP server_main_thread_watchdog_mega_miss
# HELP server_main_thread_watchdog_miss
# HELP server_memory_allocated
# HELP server_memory_heap_size
# HELP server_memory_physical_size
# HELP server_parent_connections
# HELP server_state
# HELP server_static_unknown_fields
# HELP server_stats_recent_lookups
# HELP server_total_connections
# HELP server_uptime
# HELP server_version
# HELP server_wip_protos
# HELP server_worker_0_watchdog_mega_miss
# HELP server_worker_0_watchdog_miss
# HELP server_worker_1_watchdog_mega_miss
# HELP server_worker_1_watchdog_miss
# HELP server_worker_2_watchdog_mega_miss
# HELP server_worker_2_watchdog_miss
# HELP server_worker_3_watchdog_miss
# HELP server_worker_4_watchdog_mega_miss
# HELP server_worker_4_watchdog_miss
# HELP server_worker_5_watchdog_mega_miss
# HELP server_worker_5_watchdog_miss
# HELP server_worker_6_watchdog_mega_miss
# HELP server_worker_6_watchdog_miss
# HELP server_worker_7_watchdog_mega_miss
# HELP server_worker_7_watchdog_miss
# HELP wasm_envoy_wasm_runtime_null_active
# HELP wasm_envoy_wasm_runtime_null_created
# HELP wasm_filter_stats_filter_cache_hit_metric_cache_count
# HELP wasm_filter_stats_filter_cache_miss_metric_cache_count
# HELP workqueue_adds_total Total number of adds handled by workqueue
# HELP workqueue_depth Current depth of workqueue
# HELP workqueue_longest_running_processor_seconds How many seconds has the longest running processor for workqueue been running.
# HELP workqueue_queue_duration_seconds How long in seconds an item stays in workqueue before being requested
# HELP workqueue_retries_total Total number of retries handled by workqueue
# HELP workqueue_unfinished_work_seconds How many seconds of work has been done that is in progress and hasn't been observed by work_duration. Large values indicate stuck threads. One can deduce the number of stuck threads by observing the rate at which this increases.
# HELP workqueue_work_duration_seconds How long in seconds processing an item from workqueue takes.
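For anyone who wants to reproduce or slice this list, here is a minimal sketch of the same "grep on HELP" step in Python. It assumes the metrics were saved as plain Prometheus text exposition; the function names (`parse_help_lines`, `group_by_prefix`) and the `SAMPLE` string are hypothetical and for illustration only, not part of Gloo Mesh or any library.

```python
import re
from collections import defaultdict

# Matches "# HELP <metric_name> [optional help text]" lines.
HELP_RE = re.compile(r"^# HELP (\S+)(?: (.*))?$")


def parse_help_lines(text):
    """Return {metric_name: help_text} for every '# HELP' line in text."""
    metrics = {}
    for line in text.splitlines():
        match = HELP_RE.match(line.strip())
        if match:
            # Many Envoy-derived metrics carry no help text, so default to "".
            metrics[match.group(1)] = match.group(2) or ""
    return metrics


def group_by_prefix(metrics, prefixes):
    """Group metric names under the first prefix they start with."""
    groups = defaultdict(list)
    for name in sorted(metrics):
        for prefix in prefixes:
            if name.startswith(prefix):
                groups[prefix].append(name)
                break
    return dict(groups)


# Hypothetical sample: a few lines copied from the list above, plus a TYPE
# line to show that non-HELP comments are ignored.
SAMPLE = """\
# HELP controller_runtime_active_workers Number of currently used workers per controller
# TYPE controller_runtime_active_workers gauge
# HELP controller_runtime_reconcile_time_seconds Length of time per reconciliation per controller
# HELP gloo_mesh_reconciler_time_sec how long the reconciler takes in seconds
# HELP server_live
"""

if __name__ == "__main__":
    found = parse_help_lines(SAMPLE)
    for prefix, names in group_by_prefix(
        found, ["controller_runtime_", "gloo_mesh_"]
    ).items():
        print(prefix, names)
```

Grouping by prefix makes it easier to pick out the `controller_runtime_*` and `gloo_mesh_*` series that are relevant to the question in the title, as opposed to the Envoy and Go runtime metrics.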
This is being tracked in issue 1959.