kumahq / kuma

🐻 The multi-zone service mesh for containers, Kubernetes and VMs. Built with Envoy. CNCF Sandbox Project.
https://kuma.io/install
Apache License 2.0

Expose envoy TCP statistics for both upstreams and downstreams #5898

Open johnharris85 opened 1 year ago

johnharris85 commented 1 year ago

Description

We want to gather metrics on how the underlying network is performing so that we can understand its impact on requests passing through our mesh gateway.

We would like the statistics listed here: https://www.envoyproxy.io/docs/envoy/latest/configuration/upstream/cluster_manager/cluster_stats#tcp-statistics exposed for both upstreams and downstreams by optionally wrapping the DownstreamTlsContext and UpstreamTlsContext definitions in a tcp_stats transport socket. These statistics are only available on Linux, so this would likely need to be customisable via a MeshGatewayInstance policy.

For example, for a downstream socket:

{
   "transport_socket":{
      "name":"envoy.transport_sockets.downstream",
      "typed_config":{
         "@type":"type.googleapis.com/envoy.extensions.transport_sockets.tcp_stats.v3.Config",
         "update_period":"5s",
         "transport_socket":{
            "name":"envoy.transport_sockets.tls",
            "typed_config":{
               "@type":"type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.DownstreamTlsContext",
               "common_tls_context":{
                  "tls_certificates":[
                     {
                        "certificate_chain":{
                           "inline_bytes":"..."
                        },
                        "private_key":{
                           "inline_bytes":"..."
                        }
                     }
                  ],
                  "validation_context":{
                     "trusted_ca":{
                        "inline_bytes":"..."
                     },
                     "match_subject_alt_names":[
                        {
                           "exact":"kuma-cp"
                        }
                     ]
                  }
               },
               "require_client_certificate":true
            }
         }
      }
   }
}
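
The wrapping itself is mechanical, so here is a minimal Python sketch of it, for illustration only. It is not how Kuma generates its Envoy config: `wrap_with_tcp_stats` is a hypothetical helper, the default `name` is an assumption (the example above uses `envoy.transport_sockets.downstream`), and the `sni` value in the upstream example is made up.

```python
import json
from typing import Any, Dict

# Type URL of the tcp_stats wrapper, as used in the example above.
TCP_STATS_TYPE = (
    "type.googleapis.com/"
    "envoy.extensions.transport_sockets.tcp_stats.v3.Config"
)


def wrap_with_tcp_stats(
    inner: Dict[str, Any],
    update_period: str = "5s",
    name: str = "envoy.transport_sockets.tcp_stats",  # assumed default name
) -> Dict[str, Any]:
    """Return a transport socket that wraps `inner` in a tcp_stats config.

    Hypothetical helper: `inner` is an existing transport socket such as a
    DownstreamTlsContext or UpstreamTlsContext socket. A socket that is
    already wrapped is returned unchanged.
    """
    if inner.get("typed_config", {}).get("@type") == TCP_STATS_TYPE:
        return inner
    return {
        "name": name,
        "typed_config": {
            "@type": TCP_STATS_TYPE,
            "update_period": update_period,
            "transport_socket": inner,
        },
    }


if __name__ == "__main__":
    # Upstream variant: same wrapper around an UpstreamTlsContext socket.
    upstream_tls = {
        "name": "envoy.transport_sockets.tls",
        "typed_config": {
            "@type": "type.googleapis.com/"
            "envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext",
            "sni": "backend.example",  # made-up value for illustration
        },
    }
    print(json.dumps({"transport_socket": wrap_with_tcp_stats(upstream_tls)}, indent=3))
```

The same function covers both directions because the wrapper only nests the existing socket under `transport_socket`; whether to apply it would be the opt-in decision, given the Linux-only restriction.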
jakubdyszkiewicz commented 1 year ago

Triage: We can potentially put it in a new MeshTrafficMetrics policy #5708

github-actions[bot] commented 1 year ago

This issue was inactive for 90 days. It will be reviewed in the next triage meeting and might be closed. If you think this issue is still relevant, please comment on it or attend the next triage meeting.

Automaat commented 6 months ago

@johnharris85 how are DownstreamTlsContext and UpstreamTlsContext connected with metrics? Don't we already publish these metrics?