istio / ztunnel

The `ztunnel` component of ambient mesh
Apache License 2.0

Provide a new label for metrics regarding destination type #1128

Open howardjohn opened 5 months ago

howardjohn commented 5 months ago

I propose adding a new label to all the traffic metrics in ztunnel: traffic_type: waypoint|direct.

The motivation here is to allow querying for traffic without duplication when dealing with waypoints.

When I have a waypoint, I will get 3 timeseries: client-->waypoint, client-->server, waypoint-->server.

Logically, I know the first and last metrics are "to/from waypoint" traffic. However, there is no query to actually express that information, so I cannot filter it out. With the traffic_type we could identify this.
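To make the duplication concrete, here is a minimal sketch in Python. The series, byte counts, and label values are made up for illustration, and `traffic_type` is the proposed label, not something ztunnel emits today. The client-->server series is assumed not to carry the `waypoint` value.

```python
# Hypothetical series for one client->server flow routed through a waypoint.
# All names and values are illustrative; "traffic_type" is the proposed label.
series = [
    {"source": "client", "destination": "waypoint", "traffic_type": "waypoint", "bytes": 100},
    {"source": "client", "destination": "server", "bytes": 100},
    {"source": "waypoint", "destination": "server", "traffic_type": "waypoint", "bytes": 100},
]


def total_bytes(series, exclude_waypoint_hops=False):
    """Sum bytes across series, optionally dropping the to/from waypoint hops."""
    return sum(
        s["bytes"]
        for s in series
        if not (exclude_waypoint_hops and s.get("traffic_type") == "waypoint")
    )


# Without the label, the same 100 bytes are counted once per series.
naive = total_bytes(series)
# With the label, the two waypoint hops can be filtered out.
deduped = total_bytes(series, exclude_waypoint_hops=True)
```

In a real setup the same filter would be a label matcher in the metrics query rather than application code.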

On outbound this is trivial. It's a bit less clear whether we can identify waypoint traffic with 100% certainty on the inbound path.

stevenctl commented 5 months ago

For Sandwich, we use find_inbound_upstream: Checks if the connection address is a workload that is part of the HBONE address's Waypoints.

The tricky part for inbound to-workload is figuring out if the traffic was originally Service or Workload addressed. Can we make a good enough guess here for telemetry purposes? For example, inbound workload has both wl-wp and svc-wp, we can look at the source address + identity and see if it matches either of these and give one of them priority.
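One way to picture that guess, as a sketch only: the `Waypoint` and `Workload` types, their fields, and the choice to give the service waypoint priority are all assumptions for illustration, not ztunnel's actual data model or code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Waypoint:
    """Hypothetical view of a waypoint: its instance IPs and expected identity."""
    addresses: frozenset
    identity: str


@dataclass(frozen=True)
class Workload:
    """Hypothetical destination workload with optional svc-wp and wl-wp."""
    service_waypoint: Optional[Waypoint] = None
    workload_waypoint: Optional[Waypoint] = None


def guess_inbound_waypoint(dest, src_ip, src_identity):
    """Return "service", "workload", or None for an inbound connection.

    This is a heuristic: it only checks whether the source address and
    identity match one of the destination's waypoints. The service waypoint
    is checked first, so it wins when both match (an arbitrary choice).
    """
    candidates = [
        ("service", dest.service_waypoint),
        ("workload", dest.workload_waypoint),
    ]
    for kind, wp in candidates:
        if wp is not None and src_ip in wp.addresses and src_identity == wp.identity:
            return kind
    return None
```

A match only shows that the peer looks like one of the destination's waypoints. It cannot prove how the traffic was originally addressed, which is why this is good enough for telemetry at best.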

Another option would be for the waypoint to add a header with the original HBONE target that it saw.

keithmattix commented 5 months ago

Dumb question: wouldn't (Istio) waypoints provide some header with a value of envoy?

howardjohn commented 5 months ago

There are some headers, but I don't know of any that are guaranteed unless we add one.

jshaughn commented 2 months ago

> When I have a waypoint, I will get 3 timeseries: client-->waypoint, client-->server, waypoint-->server.

Just to make sure I'm clear: client-->waypoint and waypoint-->server are reported by ztunnel (app="ztunnel", reporter="destination"), and client-->server is reported by the waypoint itself (reporter="waypoint"). Right?

I think for ztunnel, traffic_type="direct" would apply for non-waypoint traffic?

I think it would be useful to add traffic_type: waypoint|direct, as long as the waypoint identification is reliable, both when it is the source and dest workload. If there isn't a way to identify the inbound situation then it wouldn't be worth the overhead.

jshaughn commented 2 months ago

maybe too verbose, but just an idea: traffic_type: waypoint_to|waypoint_from|direct

howardjohn commented 2 months ago

The 'from waypoint' is the tricky part. 'To waypoint' is easy, and direct = !towaypoint && !fromwaypoint, but there is no reliable way to know if it's from a waypoint. We can apply some heuristics -- perhaps even good enough ones -- but no guarantees for sure.
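As a sketch of how the label value could be derived from those two flags (the function name and the precedence of 'to' over 'from' are assumptions; the label values are the ones floated earlier in this thread):

```python
def traffic_type(to_waypoint: bool, from_waypoint_guess: bool) -> str:
    """Derive the proposed label value from two per-connection flags.

    to_waypoint is known reliably on the outbound path.
    from_waypoint_guess is only a heuristic result, so "waypoint_from"
    and "direct" inherit its uncertainty.
    """
    if to_waypoint:
        return "waypoint_to"
    if from_waypoint_guess:
        return "waypoint_from"
    # direct = !towaypoint && !fromwaypoint
    return "direct"
```

Any false negative in the 'from' heuristic would show up as `direct`, which is exactly the mislabeling a consumer would need to be aware of.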

jshaughn commented 2 months ago

The unfortunate effect is that currently, to try to identify these waypoint edges, a consumer like Kiali needs to combine waypoint config and telemetry. This is a slippery slope because config is current, but telem reflects the past. So it's much better if everything can be identified in the telemetry.

Just wanted to confirm my question above: with waypoints, the client-->server telem is only reported by the waypoint, right? But waypoint reporting is only for request traffic, afaics. So, for a service using a waypoint that is handling app-level TCP traffic, there is no client-->server TCP telem; the only telem is disconnected: client-->waypoint and waypoint-->server. Is that right?

howardjohn commented 2 months ago

No, the waypoint should report TCP as well if the service is TCP (same semantics as a sidecar).

jshaughn commented 2 months ago

> No, waypoint should report TCP as well if the service is TCP (same semantics as a sidecar)

Hmmm, I'm not seeing it. I'll re-check...

jshaughn commented 4 weeks ago

> Hmmm, I'm not seeing it. I'll re-check...

I think this is due to https://github.com/istio/istio/issues/53593