open-telemetry / opentelemetry-collector-contrib

Contrib repository for the OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0

Support a DNSSRVNOA Option on the loadbalancing Exporter #18412

Open knechtionscoding opened 1 year ago

knechtionscoding commented 1 year ago

Component(s)

exporter/loadbalancing

Is your feature request related to a problem? Please describe.

I discovered, when trying to configure an OTel Collector to use the loadbalancing exporter to ship to another OTel Collector deployment, that it fails when Istio is involved.

Currently, DNS requests made by the loadbalancing exporter fail to work with Istio services because it does an A record lookup rather than accepting an SRV record.

When a loadbalancing exporter is set up to talk to a k8s headless service, it makes an A record lookup and then uses the pod IP address as the host name, which fails because Istio doesn't allow routing via pod IPs.

This is true even if you use the pod hostname:

2023-02-06T22:23:31.510Z    warn    zapgrpc/zapgrpc.go:195  [core] [Channel #9 SubChannel #10] grpc: addrConn.createTransport failed to connect to {
  "Addr": "10.17.57.172:4137",
  "ServerName": "10.17.57.172:4137",
  "Attributes": null,
  "BalancerAttributes": null,
  "Type": 0,
  "Metadata": null
}. Err: connection error: desc = "transport: authentication handshake failed: read tcp 10.17.161.21:48420->10.17.57.172:4137: read: connection reset by peer"   {"grpc_log": true}

Istio proxy logs from the request being made:

{"upstream_host":"10.17.141.119:4137","response_flags":"UF,URX","response_code":0,"istio_policy_status":null,"upstream_cluster":"InboundPassthroughClusterIpv4","requested_server_name":null,"request_id":null,"route_name":null,"orig_path":null,"downstream_local_address":"10.17.141.119:4137","downstream_remote_address":"10.17.161.13:60252","authority":null,"method":null,"cf_ray":null,"x_forwarded_for":null,"bytes_sent":0,"bytes_received":0,"user_agent":null,"upstream_service_time":null,"protocol":null,"b3_parentspan_id":null,"w3c_traceparent":null,"ml_loadtest":null,"upstream_local_address":null,"b3_span_id":null,"path":null,"ml_faulttest":null,"start_time":"2023-02-06T22:00:30.915Z","b3_trace_id":null,"duration":1,"upstream_transport_failure_reason":null,"response_code_details":null}

Example of loadbalancing config:

  loadbalancing:
      routing_key: traceID
      protocol:
        otlp:
          timeout: 1s
          tls:
            insecure: true
      resolver:
        dns:
          hostname: otel-sampler-headless
          port: 4137

It works properly if you configure the receiving otel-collector as a StatefulSet and then use each pod name, because the SRV record will come back matching.

Describe the solution you'd like

Support an option similar to https://thanos.io/tip/thanos/service-discovery.md/#dns-service-discovery, where you can set +dnssrvnoa, to allow us to use Istio with a Deployment of the receiving otel-collector.

https://github.com/thanos-io/thanos/commit/432785e77e3cf68af44b0c85bb7d0b68e1ef46ec is the thanos code that does this.

The general ask is that the loadbalancing exporter performs the SRV resolution and then, from there, acts as if it were filling in the static resolver section.

Showing the difference:

dig +short _tcp-otlp._tcp.otel-sampler.big-brother.svc.cluster.local
otel-sampler-2.otel-sampler.big-brother.svc.cluster.local.
10.17.135.68
10.17.141.147
10.17.163.65
# Even with a DEPLOYMENT
dig +short _grpc-otlp._tcp.otel-collector-headless.big-brother.svc.cluster.local SRV
10 25 4317 3430636431363132.otel-collector-headless.big-brother.svc.cluster.local.
10 25 4317 3131383736653061.otel-collector-headless.big-brother.svc.cluster.local.
10 25 4317 3936663563343666.otel-collector-headless.big-brother.svc.cluster.local.
10 25 4317 3636643866356166.otel-collector-headless.big-brother.svc.cluster.local.

Notice the SRV targets at the end of the second query, which is what the OTel Collector should accept as an alternative to the A record.

Describe alternatives you've considered

Running as a StatefulSet works, but doesn't autoscale properly. Right now we have to make do with manually listing out the pods in the StatefulSet:

      resolver:
        static:
          hostnames:
          - otel-sampler-0.otel-sampler.big-brother.svc.cluster.local:4317
          - otel-sampler-1.otel-sampler.big-brother.svc.cluster.local:4317
          - otel-sampler-2.otel-sampler.big-brother.svc.cluster.local:4317

Additional context

No response

github-actions[bot] commented 1 year ago

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

jpkrohling commented 1 year ago

This looks cool, I'll be happy to review a PR adding support for this.

knechtionscoding commented 1 year ago

@jpkrohling If you can point me in the direction of the code that makes the DNS requests in the otel collector I can work on delivering a PR for it.

jpkrohling commented 1 year ago

Sure, here's a short summary:

  1. you'll need a resolver implementing this interface: https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/exporter/loadbalancingexporter/resolver.go
  2. then you add a new config option for your resolver here: https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/fac9f8ba05b3af51e4e11e2cb639db3a4b7202d0/exporter/loadbalancingexporter/config.go#L42-L46
  3. and initialize the resolver here: https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/fac9f8ba05b3af51e4e11e2cb639db3a4b7202d0/exporter/loadbalancingexporter/loadbalancer.go#L70-L86

That's pretty much it. The current DNS resolver (linked below) just uses the TCP stack to resolve the IP for a name, so it's not directly making a DNS call. You'll likely need to use a Go library to perform an SRV query. Unfortunately, I can't recommend one, as I haven't used one yet.

For reference, here's the DNS resolver: https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/exporter/loadbalancingexporter/resolver_dns.go
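For what it's worth, the Go standard library can perform the SRV query directly. Below is a minimal sketch of that lookup, assuming the _service._proto.name form shown above; it is not tied to the exporter's actual Resolver interface, and the function name is illustrative:

```go
package sketch

import (
	"context"
	"fmt"
	"net"
	"strings"
)

// resolveSRV returns "host:port" backends straight from the SRV answer,
// keeping the target hostnames instead of resolving them further to IPs
// (the behavior this issue asks for).
func resolveSRV(ctx context.Context, service, proto, name string) ([]string, error) {
	// e.g. service="grpc-otlp", proto="tcp",
	//      name="otel-collector-headless.big-brother.svc.cluster.local"
	_, records, err := net.DefaultResolver.LookupSRV(ctx, service, proto, name)
	if err != nil {
		return nil, err
	}
	backends := make([]string, 0, len(records))
	for _, srv := range records {
		host := strings.TrimSuffix(srv.Target, ".") // drop the trailing dot from the DNS answer
		backends = append(backends, fmt.Sprintf("%s:%d", host, srv.Port))
	}
	return backends, nil
}
```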

github-actions[bot] commented 1 year ago

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

fyuan1316 commented 1 year ago

It looks like the service here is located in a k8s cluster, and the k8s resolver (pr #22776) should be better suited to this scenario

jpkrohling commented 1 year ago

While the Kubernetes service resolver would solve the problem for Kubernetes deployments, an SRV record can be used outside of Kubernetes as well. I'll keep this issue open to gauge interest from other folks.

snuggie12 commented 1 year ago

It looks like the service here is located in a k8s cluster, and the k8s resolver (pr #22776) should be better suited to this scenario

@fyuan1316 The k8s resolver actually has the same problem. It looks at Endpoints, which are pod IPs. Istio requires a DNS name that is in the service catalog for it to work. If there were a way via the Kubernetes API to take a headless service and return the pod-based service names that get created, then it could work.

The static resolver might actually be a potential solution as well. If you were to add some sort of "resolve interval" option to it, allowing for more than one resolution, that would also work.

snuggie12 commented 11 months ago

@jpkrohling Really appreciate the guidance. I've been working through a PR and don't foresee any issues implementing a resolver, but I am worried it might require changes to loadbalancer.go.

When using an SRV record and your backends are a StatefulSet, there are two different changes to monitor:

Excuse the example if it's unnecessary, but if we have a StatefulSet called otel-sampler with 3 replicas and a headless Service called otel-sampler-headless with a port called lb, then we'd have an SRV record of _lb._tcp.otel-sampler-headless.<namespace>.svc.cluster.local. This would resolve to:

Feel free to correct me here, but I think we have two problems with this. First, the hash ring will be the same, so we won't kick off these two methods. Even if we did something like "on IP change, shuffle the endpoints so a new hash is made" (which feels like it could have its own issues), we'd still run into issues inside addMissingExporters: it examines each endpoint to figure out what is new, and nothing will appear as new.

I don't really have a good idea of how to fix this, but I think conceptually it would take implementing some sort of restart concept. If the hash rings are the same but a restart flag is present on the ring, you still need to proceed. If an endpoint already exists but a restart flag is set on the endpoint, then that exporter needs to be restarted. I think you can achieve this by inserting a method like removeRestartingExporters right before addMissingExporters: it would stop the exporter and make it be listed as "missing", and addMissingExporters would start it back up.
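A rough sketch of that restart idea, purely for illustration: removeRestartingExporters is the hypothetical method named above, and the exporter map and interface here are simplified stand-ins, not the actual loadbalancer.go structures:

```go
package sketch

import "context"

// exporter is a stand-in for the collector's exporter type; only Shutdown matters here.
type exporter interface {
	Shutdown(ctx context.Context) error
}

// removeRestartingExporters shuts down and removes the exporters whose endpoints
// are flagged for restart, so that a subsequent addMissingExporters-style pass
// sees them as "missing" and starts them back up.
func removeRestartingExporters(ctx context.Context, exporters map[string]exporter, restarting map[string]bool) {
	for endpoint, exp := range exporters {
		if !restarting[endpoint] {
			continue
		}
		_ = exp.Shutdown(ctx) // errors ignored in this sketch
		delete(exporters, endpoint)
	}
}
```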

jpkrohling commented 11 months ago

Is this just a matter of forcing a new DNS resolution for those hosts? If so, the new resolver can keep a map of hosts and last seen IPs, and do a DNS lookup (a simple LookupIPAddr should be sufficient) when changes are detected. This is exclusive to the business of this resolver, I don't think we need a more generic solution at the load balancer level.
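A minimal sketch of that suggestion, assuming the resolver keeps its own map of hostnames to last-seen IPs and re-checks them with the standard library's LookupIPAddr (the function name and map layout are illustrative, not the actual resolver code):

```go
package sketch

import (
	"context"
	"net"
	"sort"
	"strings"
)

// ipsChanged re-resolves each host and reports whether any host's IP set differs
// from what was last seen, updating lastSeen (host -> sorted, comma-joined IPs) in place.
func ipsChanged(ctx context.Context, lastSeen map[string]string, hosts []string) (bool, error) {
	changed := false
	for _, host := range hosts {
		addrs, err := net.DefaultResolver.LookupIPAddr(ctx, host)
		if err != nil {
			return false, err
		}
		ips := make([]string, 0, len(addrs))
		for _, a := range addrs {
			ips = append(ips, a.IP.String())
		}
		sort.Strings(ips)
		key := strings.Join(ips, ",")
		if lastSeen[host] != key {
			lastSeen[host] = key
			changed = true
		}
	}
	return changed, nil
}
```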

gravufo commented 11 months ago

I'm not sure, but I think for Istio/Envoy to work it would just require adding a Host header with the headless service hostname (e.g. otel-sampler-headless.<namespace>.svc.cluster.local). I believe this would work regardless of whether it's a Deployment or a StatefulSet. We hit the same issue on our side, but it seems like setting the mTLS mode to PERMISSIVE made it work with pod-to-pod IPs. That said, we do want mTLS, so we will be trying the StatefulSet workaround, although it doesn't seem to work yet.
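For gRPC, the equivalent of the Host header is the :authority pseudo-header, which grpc-go lets you override with WithAuthority. A minimal sketch of that idea (the addresses are illustrative, and this is not what the exporter currently does):

```go
package main

import (
	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

func main() {
	// Dial the resolved pod IP, but present the headless Service name as the
	// :authority so a sidecar proxy can match it against its service catalog.
	conn, err := grpc.Dial(
		"10.17.57.172:4317", // resolved pod IP:port (illustrative)
		grpc.WithTransportCredentials(insecure.NewCredentials()),
		grpc.WithAuthority("otel-sampler-headless.big-brother.svc.cluster.local:4317"),
	)
	if err != nil {
		panic(err)
	}
	defer conn.Close()
}
```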

snuggie12 commented 11 months ago

Is this just a matter of forcing a new DNS resolution for those hosts? If so, the new resolver can keep a map of hosts and last seen IPs, and do a DNS lookup (a simple LookupIPAddr should be sufficient) when changes are detected. This is exclusive to the business of this resolver, I don't think we need a more generic solution at the load balancer level.

It is a matter of forcing new resolution; however, we need to be specific about how the resolution is done. I don't know if I'm using the right terminology here when I say "TCP stack" or "OS", but here is an example. As @gravufo mentioned, setting the sampler to PERMISSIVE works, and that's how we currently get around it, so here is an example with Prometheus instead:

# Using CNAME for demonstration purposes only. Normally would be an SRV
$ dig '_http-web._tcp.prometheus-operated.monitoring.svc.cluster.local' | grep -A3 'ANSWER SECTION' | tail -n 3
_http-web._tcp.prometheus-operated.monitoring.svc.cluster.local. 30 IN CNAME prometheus-main-0.prometheus-operated.monitoring.svc.cluster.local.
prometheus-main-1.prometheus-operated.monitoring.svc.cluster.local. 30 IN A 10.24.129.68
prometheus-main-0.prometheus-operated.monitoring.svc.cluster.local. 30 IN A 10.24.134.21

# Using a hostname instead of an IP works
$ curl -s prometheus-main-0.prometheus-operated.monitoring.svc.cluster.local:9090/api/v1/status/tsdb | jq -r .status
success

# Using an IP fails
$ curl -s 10.24.134.21:9090/api/v1/status/tsdb
upstream connect error or disconnect/reset before headers. reset reason: connection termination

Hopefully that makes sense. What we want to send to onBackendChanges is the list of hostnames, not the IPs. However, the hostnames will almost never change.

One solution I thought of, which I think avoids changing the LB, is calling the callback twice: the first time with the restarting endpoints omitted, and the second time as normal.
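A rough sketch of that double-callback idea; onBackendChanges is the callback mentioned above, but its signature and the endpoint slices here are assumptions for illustration:

```go
package sketch

// notifyWithRestart forces the load balancer to tear down and re-create the restarting
// backends by reporting the endpoint list twice: first without them (so their exporters
// are removed), then with the full list again (so they are re-created).
func notifyWithRestart(onBackendChanges func(endpoints []string), all []string, restarting map[string]bool) {
	withoutRestarting := make([]string, 0, len(all))
	for _, e := range all {
		if !restarting[e] {
			withoutRestarting = append(withoutRestarting, e)
		}
	}
	onBackendChanges(withoutRestarting)
	onBackendChanges(all)
}
```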

snuggie12 commented 11 months ago

Here's my branch before writing tests (or testing, for that matter).

jpkrohling commented 11 months ago

I think there are two valid requests here in this ticket at this point:

  1. set the Host header on outgoing calls
  2. support SRV records

I would treat the first as a bug; fixing it should enable mTLS with Istio. The second is a feature request, for which I would love to review a PR.

snuggie12 commented 11 months ago

@jpkrohling I'm assuming I am only responsible for #2? I'm hoping to have a PR in the next week or so.

jpkrohling commented 6 months ago

The linked PR was closed without being merged. I believe it was useful in showing that we need to think about a more complete design for the solution as a whole.

snuggie12 commented 6 months ago

Apologies for the delay, @jpkrohling. I haven't been able to get back to the PR, and after writing the code we're able to simply run the fork, so it's hard to prioritize.

That being said, unless I missed something, I was waiting for the person who reviewed it after you to re-review.

I'm okay with keeping it closed and waiting for the larger-scoped SRV implementation, as long as the feature of doing only one resolution (SRV to A record, but not A record to IP address) remains. Do you think that would be acceptable?

jpkrohling commented 6 months ago

Absolutely! I feel like a refactor of the load-balancing exporter is overdue, especially once batching is added to exporters natively. We can use that opportunity to see what we need to change to support this use case more cleanly.

snuggie12 commented 4 months ago

@jpkrohling I had one other thought on how to fix this, and before I go and ask for time in my July or August sprint to tackle it, I figured I'd run it by you.

I noticed that headless Endpoints list the hostname in their addresses (at least that's true when it's the Service specified on a StatefulSet). Rather than trying to make a new resolver, what do you think of just adding an optional config on the k8s resolver which returns hostnames if the conditions are right?

jpkrohling commented 4 months ago

what do you think of just adding an optional config on the k8s resolver which returns hostnames if the conditions are right?

That sounds like a good idea, and without double-checking the code, that's what I would intuitively expect this resolver to do.
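For illustration, a minimal sketch of that idea using the standard k8s.io/api/core/v1 Endpoints types; the function name and the headless-service DNS format are assumptions, not the k8s resolver's actual code:

```go
package sketch

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

// endpointsToTargets builds "host:port" backends from an Endpoints object, preferring
// the per-pod hostname (published for headless Services backing a StatefulSet) and
// falling back to the pod IP when no hostname is set.
func endpointsToTargets(eps *corev1.Endpoints, namespace string, port int32) []string {
	var targets []string
	for _, subset := range eps.Subsets {
		for _, addr := range subset.Addresses {
			host := addr.IP
			if addr.Hostname != "" {
				// e.g. otel-sampler-0.otel-sampler-headless.<namespace>.svc.cluster.local
				host = fmt.Sprintf("%s.%s.%s.svc.cluster.local", addr.Hostname, eps.Name, namespace)
			}
			targets = append(targets, fmt.Sprintf("%s:%d", host, port))
		}
	}
	return targets
}
```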

fyuan1316 commented 4 months ago

To summarize, and hopefully help someone who comes to this thread via a Google search:

Currently (e.g. OTel LB collector + backend collector), if Istio is involved (e.g. an Istio sidecar is injected into the backend service and the mTLS mode is STRICT), istio-proxy will interrupt the connection and the OTel backend collector will not be able to establish connections properly.

The Istio community already has an issue tracking this: https://github.com/istio/istio/issues/37431

The Istio community has a long-term solution: https://github.com/istio/istio/issues/37431#issuecomment-1412831780

https://istio.io/latest/blog/2022/introducing-ambient-mesh/ will support direct-to-pod traffic.

For service invocation scenarios, a short-term workaround: https://github.com/istio/istio/issues/37431#issuecomment-1046205209 (this solution applies to HTTP-like calls, in which case you need to pass the headers).

Additionally, if the mTLS mode can be set to non-STRICT, the above issue does not occur.

snuggie12 commented 1 month ago

Not sure if a comment is required to be marked as not stale, but I submitted a PR which resolves the overall user story.

Using the k8s resolver will work for most, if not all, scenarios. I also agree with the requests made on the SRV solution I worked up. This is a Kubernetes solution for a Kubernetes problem and will not impact non-k8s environments.