stripe / veneur

A distributed, fault-tolerant pipeline for observability data

veneur-proxy doesn't work on Kubernetes via GRPC #762

Open rafaelgaspar opened 5 years ago

rafaelgaspar commented 5 years ago

I've been trying to use veneur-proxy in our setup, which has a single veneur-global and a number of veneur-local instances running as sidecars on each pod that needs to send metrics to Datadog.

The setup with just one veneur-global works fine, but after fiddling around with the proxy, the farthest I've gotten is:

time="2019-11-11T15:29:14Z" level=debug msg="Found TCP port" port=8128
time="2019-11-11T15:29:14Z" level=debug msg="Got destinations" destinations="[http://172.21.153.68:8128]" service=veneur-global
time="2019-11-11T15:29:41Z" level=error msg="Proxying failed" duration=2.684632ms error="failed to forward to the host 'http://172.21.153.68:8128' (cause=forward, metrics=1): failed to send 1 metrics over gRPC: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial tcp: address http://172.21.153.68:8128: too many colons in address\"" protocol=grpc
time="2019-11-11T15:29:44Z" level=debug msg="About to refresh destinations" acceptingForwards=false consulForwardGRPCService=veneur-global consulForwardService= consulTraceService=

That error looks like what happens when the forward gRPC address is configured with an http:// prefix. Looking at the code, it seems the "http://" prefix is hardcoded into the addresses returned by the KubernetesDiscoverer, while the Consul discoverer does not add it.
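
For anyone else hitting this, here is a minimal standalone Go sketch (not Veneur's code, just an illustration) of where the "too many colons in address" message comes from: the gRPC target is ultimately dialed as a plain host:port, and Go's net.SplitHostPort rejects a string that still carries a URL scheme.

```go
package main

import (
	"fmt"
	"net"
)

func main() {
	// Address as returned by the KubernetesDiscoverer (with the scheme)
	// versus the bare host:port form a TCP/gRPC dial expects.
	for _, addr := range []string{"http://172.21.153.68:8128", "172.21.153.68:8128"} {
		host, port, err := net.SplitHostPort(addr)
		if err != nil {
			// Prints: "http://172.21.153.68:8128": address http://172.21.153.68:8128: too many colons in address
			fmt.Printf("%q: %v\n", addr, err)
			continue
		}
		fmt.Printf("%q: host=%s port=%s\n", addr, host, port)
	}
}
```

So any fix presumably needs either the discoverer to stop prepending the scheme for gRPC destinations, or the proxy to strip it before dialing.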

What is the reasoning for that? I don't want to fall back to sending this over HTTP just because the discoverer can't handle gRPC addresses.

Thanks for your time.

bshelton229 commented 4 years ago

We ran into this a while ago and ended up doing a quick patch to unblock ourselves - https://github.com/roverdotcom/veneur/compare/master...roverdotcom:v13.1.0-rover

When I started looking into something that could be merged upstream, I realized there probably needs to be some discussion about adding a pluggable discovery pattern and allowing real kubernetes_* configuration for this discoverer, rather than detecting that we're running in Kubernetes and reusing the Consul configuration. I then got busy for a couple of years, but just ran across this today :). We'd love to help think through how to get the Kubernetes discovery uplifted a bit and merged in, if possible, so we could get back onto upstream as well.
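
To make that concrete, here is a rough, hypothetical sketch of what a pluggable discoverer plus explicit kubernetes_* settings might look like; none of these names come from the current codebase, they're purely illustrative:

```go
package discovery

// Discoverer is the kind of narrow interface a pluggable discovery
// mechanism could implement (Consul, Kubernetes, static, ...).
type Discoverer interface {
	// GetDestinationsForService resolves a logical service name into
	// concrete destination addresses for forwarding.
	GetDestinationsForService(serviceName string) ([]string, error)
}

// KubernetesDiscoveryConfig shows the sort of explicit kubernetes_*
// options that could replace the current "detect Kubernetes and reuse
// the Consul settings" behavior. Field and key names are hypothetical.
type KubernetesDiscoveryConfig struct {
	Namespace     string `yaml:"kubernetes_namespace"`
	LabelSelector string `yaml:"kubernetes_label_selector"`
	GRPCPortName  string `yaml:"kubernetes_grpc_port_name"`
}
```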

Our quick patch just checks whether the service name contains the substring grpc; if it does, it looks for a port definition on the pod named exactly grpc and builds the returned IP list accordingly, without the leading http://. This has worked for us, but it obviously continues the quick-and-dirty approach to getting k8s discovery to work.
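
For anyone curious what that looks like in practice, here is a rough sketch of the logic described above (not the actual patch; the function name and surrounding plumbing are illustrative), assuming the discoverer already has the matching pod list from the Kubernetes API:

```go
package discovery

import (
	"fmt"
	"strings"

	corev1 "k8s.io/api/core/v1"
)

// grpcDestinations builds destination addresses for a gRPC service.
// If the service name contains "grpc", it looks for a container port
// named exactly "grpc" on each pod and returns bare ip:port strings,
// without the "http://" prefix that the gRPC dialer rejects.
func grpcDestinations(serviceName string, pods []corev1.Pod) []string {
	var dests []string
	if !strings.Contains(serviceName, "grpc") {
		return dests
	}
	for _, pod := range pods {
		if pod.Status.PodIP == "" {
			continue
		}
		for _, c := range pod.Spec.Containers {
			for _, p := range c.Ports {
				if p.Name == "grpc" {
					dests = append(dests, fmt.Sprintf("%s:%d", pod.Status.PodIP, p.ContainerPort))
				}
			}
		}
	}
	return dests
}
```

Service names that don't contain grpc fall through to the existing behavior, so HTTP forwarding keeps its http:// prefix.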

singron commented 4 years ago

FYI: if you are running veneur-proxy in Kubernetes with gRPC, you might also run into #788.