Gufran opened this issue 4 years ago
I'm having a similar problem with tracing and opened an issue a couple of days back. Glad you probably found the cause! I hope this gets fixed, as both tracing and access logs are important for high-volume traffic. I'm not sure why the access_log is set to /dev/null and we have to resort to injecting a whole new filter chain using envoy_public_listener_json. I was hoping we could at least get tracing to work, and perhaps what you found can be addressed soon.
We're running the patched Consul binary in our staging environment and I can confirm that removing the random_sampling directive helped. I don't know its full impact in other areas, otherwise I would've proposed a PR.
Consul team, any ideas on this one? Thanks!
I guess not many folks are using tracing at this point. Haven't heard any suggestions yet on this one.
Just curious how you registered the gateway with Consul. Was it through the following command?
```
consul connect envoy -gateway=ingress -register -service ingress-service -address '{{ GetInterfaceIP "eth0" }}:8888'
```
... like it's described in this tutorial? I see the approach you have is different; could you comment on that?
I'm asking because I've also been trying to get tracing to work, but I'm adding the tracing config in proxy-defaults, which seems a bit hackish, but at least it works the way I want!
@dsouzajude - yes, using the command you mentioned. It would be great to find out what you've added in proxy-defaults so we can check. Does Jaeger show tracing data from the proxy? Thanks
Hi @pvyaka01, at the moment ingresses will not initiate a trace, but will propagate headers if they are received from a downstream caller. This was an intentional decision, hence the comment in consul/blob/v1.8.0/agent/xds/listeners.go.
Don't trace any requests by default unless the client application explicitly propagates trace headers that indicate this should be sampled.
This would explain the behavior you've described in #8503. I've marked this issue as an enhancement / feature request to make this parameter configurable so that proxies can be configured to initiate a trace.
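For illustration, a downstream caller can opt a request into tracing today by sending Envoy's x-client-trace-id header. The hostname and port below are placeholders for your ingress listener:

```
# Placeholder ingress address; any value works for the trace id, though a UUID is conventional.
curl -H "x-client-trace-id: $(uuidgen)" http://ingress.example.internal:8888/
```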
Ok, thank you!
@pvyaka01 Here is a look at my proxy-defaults config. Note that the envoy_tracing_json field enables tracing at the default level, and for now this is just a "hack" until it's possible to enable tracing specifically at the Ingress Gateway level. With this config, tracing somehow got enabled on the Ingress Gateway and it shows up as an object in my trace.
On the client, I curl the ingress gateway without passing any trace header information. Since I'm using AWS X-Ray for tracing, I think AWS X-Ray adds the trace headers if they're not present, but I'm not 100% sure.
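The exact entry isn't pasted in this thread, but a rough sketch of a proxy-defaults config entry using the same escape hatches might look like the following. The Datadog tracer and cluster mirror the example from the issue description and are placeholders; substitute whatever tracer you actually use:

```
# proxy-defaults.hcl -- sketch only; apply with `consul config write proxy-defaults.hcl`
Kind = "proxy-defaults"
Name = "global"

Config {
  # Same escape-hatch fields as a service registration's proxy { config { ... } } block.
  envoy_tracing_json = <<-EOF
  {
    "http": {
      "name": "envoy.tracers.datadog",
      "config": {
        "collector_cluster": "datadog_trace_collector",
        "service_name": "ingress"
      }
    }
  }
  EOF

  envoy_extra_static_clusters_json = <<-EOF
  {
    "name": "datadog_trace_collector",
    "type": "STATIC",
    "connect_timeout": "1s",
    "load_assignment": {
      "cluster_name": "datadog_trace_collector",
      "endpoints": [
        { "lb_endpoints": [
          { "endpoint": { "address": { "socket_address": { "address": "127.0.0.1", "port_value": 8126 } } } }
        ] }
      ]
    }
  }
  EOF
}
```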
I've managed to put together a draft PR in #8714, and I just want to make sure I'm not stepping on anyone's toes before putting more work into it. @blake, is it possible for you to disclose any progress you have made on it internally? If nobody else is working on it, I'd be happy to continue my work on #8714, given some initial design review.
Hi @Gufran, our team has not yet started working on this so we appreciate you contributing a PR. We will try to have someone review it soon. Thanks again.
Any updates on this request, please?
@pvyaka01 I have some changes in #8714 to address this. That PR is waiting on a design review right now.
@dsouzajude - I'm not sure which version of Consul you're using; I am on Consul v1.9.0-beta2. For the life of me, I cannot get envoy_public_listener_json to work for the ingress gateway. Perhaps I'm not doing something right. I took your proxy-defaults as-is and started up the ingress gateway, with no luck.
I can see tracing and extra_static_clusters show up in the Envoy config; only the public_listener override is not applied.
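If it helps with debugging, the rendered config can be inspected via the Envoy admin API on the gateway host. A quick check, assuming the default admin bind of localhost:19000 used by `consul connect envoy`:

```
# Dump the live Envoy configuration (adjust the port if you override -admin-bind).
curl -s http://127.0.0.1:19000/config_dump > envoy-config.json
# Look for the rendered tracing block and listener definitions:
grep -n -A2 '"tracing"' envoy-config.json
```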
The Ingress Gateway listener requires use of Envoy's x-client-trace-id header to initiate a trace. Without this header, requests are not traced at all.

Reproduction Steps
Create a new ingress gateway with a tracing configuration. This example uses the Datadog tracer:
Ingress Gateway Service Config
```
service {
  name = "igw"
  port = 9999
  kind = "ingress-gateway"

  proxy {
    config {
      envoy_dogstatsd_url = "udp://127.0.0.1:8125"

      envoy_tracing_json = <<-EOF
      {
        "http": {
          "name": "envoy.tracers.datadog",
          "config": {
            "collector_cluster": "datadog_trace_collector",
            "service_name": "igw"
          }
        }
      }
      EOF

      envoy_extra_static_clusters_json = <<-EOF
      {
        "name": "datadog_trace_collector",
        "type": "STATIC",
        "connect_timeout": "1s",
        "upstream_connection_options": {
          "tcp_keepalive": {}
        },
        "load_assignment": {
          "cluster_name": "datadog_trace_collector",
          "endpoints": [
            {
              "lb_endpoints": [
                {
                  "endpoint": {
                    "address": {
                      "socket_address": {
                        "address": "127.0.0.1",
                        "port_value": 8126
                      }
                    }
                  }
                }
              ]
            }
          ]
        }
      }
      EOF
    }
  }
}
```
Deploy a Connect-enabled service with a similar tracing configuration and perform HTTP requests to initiate tracing.
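As an illustration, a Connect-enabled service registration for that step might look like the sketch below. The service name "web" and port are placeholders; the tracing JSON is the same shape as in the gateway config above:

```
# web.hcl -- sketch of a Connect-enabled service with the same tracer wired into its sidecar
service {
  name = "web"
  port = 8080

  connect {
    sidecar_service {
      proxy {
        config {
          # Reuse the envoy_extra_static_clusters_json cluster definition from the gateway config.
          envoy_tracing_json = <<-EOF
          {
            "http": {
              "name": "envoy.tracers.datadog",
              "config": {
                "collector_cluster": "datadog_trace_collector",
                "service_name": "web"
              }
            }
          }
          EOF
        }
      }
    }
  }
}
```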
Consul info for both Client and Server
Client info
```
agent:
    check_monitors = 0
    check_ttls = 0
    checks = 5
    services = 3
build:
    prerelease =
    revision = 3111cb8c
    version = 1.8.0
consul:
    acl = disabled
    known_servers = 3
    server = false
runtime:
    arch = amd64
    cpu_count = 2
    goroutines = 24350
    max_procs = 2
    os = linux
    version = go1.14.4
serf_lan:
    coordinate_resets = 0
    encrypted = false
    event_queue = 0
    event_time = 49
    failed = 0
    health_score = 0
    intent_queue = 0
    left = 0
    member_time = 3175
    members = 34
    query_queue = 0
    query_time = 1
```
Server info
```
agent:
    check_monitors = 0
    check_ttls = 0
    checks = 0
    services = 0
build:
    prerelease =
    revision = 3111cb8c
    version = 1.8.0
consul:
    acl = disabled
    bootstrap = false
    known_datacenters = 1
    leader = false
    leader_addr = 10.101.3.188:8300
    server = true
raft:
    applied_index = 708336420
    commit_index = 708336420
    fsm_pending = 0
    last_contact = 57.987616ms
    last_log_index = 708336420
    last_log_term = 95
    last_snapshot_index = 708323646
    last_snapshot_term = 95
    latest_configuration = [{Suffrage:Voter ID:04b87f04-ce07-0976-b4de-b29a3613b21a Address:10.101.4.10:8300} {Suffrage:Voter ID:daa06ae3-ea15-6b9e-791c-da38a2b66572 Address:10.101.3.188:8300} {Suffrage:Voter ID:60d8cc9c-b651-5e4f-1425-cfce296456e9 Address:10.101.4.50:8300}]
    latest_configuration_index = 0
    num_peers = 2
    protocol_version = 3
    protocol_version_max = 3
    protocol_version_min = 0
    snapshot_version_max = 1
    snapshot_version_min = 0
    state = Follower
    term = 95
runtime:
    arch = amd64
    cpu_count = 2
    goroutines = 6536
    max_procs = 2
    os = linux
    version = go1.14.4
serf_lan:
    coordinate_resets = 0
    encrypted = false
    event_queue = 0
    event_time = 49
    failed = 0
    health_score = 0
    intent_queue = 0
    left = 0
    member_time = 3175
    members = 34
    query_queue = 0
    query_time = 1
serf_wan:
    coordinate_resets = 0
    encrypted = false
    event_queue = 0
    event_time = 1
    failed = 0
    health_score = 0
    intent_queue = 0
    left = 0
    member_time = 211
    members = 3
    query_queue = 0
    query_time = 1
```
Operating system and Environment details
AmazonLinux 2
Log Fragments
None
This problem seems to come from https://github.com/hashicorp/consul/blob/v1.8.0/agent/xds/listeners.go#L933, which suppresses traces at the listener level. I built a binary without this particular RandomSampling configuration and tracing started working again.

It would be helpful to control sampling through configuration options. We would like to initiate a trace for every request that lands on the internet-facing listener and then let the destination service decide whether tracing should continue.
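For context, this is roughly the tracing block that ends up inside the rendered http_connection_manager (abbreviated sketch). Envoy's client_sampling and overall_sampling are left at their defaults of 100, which is why requests carrying x-client-trace-id are still traced while everything else is dropped; making this value configurable, e.g. allowing random_sampling to be raised to 100, would let the listener initiate traces on its own:

```
"tracing": {
  "random_sampling": { "value": 0 }
}
```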