hashicorp / consul-helm

Helm chart to install Consul and other associated components.
Mozilla Public License 2.0

Critical Error in K8s envoy sidecar log causes the container to fail init #825

Closed drfooser closed 3 years ago

drfooser commented 3 years ago

Can you provide a sample proxyDefaults JSON config that produces Zipkin tracing for the Envoy sidecar in consul-helm?

I am trying to configure Zipkin tracing in the Envoy proxy in K8s. After deploying a test pod into the K8s cluster with Consul Connect set for automatic sidecar injection, the pod errors at init and I get this error in the Envoy sidecar container log:

[2021-02-11 20:06:05.149][1][critical][main] [source/server/server.cc:102] error initializing configuration '/consul/connect-inject/envoy-bootstrap.yaml': Protobuf message (type envoy.config.bootstrap.v3.Bootstrap reason INVALID_ARGUMENT:(tracing.http) config: Cannot find field.) has unknown fields.

In my Helm chart values override file I have defined four JSON strings in connectInject.centralConfig (values-override-yaml.txt):

proxyDefaults: envoy_extra_static_clusters_json, envoy_extra_static_listeners_json, envoy_tracing_json, envoy_prometheus_bind_addr

proxyDefaults: |
  {
    "envoy_extra_static_clusters_json": "{\"name\":\"jaeger_9411\",\"type\":\"STRICT_DNS\",\"connect_timeout\":\"1s\",\"dns_lookup_family\":\"V4_ONLY\",\"hosts\":[{\"socket_address\":{\"address\":\"zipkin\",\"port_value\":9411}}],\"http2_protocol_options\":{}}",
    "envoy_extra_static_listeners_json": "{\"listeners\":[{\"address\":{\"socket_address\":{\"address\":\"0.0.0.0\",\"port_value\":8000}},\"traffic_direction\":\"INBOUND\",\"filter_chains\":[{\"filters\":[{\"name\":\"envoy.filters.network.http_connection_manager\",\"typed_config\":{\"@type\":\"type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager\",\"tracing\":{\"provider\":{\"name\":\"envoy.tracers.zipkin\",\"typed_config\":{\"@type\":\"type.googleapis.com/envoy.config.trace.v3.ZipkinConfig\",\"collector_cluster\":\"zipkin\",\"collector_endpoint\":\"/api/v2/spans\",\"collector_endpoint_version\":\"HTTP_JSON\"}}},\"codec_type\":\"auto\",\"stat_prefix\":\"ingress_http\",\"route_config\":{\"name\":\"do-visit-counter-frontend_route\",\"virtual_hosts\":[{\"name\":\"do-visit-counter-frontend\",\"domains\":[\"*\"],\"routes\":[{\"match\":{\"prefix\":\"/\"},\"route\":{\"cluster\":\"local_service\"},\"decorator\":{\"operation\":\"checkAvailability\"}}]}]},\"http_filters\":[{\"name\":\"envoy.filters.http.router\",\"typed_config\":{}}]}}]}]},{\"address\":{\"socket_address\":{\"address\":\"0.0.0.0\",\"port_value\":5000}},\"traffic_direction\":\"OUTBOUND\",\"filter_chains\":[{\"filters\":[{\"name\":\"envoy.filters.network.http_connection_manager\",\"typed_config\":{\"@type\":\"type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager\",\"tracing\":{\"provider\":{\"name\":\"envoy.tracers.zipkin\",\"typed_config\":{\"@type\":\"type.googleapis.com/envoy.config.trace.v3.ZipkinConfig\",\"collector_cluster\":\"zipkin\",\"collector_endpoint\":\"/api/v2/spans\",\"collector_endpoint_version\":\"HTTP_JSON\"}}},\"codec_type\":\"auto\",\"stat_prefix\":\"egress_http\",\"route_config\":{\"name\":\"do-visit-counter-backend_route\",\"virtual_hosts\":[{\"name\":\"do-visit-counter-backend\",\"domains\":[\"*\"],\"routes\":[{\"match\":{\"prefix\":\"/trace/2\"},\"route\":{\"cluster\":\"do-visit-counter-backend\"},\"decorator\":{\"operation\":\"checkStock\"}}]}]},\"http_filters\":[{\"name\":\"envoy.filters.http.router\",\"typed_config\":{}}]}}]}]}]}",
    "envoy_tracing_json": "{ \"http\":{\"name\":\"envoy.zipkin\",\"config\":{\"collector_endpoint\":\"/api/v1/spans\",\"collector_cluster\":\"jaeger_9411\"}}}",
    "envoy_prometheus_bind_addr": "0.0.0.0:9102"
  }

Consul Helm chart 0.29 seems to be using Envoy API v2 configuration in the v3 bootstrap.
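If so, the tracing block presumably needs to be expressed in v3 terms: the v3 bootstrap dropped the legacy tracing.http.config field in favor of typed_config, which would explain the "Cannot find field" error above. A sketch of what a v3-style envoy_tracing_json might look like (untested; field names taken from the Envoy v3 API, not from any consul-helm documentation):

  "envoy_tracing_json": "{\"http\":{\"name\":\"envoy.tracers.zipkin\",\"typedConfig\":{\"@type\":\"type.googleapis.com/envoy.config.trace.v3.ZipkinConfig\",\"collector_cluster\":\"jaeger_9411\",\"collector_endpoint\":\"/api/v2/spans\",\"collector_endpoint_version\":\"HTTP_JSON\"}}}"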

thisisnotashwin commented 3 years ago

Hey @drfooser

This does not directly answer your question; we are still trying to figure out what the error in the config might be. As an aside, though, we will be deprecating central config in favor of CRDs from the next Helm release. We will provide docs on migrating from centralConfig to CRDs, but it might be worthwhile checking out https://www.consul.io/docs/connect/config-entries/proxy-defaults for how to configure the proxy defaults as a CRD.

Will try and figure out the config error in the above in the meanwhile!

thisisnotashwin commented 3 years ago

Can you try this config:

---
apiVersion: consul.hashicorp.com/v1alpha1
kind: ProxyDefaults
metadata:
  name: global
spec:
  config:
    envoy_tracing_json: "{ \"http\":{\"name\":\"envoy.zipkin\",\"config\":{\"collector_endpoint\":\"/api/v1/spans\",\"collector_cluster\":\"jaeger_9411\"}}}"
    envoy_extra_static_clusters_json: "{\"name\":\"jaeger_9411\",\"type\":\"STRICT_DNS\",\"connect_timeout\":\"1s\",\"dns_lookup_family\":\"V4_ONLY\",\"hosts\":[{\"socket_address\":{\"address\":\"zipkin\",\"port_value\":9411}}],\"http2_protocol_options\":{}}"
drfooser commented 3 years ago

I deployed the proxyDefaults to a fresh K8s cluster and the same Consul Helm installation. I deployed the test pod and validated all containers started - no errors in the envoy sidecar. I see these log entries in the envoy sidecar log:

[2021-02-12 18:21:11.625][1][debug][config] [source/common/config/grpc_mux_impl.cc:110] Resuming discovery requests for type.googleapis.com/envoy.config.route.v3.RouteConfiguration (previous count 1)
[2021-02-12 18:21:11.625][1][debug][config] [source/common/config/grpc_mux_impl.cc:110] Resuming discovery requests for type.googleapis.com/envoy.api.v2.RouteConfiguration (previous count 1)
[2021-02-12 18:21:11.625][13][debug][upstream] [source/common/upstream/cluster_manager_impl.cc:660] updating TLS cluster local_app
[2021-02-12 18:21:11.625][1][debug][config] [source/common/config/grpc_subscription_impl.cc:73] gRPC config for type.googleapis.com/envoy.api.v2.Listener accepted with 1 resources with version 00000001
[2021-02-12 18:21:11.625][13][debug][upstream] [source/common/upstream/cluster_manager_impl.cc:1198] membership update for TLS cluster local_app added 1 removed 0
[2021-02-12 18:21:11.625][1][debug][config] [source/common/config/grpc_mux_impl.cc:110] Resuming discovery requests for type.googleapis.com/envoy.api.v2.Listener (previous count 1)

But I'm still not seeing trace data, so I pulled down the envoy sidecar bootstrap (attached). There is no zipkin cluster or listener in the config dump.
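One way to pull the running sidecar config, assuming the Envoy admin endpoint is on its default port 19000, the injected container keeps its default name, and the image ships wget:

kubectl exec <app-pod> -c consul-connect-envoy-sidecar -- wget -qO- http://127.0.0.1:19000/config_dump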

I do appreciate your help and hope you can help me further. Regards, Paul



drfooser commented 3 years ago

I should mention that the applied proxydefaults shows as SYNCED=false:

$ kubectl get proxydefaults
NAME     SYNCED   AGE
global   False    109m

Not sure if the below is relevant, but Consul syncCatalog looks to be disabled - this is from the applied Helm config values.yaml:

syncCatalog:
  # True if you want to enable the catalog sync. Set to "-" to inherit from
  # global.enabled.
  enabled: false
  # The name of the Docker image (including any tag) for consul-k8s
  # to run the sync program.
  # @type: string
  image: null
  # If true, all valid services in K8S are
  # synced by default. If false, the service must be annotated
  # (https://consul.io/docs/k8s/service-sync#sync-enable-disable) properly to sync.
  # In either case an annotation can override the default.
  default: true



lkysow commented 3 years ago

Can you show us the output of kubectl describe proxydefaults global
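(As for the syncCatalog question above: catalog sync is a separate feature from the connect injector and the CRD controller, so having it disabled shouldn't be related to this.)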

drfooser commented 3 years ago

Thank you for helping.

$ kubectl describe proxyDefaults global
Name:         global
Namespace:    default
Labels:
Annotations:
API Version:  consul.hashicorp.com/v1alpha1
Kind:         ProxyDefaults
Metadata:
  Creation Timestamp:  2021-02-18T22:37:58Z
  Finalizers:
    finalizers.consul.hashicorp.com
  Generation:  2
  Managed Fields:
    API Version:  consul.hashicorp.com/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:finalizers:
          .:
          v:"finalizers.consul.hashicorp.com":
      f:spec:
        f:expose:
        f:meshGateway:
      f:status:
        .:
        f:conditions:
    Manager:      consul-k8s
    Operation:    Update
    Time:         2021-02-18T22:37:58Z
    API Version:  consul.hashicorp.com/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:kubectl.kubernetes.io/last-applied-configuration:
      f:spec:
        .:
        f:config:
          .:
          f:envoy_extra_static_clusters_json:
          f:envoy_tracing_json:
    Manager:      kubectl
    Operation:    Update
    Time:         2021-02-18T22:37:58Z
  Resource Version:  1664178
  Self Link:         /apis/consul.hashicorp.com/v1alpha1/namespaces/default/proxydefaults/global
  UID:               797051e6-cf11-42be-b59c-4b46c3d8f4cd
Spec:
  Config:
    envoy_extra_static_clusters_json:  {"name":"jaeger_9411","type":"STRICT_DNS","connect_timeout":"1s","dns_lookup_family":"V4_ONLY","hosts":[{"socket_address":{"address":"zipkin","port_value":9411}}],"http2_protocol_options":{}}
    envoy_tracing_json:                { "http":{"name":"envoy.zipkin","config":{"collector_endpoint":"/api/v1/spans","collector_cluster":"jaeger_9411"}}}
  Expose:
  Mesh Gateway:
Status:
  Conditions:
    Last Transition Time:  2021-02-18T22:43:49Z
    Message:               config entry managed in different datacenter: ""
    Reason:                ExternallyManagedConfigError
    Status:                False
    Type:                  Synced
Events:



kschoche commented 3 years ago

Hi @drfooser! Based on this error:

Status:
  Conditions:
    Last Transition Time:  2021-02-18T22:43:49Z
    Message:               config entry managed in different datacenter: ""
    Reason:                ExternallyManagedConfigError

It looks like you'll need to migrate your config entries from Consul to Kubernetes CRDs in order to manage it from Kubernetes. Here is a guide which should walk you through doing so: https://www.consul.io/docs/k8s/crds/upgrade-to-crds#migrating-config-entries
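
For reference, the key step in that guide is marking the existing Kubernetes resource so the controller is allowed to take over the entry that already exists in Consul; a minimal sketch (see the guide for the full procedure):

kubectl annotate proxydefaults global 'consul.hashicorp.com/migrate-entry=true'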

Can you give that a try and let us know how it goes for you?

drfooser commented 3 years ago

Based on the procedure you pointed me to, I applied a single change to the global proxyDefaults in the K8s cluster. I added the annotation to migrate the entry.

This is what I applied with kubectl:

apiVersion: consul.hashicorp.com/v1alpha1
kind: ProxyDefaults
metadata:
  name: global
  annotations:
    'consul.hashicorp.com/migrate-entry': 'true'
spec:
  config:
    envoy_tracing_json: "{ \"http\":{\"name\":\"envoy.zipkin\",\"config\":{\"collector_endpoint\":\"/api/v1/spans\",\"collector_cluster\":\"jaeger_9411\"}}}"
    envoy_extra_static_clusters_json: "{\"name\":\"jaeger_9411\",\"type\":\"STRICT_DNS\",\"connect_timeout\":\"1s\",\"dns_lookup_family\":\"V4_ONLY\",\"hosts\":[{\"socket_address\":{\"address\":\"zipkin\",\"port_value\":9411}}],\"http2_protocol_options\":{}}"

It still hasn't synced though:

$ kubectl get proxyDefaults global
NAME     SYNCED   AGE
global   False    19h

With describe I see this error:

....
Status:
  Conditions:
    Last Transition Time:  2021-02-19T17:40:13Z
    Message:               config entry managed in different datacenter: ""
    Reason:                ExternallyManagedConfigError
    Status:                False
    Type:                  Synced
Events:

I don't understand how this could be.



kschoche commented 3 years ago

Hi @drfooser, this is strange indeed. Can you confirm which versions of consul and consul-k8s you're running? It's possible that we have a mismatch and may need to do things manually to start.

thanks!

drfooser commented 3 years ago

I'm guessing as to how to confirm the answer to your question.

The consul binary on my workstation (used to create TLS keys and such):

$ consul -version
Consul v1.9.0
Revision a417fe510
Protocol 2 spoken by default, understands 2 to 3 (agent will automatically use protocol >2 when speaking to compatible agents)

The version of consul-k8s set in the values.yaml file of Helm chart consul-0.29.0:

global.imageK8S: "hashicorp/consul-k8s:0.23.0"
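A way to double-check the images actually deployed (a sketch, assuming the chart's default app=consul label):

$ kubectl get pods -l app=consul -o custom-columns='NAME:.metadata.name,IMAGE:.spec.containers[0].image'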

Please let me know if I'm doing it wrong. 🙂 If I am I'll be happy to rebuild with compatible versions.

Best Regards, Paul



drfooser commented 3 years ago

I forgot to add the consul binary version present in the server pods:

$ kubectl exec consul-server-0 -- consul -version
Consul v1.9.2
Revision 6530cf370
Protocol 2 spoken by default, understands 2 to 3 (agent will automatically use protocol >2 when speaking to compatible agents)



lkysow commented 3 years ago

Hi, yes, you'll need to upgrade to the latest Helm chart version (0.30.0), which includes consul-k8s 0.24.0. See https://www.consul.io/docs/k8s/upgrade#helm-chart-version-upgrade for upgrade instructions. After upgrading, run kubectl describe proxydefaults global again.
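In rough strokes (the release name is a placeholder; see the upgrade docs above for the full procedure):

helm repo update
helm upgrade <release-name> hashicorp/consul --version 0.30.0 -f values.yaml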

drfooser commented 3 years ago

OK, maybe I screwed up by not following your directions exactly: after the upgrade I deleted the global proxyDefaults and re-applied it before checking the describe.

  1. I upgraded the Helm chart to consul-0.30.0 with this command (see attached values override file):

     $ helm upgrade consul-override-0.29.v1.0.debug hashicorp/consul --version 0.30.0 --set 'client.extraConfig="{"log_level": "DEBUG"}"' --set 'global.datacenter=fmy-consul-default' --namespace=default -f ./values-override-0.30.v1.0.yaml

  2. The upgrade executed without any pod crash issues.

  3. The global proxyDefaults resource still remained not synced.

  4. So I deleted it and reapplied.

  5. Still not synced, and the describe says this (full output below):

I'm very thankful for your help.

Regards, Paul



Name:         global
Namespace:    default
Labels:
Annotations:  consul.hashicorp.com/migrate-entry: true
API Version:  consul.hashicorp.com/v1alpha1
Kind:         ProxyDefaults
Metadata:
  Creation Timestamp:  2021-02-22T21:31:12Z
  Finalizers:
    finalizers.consul.hashicorp.com
  Generation:  2
  Managed Fields:
    API Version:  consul.hashicorp.com/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:finalizers:
          .:
          v:"finalizers.consul.hashicorp.com":
      f:spec:
        f:expose:
        f:meshGateway:
      f:status:
        .:
        f:conditions:
    Manager:      consul-k8s
    Operation:    Update
    Time:         2021-02-22T21:31:12Z
    API Version:  consul.hashicorp.com/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:consul.hashicorp.com/migrate-entry:
          f:kubectl.kubernetes.io/last-applied-configuration:
      f:spec:
        .:
        f:config:
          .:
          f:envoy_extra_static_clusters_json:
          f:envoy_tracing_json:
    Manager:      kubectl
    Operation:    Update
    Time:         2021-02-22T21:31:12Z
  Resource Version:  3266593
  Self Link:         /apis/consul.hashicorp.com/v1alpha1/namespaces/default/proxydefaults/global
  UID:               d1d22978-60c5-4faf-a8aa-c6af41539980
Spec:
  Config:
    envoy_extra_static_clusters_json:  {"name":"jaeger_9411","type":"STRICT_DNS","connect_timeout":"1s","dns_lookup_family":"V4_ONLY","hosts":[{"socket_address":{"address":"zipkin","port_value":9411}}],"http2_protocol_options":{}}
    envoy_tracing_json:                { "http":{"name":"envoy.zipkin","config":{"collector_endpoint":"/api/v1/spans","collector_cluster":"jaeger_9411"}}}
  Expose:
  Mesh Gateway:
Status:
  Conditions:
    Last Transition Time:  2021-02-22T21:47:34Z
    Message:               migration failed: Kubernetes resource does not match existing Consul config entry: consul={"Kind":"proxy-defaults","Name":"global","Config":{"envoy_prometheus_bind_addr":"0.0.0.0:9102"},"MeshGateway":{},"Expose":{},"CreateIndex":14,"ModifyIndex":14}, kube={"Kind":"proxy-defaults","Name":"global","Config":{"envoy_extra_static_clusters_json":"{\"name\":\"jaeger_9411\",\"type\":\"STRICT_DNS\",\"connect_timeout\":\"1s\",\"dns_lookup_family\":\"V4_ONLY\",\"hosts\":[{\"socket_address\":{\"address\":\"zipkin\",\"port_value\":9411}}],\"http2_protocol_options\":{}}","envoy_tracing_json":"{ \"http\":{\"name\":\"envoy.zipkin\",\"config\":{\"collector_endpoint\":\"/api/v1/spans\",\"collector_cluster\":\"jaeger_9411\"}}}"},"MeshGateway":{},"Expose":{},"Meta":{"consul.hashicorp.com/source-datacenter":"fmy-consul-default","external-source":"kubernetes"},"CreateIndex":0,"ModifyIndex":0}
    Reason:                MigrationFailedError
    Status:                False
    Type:                  Synced
Events:

lkysow commented 3 years ago

So we're getting close. The migration to CRD failed:

Message:               migration failed: Kubernetes resource does not match existing Consul config entry: consul={"Kind":"proxy-defaults","Name":"global","Config":{"envoy_prometheus_bind_addr":"0.0.0.0:9102"},"MeshGateway":{},"Expose":{},"CreateIndex":14,"ModifyIndex":14}, kube={"Kind":"proxy-defaults","Name":"global","Config":{"envoy_extra_static_clusters_json":"{\"name\":\"jaeger_9411\",\"type\":\"STRICT_DNS\",\"connect_timeout\":\"1s\",\"dns_lookup_family\":\"V4_ONLY\",\"hosts\":[{\"socket_address\":{\"address\":\"zipkin\",\"port_value\":9411}}],\"http2_protocol_options\":{}}","envoy_tracing_json":"{ \"http\":{\"name\":\"envoy.zipkin\",\"config\":{\"collector_endpoint\":\"/api/v1/spans\",\"collector_cluster\":\"jaeger_9411\"}}}"},"MeshGateway":{},"Expose":{},"Meta":{"consul.hashicorp.com/source-datacenter":"fmy-consul-default","external-source":"kubernetes"},"CreateIndex":0,"ModifyIndex":0}
    Reason:                MigrationFailedError
    Status:                False

Because the CRD's data is different from what's currently in Consul. So what you'll need to do is: first, change the CRD to match what's in Consul; second, once the sync works, you're free to change it to your desired configuration.

The CRD that would match (step 1) is:

apiVersion: consul.hashicorp.com/v1alpha1
kind: ProxyDefaults
metadata:
  name: global
  annotations:
    'consul.hashicorp.com/migrate-entry': 'true'
spec:
  config:
    envoy_prometheus_bind_addr: "0.0.0.0:9102"

The other option would be for you to delete the proxy-defaults config entry from Consul (this would work if you're not in a production cluster): kubectl exec consul-server-0 -- consul config delete -kind proxy-defaults -name global
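Either way, you can inspect what's currently stored in Consul first:

kubectl exec consul-server-0 -- consul config read -kind proxy-defaults -name global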

drfooser commented 3 years ago

OK, I took the latter approach and deleted the existing proxyDefaults from consul-server-0 and (voilà!) it synced. Thank you very much! Though this is still not producing any tracing. I can see the tracing block in the sidecar config, and I can see the tracing cluster definition too. I thought maybe I needed to add a listener, so I added envoy_extra_static_listeners_json, which resulted in this below:

Still no joy though. I've attached the envoy cfg file for the test frontend pod, hoping you will stick with me long enough to get a trace. I do very much appreciate all your help so far.

Regards, Paul


$ kubectl describe proxyDefaults global
...
Spec:
  Config:
    envoy_extra_static_clusters_json:   {"name":"jaeger_9411","type":"STRICT_DNS","connect_timeout":"1s","dns_lookup_family":"V4_ONLY","hosts":[{"socket_address":{"address":"zipkin","port_value":9411}}],"http2_protocol_options":{}}
    envoy_extra_static_listeners_json:  {"listeners":[{"address":{"socket_address":{"address":"0.0.0.0","port_value":8000}},"traffic_direction":"INBOUND","filter_chains":[{"filters":[{"name":"envoy.filters.network.http_connection_manager","typed_config":{"@type":"type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager","tracing":{"provider":{"name":"envoy.tracers.zipkin","typed_config":{"@type":"type.googleapis.com/envoy.config.trace.v3.ZipkinConfig","collector_cluster":"zipkin","collector_endpoint":"/api/v2/spans","collector_endpoint_version":"HTTP_JSON"}}},"codec_type":"auto","stat_prefix":"ingress_http","route_config":{"name":"do-visit-counter-frontend_route","virtual_hosts":[{"name":"do-visit-counter-frontend","domains":["*"],"routes":[{"match":{"prefix":"/"},"route":{"cluster":"local_service"},"decorator":{"operation":"checkAvailability"}}]}]},"http_filters":[{"name":"envoy.filters.http.router","typed_config":{}}]}}]}]},{"address":{"socket_address":{"address":"0.0.0.0","port_value":5000}},"traffic_direction":"OUTBOUND","filter_chains":[{"filters":[{"name":"envoy.filters.network.http_connection_manager","typed_config":{"@type":"type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager","tracing":{"provider":{"name":"envoy.tracers.zipkin","typed_config":{"@type":"type.googleapis.com/envoy.config.trace.v3.ZipkinConfig","collector_cluster":"zipkin","collector_endpoint":"/api/v2/spans","collector_endpoint_version":"HTTP_JSON"}}},"codec_type":"auto","stat_prefix":"egress_http","route_config":{"name":"do-visit-counter-backend_route","virtual_hosts":[{"name":"do-visit-counter-backend","domains":["*"],"routes":[{"match":{"prefix":"/trace/2"},"route":{"cluster":"do-visit-counter-backend"},"decorator":{"operation":"checkStock"}}]}]},"http_filters":[{"name":"envoy.filters.http.router","typed_config":{}}]}}]}]}]}
    envoy_tracing_json:                 { "http":{"name":"envoy.zipkin","config":{"collector_endpoint":"/api/v1/spans","collector_cluster":"jaeger_9411"}}}
  Expose:
  Mesh Gateway:
Status:
  Conditions:
    Last Transition Time:  2021-02-23T15:08:20Z
    Status:                True
    Type:                  Synced
Events:



lkysow commented 3 years ago

Hi, I think your static cluster is the issue: the v3 cluster API replaced the old hosts field with load_assignment. In my test environment the collector address is simplest-collector.observability; adapted to your zipkin service, I think it should be:

config:
  envoy_extra_static_clusters_json: '{"name":"jaeger_9411","connect_timeout":"1s","type":"strict_dns","lb_policy":"round_robin","load_assignment":{"cluster_name":"jaeger","endpoints":[{"lb_endpoints":[{"endpoint":{"address":{"socket_address":{"address":"zipkin","port_value":9411}}}}]}]}}'

Also please verify zipkin:9411 is a resolvable hostname, e.g. you have a kube svc in the same namespace called zipkin with that port.
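A quick way to check that from inside the cluster (busybox image assumed):

kubectl run dns-test --rm -it --image=busybox --restart=Never -- nslookup zipkin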

lkysow commented 3 years ago

I should also add that your app must be forwarding the zipkin headers. If you're using a jaeger-only app it may not be doing that.

Example app I used in my testing:

apiVersion: v1
kind: Pod
metadata:
  name: service-a
  labels:
    app: service-a
  annotations:
    "consul.hashicorp.com/connect-inject": "true"
    "consul.hashicorp.com/connect-service-upstreams": "service-b:8082"
spec:
  containers:
    - name: service-a
      image: nicholasjackson/fake-service:v0.20.0
      ports:
        - containerPort: 8081
          name: http
      env:
        - name: UPSTREAM_URIS
          value: http://localhost:8082
        - name: LISTEN_ADDR
          value: 0.0.0.0:8081
        - name: TRACING_ZIPKIN
          value: http://zipkin:9411
---
apiVersion: v1
kind: Service
metadata:
  name: jaeger-service-a
  labels:
    app: service-a
spec:
  ports:
    - port: 8081
      targetPort: 8081
  selector:
    app: service-a
---
apiVersion: v1
kind: Pod
metadata:
  name: service-b
  labels:
    app: service-b
  annotations:
    "consul.hashicorp.com/connect-inject": "true"
spec:
  containers:
    - name: service-b
      image: nicholasjackson/fake-service:v0.20.0
      ports:
        - containerPort: 8082
          name: http
      env:
        - name: LISTEN_ADDR
          value: 0.0.0.0:8082
        - name: TRACING_ZIPKIN
          value: http://zipkin:9411
---
apiVersion: v1
kind: Service
metadata:
  name: jaeger-service-b
  labels:
    app: service-b
spec:
  ports:
    - port: 8082
      targetPort: 8082
  selector:
    app: service-b
---
apiVersion: consul.hashicorp.com/v1alpha1
kind: ProxyDefaults
metadata:
  name: global
spec:
  config:
    protocol: http
    envoy_tracing_json: '{"http":{"name":"envoy.zipkin","config":{"collector_cluster":"jaeger","collector_endpoint":"/api/v1/spans","shared_span_context":true}}}'
    envoy_extra_static_clusters_json: '{"name":"jaeger","connect_timeout":"1s","type":"strict_dns","lb_policy":"round_robin","load_assignment":{"cluster_name":"jaeger","endpoints":[{"lb_endpoints":[{"endpoint":{"address":{"socket_address":{"address":"zipkin","port_value":9411}}}}]}]}}'
---
apiVersion: consul.hashicorp.com/v1alpha1
kind: ServiceDefaults
metadata:
  name: service-a
spec:
  protocol: http
---
apiVersion: consul.hashicorp.com/v1alpha1
kind: ServiceDefaults
metadata:
  name: service-b
spec:
  protocol: http
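
To exercise the example end to end, something like this should work (the filename is a placeholder):

kubectl apply -f tracing-example.yaml
kubectl port-forward pod/service-a 8081:8081
curl localhost:8081   # each request should now produce a trace in the collector
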
drfooser commented 3 years ago

Luke, thank you for helping me! This last reply seems to address my trouble. I'm still not 100% certain, but I should tell you that I applied the data from both of your replies from yesterday, and the result is that the service-a test pod sends traces while my existing test pods do not. This indicates that the most likely problem is that the test pods I have been using are not instrumented properly.

I apologize for showing my frustration. I can only say that I try not to ask for support unless I have first exhausted all known avenues - while that prevents me from wasting your time, this time around I got frustrated.

Thanks again for your help. Paul



lkysow commented 3 years ago

Hi Paul no worries!

This 100% needs better documentation from our side, so again we apologize for the difficulties. I also ran into the same issue using the jaeger tracing library. I haven't had time to dig into it, but I think that library may use different headers than the Envoy Zipkin integration expects.
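For what it's worth, Envoy's Zipkin tracer propagates trace context via the B3 headers (x-request-id, x-b3-traceid, x-b3-spanid, x-b3-parentspanid, x-b3-sampled, x-b3-flags), while Jaeger's native clients default to their own uber-trace-id header, so spans from a Jaeger-only app won't be joined into one trace unless the app forwards the B3 set.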

drfooser commented 3 years ago

I have pods generating traces now and they are getting delivered to the jaeger collector.
Thank you for your help @lkysow ! I think this issue can be closed.

drfooser commented 3 years ago

Thanks again @lkysow ! You helped a ton.