linkerd / linkerd2

Ultralight, security-first service mesh for Kubernetes. Main repo for Linkerd 2.x.
https://linkerd.io
Apache License 2.0

Automatic mTLS not working in our GKE Private Cluster setup #4913

Closed: part-time-githubber closed this issue 4 years ago

part-time-githubber commented 4 years ago

Bug Report

What is the issue?

Linkerd is injected, but when we check the communications with linkerd edges and linkerd tap, they do not show up as encrypted.
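
For context, the checks were roughly the following (a minimal sketch, assuming the emojivoto namespace and the web deployment from the demo app):

# Edges between meshed workloads; the SECURED column shows whether the connection is mTLS'd
linkerd edges -n emojivoto deployment

# Live traffic to the web deployment; each JSON event carries a "tls" field
linkerd tap -n emojivoto deploy/web -o json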

How can it be reproduced?

CNI install:
linkerd install-cni --dest-cni-bin-dir /home/kubernetes/bin --dest-cni-net-dir /etc/cni/net.d --cni-log-level debug --proxy-uid 65535 | kubectl apply -f -

Linkerd install:
linkerd install --linkerd-cni-enabled --proxy-log-level=warn,linkerd=info,linkerd2_proxy=debug --proxy-uid=65535 | kubectl apply -f -

Open the firewall from master to nodes as per https://linkerd.io/2/reference/cluster-configuration/#private-clusters (see the firewall sketch below).
Install the emojivoto app as per https://linkerd.io/2/getting-started/#step-5-install-the-demo-app
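
A rough sketch of the firewall step above (not the exact rule; the network name, target tags, and port list are assumptions, so follow the linked private-clusters doc for the authoritative values):

# Hypothetical rule allowing the GKE master range to reach the Linkerd admission webhooks on the nodes
gcloud compute firewall-rules create gke-to-linkerd-control-plane \
  --network poc-net \
  --source-ranges "<master-ipv4-cidr>" \
  --target-tags "<node-network-tag>" \
  --allow tcp:8443 \
  --description "Allow the GKE control plane to reach Linkerd webhooks"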

Logs, error output, etc

(If the output is long, please create a gist and paste the link here.)

Will attach:
Screenshots of our cluster and node pool configuration
Linkerd and Emojivoto startup logs
Linkerd identity and Linkerd proxy logs from the web deployment
Tap for web and emoji with -o json
Edges for the emojivoto namespace

linkerd check output

kubernetes-api
--------------
√ can initialize the client
√ can query the Kubernetes API

kubernetes-version
------------------
√ is running the minimum Kubernetes API version
√ is running the minimum kubectl version

linkerd-existence
-----------------
√ 'linkerd-config' config map exists
√ heartbeat ServiceAccount exist
√ control plane replica sets are ready
√ no unschedulable pods
√ controller pod is running
√ can initialize the client
√ can query the control plane API

linkerd-config
--------------
√ control plane Namespace exists
√ control plane ClusterRoles exist
√ control plane ClusterRoleBindings exist
√ control plane ServiceAccounts exist
√ control plane CustomResourceDefinitions exist
√ control plane MutatingWebhookConfigurations exist
√ control plane ValidatingWebhookConfigurations exist
√ control plane PodSecurityPolicies exist

linkerd-cni-plugin
------------------
√ cni plugin ConfigMap exists
√ cni plugin PodSecurityPolicy exists
√ cni plugin ClusterRole exists
√ cni plugin ClusterRoleBinding exists
√ cni plugin Role exists
√ cni plugin RoleBinding exists
√ cni plugin ServiceAccount exists
√ cni plugin DaemonSet exists
√ cni plugin pod is running on all nodes

linkerd-identity
----------------
√ certificate config is valid
√ trust anchors are using supported crypto algorithm
√ trust anchors are within their validity period
√ trust anchors are valid for at least 60 days
√ issuer cert is using supported crypto algorithm
√ issuer cert is within its validity period
√ issuer cert is valid for at least 60 days
√ issuer cert is issued by the trust anchor

linkerd-api
-----------
√ control plane pods are ready
√ control plane self-check
√ [kubernetes] control plane can talk to Kubernetes
√ [prometheus] control plane can talk to Prometheus
√ tap api service is running

linkerd-version
---------------
√ can determine the latest version
√ cli is up-to-date

control-plane-version
---------------------
√ control plane is up-to-date
√ control plane and cli versions match

linkerd-addons
--------------
√ 'linkerd-config-addons' config map exists

linkerd-grafana
---------------
√ grafana add-on service account exists
√ grafana add-on config map exists
√ grafana pod is running

Status check results are √

Environment

Possible solution

Additional context

part-time-githubber commented 4 years ago

edges.log emojivoto_startup.log linkerd_check.log linkerd_identity.log linkerd_startup.log tap_emoji_json.log tap_web_json.log web_linkerd_proxy_debug_filtered.log web_linkerd_proxy.log

part-time-githubber commented 4 years ago

Summary of the conversation with Charles on Slack about this:

part-time-githubber commented 4 years ago

cluster_config nodepool_config

part-time-githubber commented 4 years ago

Network Policies are configured so that:

the linkerd and linkerd-cni namespaces are reachable from any namespace
the linkerd and emojivoto namespaces can talk to each other
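
A quick way to double-check what those policies actually select (plain kubectl, nothing assumed beyond the namespace names above):

# List all NetworkPolicies in the cluster
kubectl get networkpolicies --all-namespaces

# Inspect the policies guarding the Linkerd and demo namespaces
kubectl describe networkpolicy -n linkerd
kubectl describe networkpolicy -n linkerd-cni
kubectl describe networkpolicy -n emojivoto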

grampelberg commented 4 years ago

Your tap does not show any OUTBOUND for web. From the logs, it sounds to me like you've got a half-configured CNI setup. I'd recommend checking the CNI logs to see what's going on there and inspecting the iptables rules to understand why you're only getting half the redirects. Skimming through your GKE cluster configuration, there might be an issue with shielded nodes; I've not done anything with those before.
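
For the CNI side, a minimal sketch, assuming the default linkerd-cni namespace and DaemonSet name created by linkerd install-cni:

# Tail the linkerd-cni DaemonSet logs (kubectl picks one pod)
kubectl logs -n linkerd-cni ds/linkerd-cni --tail=100

# Or list the pods and pull logs per node to compare them
kubectl get pods -n linkerd-cni -o wide
kubectl logs -n linkerd-cni <linkerd-cni-pod-name>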

part-time-githubber commented 4 years ago

Hello Thomas,

Many thanks for a prompt follow-up.

I recycled the web pod and captured the systemd logs from the node it got scheduled on. PFA. I will look into it once my day begins, but I am sure it will take you less time than me to decipher what is going on :-)

I doubt Shielded VMs would matter, as Calico network policy also works with iptables and runs fine.

cheers, pankaj

5vh5_systemd.log

part-time-githubber commented 4 years ago

Confirming: when you mention CNI logs @grampelberg, those are the ones from systemd on the node, right?

cpretzer commented 4 years ago

@pankajmt the systemd logs show us the iptables rules that are being created. There is also some log output from the pods in the linkerd-cni namespace that would be good to look at; here is the output from my local environment:

Wrote linkerd CNI binaries to /host/opt/cni/bin
Using CNI config template from CNI_NETWORK_CONFIG environment variable.
      "k8s_api_root": "https://__KUBERNETES_SERVICE_HOST__:__KUBERNETES_SERVICE_PORT__",
      "k8s_api_root": "https://10.96.0.1:__KUBERNETES_SERVICE_PORT__",
CNI config: {
  "name": "linkerd-cni",
  "type": "linkerd-cni",
  "log_level": "debug",
  "policy": {
      "type": "k8s",
      "k8s_api_root": "https://10.96.0.1:443",
      "k8s_auth_token": "__SERVICEACCOUNT_TOKEN__"
  },
  "kubernetes": {
      "kubeconfig": "/etc/cni/net.d/ZZZ-linkerd-cni-kubeconfig"
  },
  "linkerd": {
    "incoming-proxy-port": 4143,
    "outgoing-proxy-port": 4140,
    "proxy-uid": 2102,
    "ports-to-redirect": [],
    "inbound-ports-to-ignore": ["4190","4191"],
    "outbound-ports-to-ignore": [],
    "simulate": false,
    "use-wait-flag": false
  }
}
Created CNI config /host/etc/cni/net.d/10-kindnet.conflist
Done configuring CNI. Sleep=true

part-time-githubber commented 4 years ago

Here you go @cpretzer. As that contains just some one-time startup information, I thought it would not be of interest after the web pod was recycled. linkerd_cni_pods.log That's for the 2 nodes I have in the cluster; kl is an alias for kubectl logs.

cpretzer commented 4 years ago

Thanks for sending the CNI output, it's useful to compare the two between a working environment and the private GKE cluster where you see this behavior.

Just as a sanity check, I got the iptables output from an installation that doesn't use CNI and compared it to the iptables output generated by the CNI plugin. The two are identical, and you can find the output in this gist.

At this point, it makes sense to debug this as a GKE networking issue and take a close look at the actual iptables settings on the nodes where the web and emoji services are running. That means that you'll want to ssh to the nodes and run the iptables commands to confirm that they allow the traffic to go through the proxy in both directions.

It may be worth looking into the calico output/configuration as well.
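
One way to do that inspection (a sketch, not the definitive procedure): the Linkerd redirect rules are written inside each pod's network namespace, so from the node you can enter the pod's netns before dumping the nat table. The PROXY_INIT_* chain names come from Linkerd's proxy-init and may differ by version; use crictl equivalents if the node runtime is not Docker.

# Find the PID of one of the web pod's containers
docker ps | grep web
docker inspect --format '{{.State.Pid}}' <container-id>

# Dump the nat table from inside that pod's network namespace
sudo nsenter -t <pid> -n iptables-save -t nat

# Expect PROXY_INIT_REDIRECT hooked into PREROUTING (inbound traffic -> port 4143)
# and PROXY_INIT_OUTPUT hooked into OUTPUT (outbound traffic -> port 4140)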

part-time-githubber commented 4 years ago

I am only somewhat versed in iptables. It would help if you could tell me which iptables commands to run other than iptables -L.

cpretzer commented 4 years ago

@pankajmt Try using iptables-save to see the entire list.
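
To narrow the saved output down to the Linkerd-related rules, something like the following; an empty result here is itself a useful signal:

# Filter the dump for the proxy-init chains or the proxy UID;
# the outbound chain should carry a --uid-owner rule for the proxy UID (65535 in this setup)
sudo iptables-save -t nat | grep -E 'PROXY_INIT|65535'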

part-time-githubber commented 4 years ago

PFA the iptables-save logs.

When this was taken (we have a preemptible node setup, so nodes get recycled):

web was on the node ending with 56r2; emoji was on the node ending with 5vh5

NAME                       READY   STATUS    RESTARTS   AGE   IP              NODE                                       NOMINATED NODE   READINESS GATES
emoji-5d58fcfdd6-8xfh2     2/2     Running   0          10m   192.168.14.32   gke-gke-poc-3-gke-poc-3-np-0d41ad04-5vh5   <none>           <none>
vote-bot-8fbbd4d95-z9vbn   2/2     Running   0          10m   192.168.14.33   gke-gke-poc-3-gke-poc-3-np-0d41ad04-5vh5   <none>           <none>
voting-58cdf886bf-tvwlx    2/2     Running   0          10m   192.168.14.34   gke-gke-poc-3-gke-poc-3-np-0d41ad04-5vh5   <none>           <none>
web-7db99d98c7-lrvx9       2/2     Running   0          10m   192.168.14.78   gke-gke-poc-3-gke-poc-3-np-0d41ad04-56r2   <none>           <none>

5vh5-iptables-save.log 56r2-iptables-save.log

part-time-githubber commented 4 years ago

cluster_description.txt nodepool_description.txt

Outputs from:
gcloud container clusters describe gke-poc-3 --zone australia-southeast1-a
gcloud container node-pools describe gke-poc-3-np --zone australia-southeast1-a --cluster gke-poc-3

A diff on your side against your cluster/node pool output should help call out the configuration differences, I think.

cpretzer commented 4 years ago

@pankajmt I looked through the describe output and there are a few differences but nothing that I would expect to cause this behavior.

Can you send me the output from these two commands:

gcloud beta compute networks describe poc-net --zone=australia-southeast1-a
gcloud beta compute networks subnets describe poc-gke-3 --zone=australia-southeast1-a

part-time-githubber commented 4 years ago

@cpretzer Guess what, I just deleted the cluster yesterday, as it was not going to be in use for a while (I am busy with other things) and it did not have the right security controls :-)

But I have the Terraform code for it:

resource "google_compute_network" "vpc_network" { project = var.project name = local.net_name description = "POC VPC" routing_mode = "GLOBAL" auto_create_subnetworks = false delete_default_routes_on_create = false }

resource "google_compute_subnetwork" "poc_gke3_subnet" { project = var.project region = local.default_region network = google_compute_network.vpc_network.self_link name = "poc-gke-3" description = "Hosts POC GKE Workloads" private_ip_google_access = true ip_cidr_range = "192.168.15.192/28" secondary_ip_range { range_name = "kube-services" ip_cidr_range = "192.168.15.0/25" } secondary_ip_range { range_name = "kube-pods" ip_cidr_range = "192.168.14.0/24" } log_config { aggregation_interval = "INTERVAL_15_MIN" flow_sampling = 1.0 metadata = "INCLUDE_ALL_METADATA" } }

Hopefully that gives you what you need.

cpretzer commented 4 years ago

Thanks @pankajmt, this output looks a bit different from the output I have for the networks associated with my private cluster.

I've sent the cluster, node pool, network, and subnetwork descriptions as YAML to your email.

part-time-githubber commented 4 years ago

@cpretzer changing the network's routing mode from GLOBAL to REGIONAL did not help.

By the way, have you compared your CNI and Linkerd (with CNI) install YAML with mine?

part-time-githubber commented 4 years ago

The other network and subnetwork differences should be harmless.

cpretzer commented 4 years ago

@pankajmt I don't see the YAML attached to this ticket. Did you send it as a file through Slack or email?

Anything network-related is a possible cause for this. When you say that you add your own subnets, are you following this doc?

What is the output from kubectl api-resources?

part-time-githubber commented 4 years ago

YAML sent by email after your message!

The output of kubectl api-resources reads:

NAME                               SHORTNAMES      APIGROUP                       NAMESPACED   KIND
bindings                                                                          true         Binding
componentstatuses                  cs                                             false        ComponentStatus
configmaps                         cm                                             true         ConfigMap
endpoints                          ep                                             true         Endpoints
events                             ev                                             true         Event
limitranges                        limits                                         true         LimitRange
namespaces                         ns                                             false        Namespace
nodes                              no                                             false        Node
persistentvolumeclaims             pvc                                            true         PersistentVolumeClaim
persistentvolumes                  pv                                             false        PersistentVolume
pods                               po                                             true         Pod
podtemplates                                                                      true         PodTemplate
replicationcontrollers             rc                                             true         ReplicationController
resourcequotas                     quota                                          true         ResourceQuota
secrets                                                                           true         Secret
serviceaccounts                    sa                                             true         ServiceAccount
services                           svc                                            true         Service
mutatingwebhookconfigurations                      admissionregistration.k8s.io   false        MutatingWebhookConfiguration
validatingwebhookconfigurations                    admissionregistration.k8s.io   false        ValidatingWebhookConfiguration
customresourcedefinitions          crd,crds        apiextensions.k8s.io           false        CustomResourceDefinition
apiservices                                        apiregistration.k8s.io         false        APIService
controllerrevisions                                apps                           true         ControllerRevision
daemonsets                         ds              apps                           true         DaemonSet
deployments                        deploy          apps                           true         Deployment
replicasets                        rs              apps                           true         ReplicaSet
statefulsets                       sts             apps                           true         StatefulSet
tokenreviews                                       authentication.k8s.io          false        TokenReview
localsubjectaccessreviews                          authorization.k8s.io           true         LocalSubjectAccessReview
selfsubjectaccessreviews                           authorization.k8s.io           false        SelfSubjectAccessReview
selfsubjectrulesreviews                            authorization.k8s.io           false        SelfSubjectRulesReview
subjectaccessreviews                               authorization.k8s.io           false        SubjectAccessReview
horizontalpodautoscalers           hpa             autoscaling                    true         HorizontalPodAutoscaler
verticalpodautoscalercheckpoints   vpacheckpoint   autoscaling.k8s.io             true         VerticalPodAutoscalerCheckpoint
verticalpodautoscalers             vpa             autoscaling.k8s.io             true         VerticalPodAutoscaler
cronjobs                           cj              batch                          true         CronJob
jobs                                               batch                          true         Job
certificatesigningrequests         csr             certificates.k8s.io            false        CertificateSigningRequest
leases                                             coordination.k8s.io            true         Lease
bgpconfigurations                                  crd.projectcalico.org          false        BGPConfiguration
bgppeers                                           crd.projectcalico.org          false        BGPPeer
blockaffinities                                    crd.projectcalico.org          false        BlockAffinity
clusterinformations                                crd.projectcalico.org          false        ClusterInformation
felixconfigurations                                crd.projectcalico.org          false        FelixConfiguration
globalbgpconfigs                                   crd.projectcalico.org          false        GlobalBGPConfig
globalfelixconfigs                                 crd.projectcalico.org          false        GlobalFelixConfig
globalnetworkpolicies                              crd.projectcalico.org          false        GlobalNetworkPolicy
globalnetworksets                                  crd.projectcalico.org          false        GlobalNetworkSet
hostendpoints                                      crd.projectcalico.org          false        HostEndpoint
ipamblocks                                         crd.projectcalico.org          false        IPAMBlock
ipamconfigs                                        crd.projectcalico.org          false        IPAMConfig
ipamhandles                                        crd.projectcalico.org          false        IPAMHandle
ippools                                            crd.projectcalico.org          false        IPPool
networkpolicies                                    crd.projectcalico.org          true         NetworkPolicy
networksets                                        crd.projectcalico.org          true         NetworkSet
ingresses                          ing             extensions                     true         Ingress
capacityrequests                   capreq          internal.autoscaling.k8s.io    true         CapacityRequest
serviceprofiles                    sp              linkerd.io                     true         ServiceProfile
nodes                                              metrics.k8s.io                 false        NodeMetrics
pods                                               metrics.k8s.io                 true         PodMetrics
storagestates                                      migration.k8s.io               false        StorageState
storageversionmigrations                           migration.k8s.io               false        StorageVersionMigration
managedcertificates                mcrt            networking.gke.io              true         ManagedCertificate
ingresses                          ing             networking.k8s.io              true         Ingress
networkpolicies                    netpol          networking.k8s.io              true         NetworkPolicy
runtimeclasses                                     node.k8s.io                    false        RuntimeClass
updateinfos                        updinf          nodemanagement.gke.io          true         UpdateInfo
poddisruptionbudgets               pdb             policy                         true         PodDisruptionBudget
podsecuritypolicies                psp             policy                         false        PodSecurityPolicy
clusterrolebindings                                rbac.authorization.k8s.io      false        ClusterRoleBinding
clusterroles                                       rbac.authorization.k8s.io      false        ClusterRole
rolebindings                                       rbac.authorization.k8s.io      true         RoleBinding
roles                                              rbac.authorization.k8s.io      true         Role
scalingpolicies                                    scalingpolicy.kope.io          true         ScalingPolicy
priorityclasses                    pc              scheduling.k8s.io              false        PriorityClass
trafficsplits                      ts              split.smi-spec.io              true         TrafficSplit
csidrivers                                         storage.k8s.io                 false        CSIDriver
csinodes                                           storage.k8s.io                 false        CSINode
storageclasses                     sc              storage.k8s.io                 false        StorageClass
volumeattachments                                  storage.k8s.io                 false        VolumeAttachment
cronjobs                           cj              tap.linkerd.io                 true         Tap
daemonsets                         ds              tap.linkerd.io                 true         Tap
deployments                        deploy          tap.linkerd.io                 true         Tap
jobs                                               tap.linkerd.io                 true         Tap
namespaces                         ns              tap.linkerd.io                 false        Tap
pods                               po              tap.linkerd.io                 true         Tap
replicasets                        rs              tap.linkerd.io                 true         Tap
replicationcontrollers             rc              tap.linkerd.io                 true         Tap
services                           svc             tap.linkerd.io                 true         Tap
statefulsets                       sts             tap.linkerd.io                 true         Tap

Finally, one of the other differences I spotted is that we use a custom service account with restricted permissions. I gave our custom service account the full roles/editor permissions too, which your service account would have, but that did not help.

part-time-githubber commented 4 years ago

We managed to solve this with help from Buoyant and Charles. We were using UID 65535 for the proxy, which was a reserved value back when UIDs were 16 bits. It is best avoided; in this case it caused an issue on Container-Optimized OS.

We are now using UID 70000, which is allowed by our application PSPs.
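
For anyone landing here later, a sketch of the change, reusing the install commands from the report above with only the proxy UID swapped (log-level flags omitted):

# Reinstall the CNI plugin and control plane with a proxy UID other than 65535
linkerd install-cni --dest-cni-bin-dir /home/kubernetes/bin --dest-cni-net-dir /etc/cni/net.d --proxy-uid 70000 | kubectl apply -f -
linkerd install --linkerd-cni-enabled --proxy-uid=70000 | kubectl apply -f -

# Roll the meshed workloads so their proxies are re-injected with the new UID
kubectl -n emojivoto rollout restart deploy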

cpretzer commented 4 years ago

@pankajmt thanks for the update on GitHub. This was a tough, but fun issue to track down. Really appreciate your patience and great communication during the process. 😄