Closed part-time-githubber closed 4 years ago
Summary of conversation with Charles on Slack around this:
linkerd edges po -o wide -n emojivoto is missing the app pod-to-pod links.
Network Policies are configured so that the linkerd and linkerd-cni namespaces are reachable from any namespace, and the linkerd and emojivoto namespaces can talk to each other.
Your tap does not show any OUTBOUND traffic for web.
From the logs, it sounds to me like you've got a half-configured CNI setup. I'd recommend checking the CNI logs to see what's going on there and inspecting the iptables rules to understand why you're only getting half the redirects. Skimming through your GKE cluster configuration, there might be an issue with Shielded Nodes; I've not done anything with those before.
Hello Thomas,
Many thanks for a prompt follow-up.
I recycled the web pod and captured the systemd logs from the node it got scheduled on. PFA. I will look into it once my day begins, but I am sure it will take you less time than me to decipher what is going on :-)
I doubt Shielded VMs would matter, as Calico network policy also works via iptables and runs fine.
cheers, pankaj
Confirming: when you mention CNI logs @grampelberg, those are the ones from systemd on the node, right?
@pankajmt the systemd logs show us the iptables rules that are being created. There is also some log output from the pods in the linkerd-cni namespace that would be good to look at. Here is the output from my local environment:
Wrote linkerd CNI binaries to /host/opt/cni/bin
Using CNI config template from CNI_NETWORK_CONFIG environment variable.
"k8s_api_root": "https://__KUBERNETES_SERVICE_HOST__:__KUBERNETES_SERVICE_PORT__",
"k8s_api_root": "https://10.96.0.1:__KUBERNETES_SERVICE_PORT__",
CNI config: {
"name": "linkerd-cni",
"type": "linkerd-cni",
"log_level": "debug",
"policy": {
"type": "k8s",
"k8s_api_root": "https://10.96.0.1:443",
"k8s_auth_token": "__SERVICEACCOUNT_TOKEN__"
},
"kubernetes": {
"kubeconfig": "/etc/cni/net.d/ZZZ-linkerd-cni-kubeconfig"
},
"linkerd": {
"incoming-proxy-port": 4143,
"outgoing-proxy-port": 4140,
"proxy-uid": 2102,
"ports-to-redirect": [],
"inbound-ports-to-ignore": ["4190","4191"],
"outbound-ports-to-ignore": [],
"simulate": false,
"use-wait-flag": false
}
}
Created CNI config /host/etc/cni/net.d/10-kindnet.conflist
Done configuring CNI. Sleep=true
Here you go @cpretzer. As that has just some one-time information on startup, I thought it would not be of interest after having recycled the web pod. linkerd_cni_pods.log That's for the two nodes I have in the cluster. kl is an alias for kubectl logs.
Thanks for sending the CNI output, it's useful to compare the two between a working environment and the private GKE cluster where you see this behavior.
Just as a sanity check, I got the iptables output from an installation that doesn't use CNI and compared it to the iptables output generated by the CNI plugin. The two are identical, and you can find the output in this gist.
At this point, it makes sense to debug this as a GKE networking issue and take a close look at the actual iptables settings on the nodes where the web and emoji services are running. That means that you'll want to ssh to the nodes and run the iptables commands to confirm that they allow the traffic to go through the proxy in both directions.
It may be worth looking into the calico output/configuration as well.
I am only somewhat versed in iptables. It would help if you could tell me which iptables commands to run, other than iptables -L.
@pankajmt Try using iptables-save to see the entire list.
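As a rough guide to reading that output: linkerd's init/CNI step writes NAT rules into each pod's network namespace that redirect inbound traffic to the proxy's inbound port (4143) and outbound traffic to its outbound port (4140), with an owner-UID match that exempts the proxy's own packets. The sketch below filters an illustrative sample of such rules (chain names as created by linkerd2-proxy-init; the rule text is not captured from this cluster) to show what to grep for in the real iptables-save dump:

```shell
# Illustrative sample of the NAT rules linkerd's proxy-init/CNI step creates
# inside a pod's network namespace. NOT captured from this cluster.
sample='-A PREROUTING -j PROXY_INIT_REDIRECT
-A PROXY_INIT_REDIRECT -p tcp -j REDIRECT --to-port 4143
-A OUTPUT -j PROXY_INIT_OUTPUT
-A PROXY_INIT_OUTPUT -m owner --uid-owner 65535 -j RETURN
-A PROXY_INIT_OUTPUT -p tcp -j REDIRECT --to-port 4140'

# Inbound traffic should be redirected to the proxy inbound port (4143):
printf '%s\n' "$sample" | grep -- '--to-port 4143'

# Outbound traffic should be redirected to 4140, except packets owned by the
# proxy's own UID, which the owner-match RETURN rule lets straight through.
# If only one of the two redirects is present, you get exactly the
# "half the redirects" behavior described earlier in this thread.
printf '%s\n' "$sample" | grep -- '--to-port 4140'
printf '%s\n' "$sample" | grep -- '--uid-owner'
```

If neither redirect rule shows up in the real output, the CNI plugin never ran for that pod; if the owner-UID rule names the wrong UID, outbound redirection misbehaves.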
PFA iptables-save logs.
When this was taken (we have a preemptible node setup, so nodes get recycled):
web was on the node ending with 56r2; emoji was on the node ending with 5vh5.
NAME                       READY   STATUS    RESTARTS   AGE   IP              NODE                                       NOMINATED NODE   READINESS GATES
emoji-5d58fcfdd6-8xfh2     2/2     Running   0          10m   192.168.14.32   gke-gke-poc-3-gke-poc-3-np-0d41ad04-5vh5   <none>           <none>
vote-bot-8fbbd4d95-z9vbn   2/2     Running   0          10m   192.168.14.33   gke-gke-poc-3-gke-poc-3-np-0d41ad04-5vh5   <none>           <none>
voting-58cdf886bf-tvwlx    2/2     Running   0          10m   192.168.14.34   gke-gke-poc-3-gke-poc-3-np-0d41ad04-5vh5   <none>           <none>
web-7db99d98c7-lrvx9       2/2     Running   0          10m   192.168.14.78   gke-gke-poc-3-gke-poc-3-np-0d41ad04-56r2   <none>           <none>
cluster_description.txt nodepool_description.txt
outputs from:
gcloud container clusters describe gke-poc-3 --zone australia-southeast1-a
gcloud container node-pools describe gke-poc-3-np --zone australia-southeast1-a --cluster gke-poc-3
a diff on your side with your cluster/nodepool output should help call out the configuration differences, I think
@pankajmt I looked through the describe output and there are a few differences but nothing that I would expect to cause this behavior.
Can you send me the output from these two commands:
gcloud beta compute networks describe poc-net --zone=australia-southeast1-a
gcloud beta compute networks subnets describe poc-gke-3 --zone=australia-southeast1-a
@cpretzer Guess what, I just deleted the cluster yesterday, as it was not in use (I am busy with other things) and it did not have the right security controls :-)
But I have the code for it:
resource "google_compute_network" "vpc_network" {
  project                         = var.project
  name                            = local.net_name
  description                     = "POC VPC"
  routing_mode                    = "GLOBAL"
  auto_create_subnetworks         = false
  delete_default_routes_on_create = false
}

resource "google_compute_subnetwork" "poc_gke3_subnet" {
  project                  = var.project
  region                   = local.default_region
  network                  = google_compute_network.vpc_network.self_link
  name                     = "poc-gke-3"
  description              = "Hosts POC GKE Workloads"
  private_ip_google_access = true
  ip_cidr_range            = "192.168.15.192/28"

  secondary_ip_range {
    range_name    = "kube-services"
    ip_cidr_range = "192.168.15.0/25"
  }

  secondary_ip_range {
    range_name    = "kube-pods"
    ip_cidr_range = "192.168.14.0/24"
  }

  log_config {
    aggregation_interval = "INTERVAL_15_MIN"
    flow_sampling        = 1.0
    metadata             = "INCLUDE_ALL_METADATA"
  }
}
Hopefully that gives you what you need.
Thanks @pankajmt, this output looks a bit different from the output that I have for the networks associated with my private cluster.
I've sent the cluster, nodepool, network, and subnetwork descriptions as YAML to your email
@cpretzer changing the routing mode for the network from GLOBAL to REGIONAL did not help.
btw, have you compared your CNI and Linkerd (with CNI) install YAML with mine?
other network and subnetwork differences should be harmless:
we add our own subnets and don't use the Google defaults, which are not recommended anyway:
autoCreateSubnetworks: false
x_gcloud_subnet_mode: CUSTOM
we enable flow logs:
enableFlowLogs: true
logConfig:
  aggregationInterval: INTERVAL_15_MIN
  enable: true
  flowSampling: 1.0
  metadata: INCLUDE_ALL_METADATA
@pankajmt I don't see the YAML attached to this ticket. Did you send them as a file through slack or email?
Anything network related is a possible cause for this. When you say that you add your own subnets, are you following this doc?
What is the output from kubectl api-resources?
YAML sent on email after your message!
Output of kubectl api-resources reads:
NAME SHORTNAMES APIGROUP NAMESPACED KIND
bindings true Binding
componentstatuses cs false ComponentStatus
configmaps cm true ConfigMap
endpoints ep true Endpoints
events ev true Event
limitranges limits true LimitRange
namespaces ns false Namespace
nodes no false Node
persistentvolumeclaims pvc true PersistentVolumeClaim
persistentvolumes pv false PersistentVolume
pods po true Pod
podtemplates true PodTemplate
replicationcontrollers rc true ReplicationController
resourcequotas quota true ResourceQuota
secrets true Secret
serviceaccounts sa true ServiceAccount
services svc true Service
mutatingwebhookconfigurations admissionregistration.k8s.io false MutatingWebhookConfiguration
validatingwebhookconfigurations admissionregistration.k8s.io false ValidatingWebhookConfiguration
customresourcedefinitions crd,crds apiextensions.k8s.io false CustomResourceDefinition
apiservices apiregistration.k8s.io false APIService
controllerrevisions apps true ControllerRevision
daemonsets ds apps true DaemonSet
deployments deploy apps true Deployment
replicasets rs apps true ReplicaSet
statefulsets sts apps true StatefulSet
tokenreviews authentication.k8s.io false TokenReview
localsubjectaccessreviews authorization.k8s.io true LocalSubjectAccessReview
selfsubjectaccessreviews authorization.k8s.io false SelfSubjectAccessReview
selfsubjectrulesreviews authorization.k8s.io false SelfSubjectRulesReview
subjectaccessreviews authorization.k8s.io false SubjectAccessReview
horizontalpodautoscalers hpa autoscaling true HorizontalPodAutoscaler
verticalpodautoscalercheckpoints vpacheckpoint autoscaling.k8s.io true VerticalPodAutoscalerCheckpoint
verticalpodautoscalers vpa autoscaling.k8s.io true VerticalPodAutoscaler
cronjobs cj batch true CronJob
jobs batch true Job
certificatesigningrequests csr certificates.k8s.io false CertificateSigningRequest
leases coordination.k8s.io true Lease
bgpconfigurations crd.projectcalico.org false BGPConfiguration
bgppeers crd.projectcalico.org false BGPPeer
blockaffinities crd.projectcalico.org false BlockAffinity
clusterinformations crd.projectcalico.org false ClusterInformation
felixconfigurations crd.projectcalico.org false FelixConfiguration
globalbgpconfigs crd.projectcalico.org false GlobalBGPConfig
globalfelixconfigs crd.projectcalico.org false GlobalFelixConfig
globalnetworkpolicies crd.projectcalico.org false GlobalNetworkPolicy
globalnetworksets crd.projectcalico.org false GlobalNetworkSet
hostendpoints crd.projectcalico.org false HostEndpoint
ipamblocks crd.projectcalico.org false IPAMBlock
ipamconfigs crd.projectcalico.org false IPAMConfig
ipamhandles crd.projectcalico.org false IPAMHandle
ippools crd.projectcalico.org false IPPool
networkpolicies crd.projectcalico.org true NetworkPolicy
networksets crd.projectcalico.org true NetworkSet
ingresses ing extensions true Ingress
capacityrequests capreq internal.autoscaling.k8s.io true CapacityRequest
serviceprofiles sp linkerd.io true ServiceProfile
nodes metrics.k8s.io false NodeMetrics
pods metrics.k8s.io true PodMetrics
storagestates migration.k8s.io false StorageState
storageversionmigrations migration.k8s.io false StorageVersionMigration
managedcertificates mcrt networking.gke.io true ManagedCertificate
ingresses ing networking.k8s.io true Ingress
networkpolicies netpol networking.k8s.io true NetworkPolicy
runtimeclasses node.k8s.io false RuntimeClass
updateinfos updinf nodemanagement.gke.io true UpdateInfo
poddisruptionbudgets pdb policy true PodDisruptionBudget
podsecuritypolicies psp policy false PodSecurityPolicy
clusterrolebindings rbac.authorization.k8s.io false ClusterRoleBinding
clusterroles rbac.authorization.k8s.io false ClusterRole
rolebindings rbac.authorization.k8s.io true RoleBinding
roles rbac.authorization.k8s.io true Role
scalingpolicies scalingpolicy.kope.io true ScalingPolicy
priorityclasses pc scheduling.k8s.io false PriorityClass
trafficsplits ts split.smi-spec.io true TrafficSplit
csidrivers storage.k8s.io false CSIDriver
csinodes storage.k8s.io false CSINode
storageclasses sc storage.k8s.io false StorageClass
volumeattachments storage.k8s.io false VolumeAttachment
cronjobs cj tap.linkerd.io true Tap
daemonsets ds tap.linkerd.io true Tap
deployments deploy tap.linkerd.io true Tap
jobs tap.linkerd.io true Tap
namespaces ns tap.linkerd.io false Tap
pods po tap.linkerd.io true Tap
replicasets rs tap.linkerd.io true Tap
replicationcontrollers rc tap.linkerd.io true Tap
services svc tap.linkerd.io true Tap
statefulsets sts tap.linkerd.io true Tap
Finally, one of the other differences I spotted is that we use a custom service account with restricted permissions. I gave our custom service account complete roles/editor permissions too, which your service account would have, but that did not help.
We managed to solve this with help from Buoyant and Charles. We were using UID 65535 for the proxy, which was a reserved value back when UIDs were 16 bits. It is best avoided, as in this case it caused an issue on Container-Optimized OS.
We are now using UID 70000, which is allowed by our application PSPs.
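The rule of thumb behind the fix above can be sketched as a tiny (hypothetical) helper: 65534 and 65535 are the historical 16-bit "nobody"/overflow UIDs, so a proxy UID should steer clear of them. The function name and output strings are illustrative, not from any Linkerd tooling:

```shell
# Hypothetical helper illustrating the rule of thumb from the fix above:
# 65534/65535 are the historical 16-bit "nobody"/overflow UIDs and are best
# not reused as a proxy UID.
check_proxy_uid() {
  uid="$1"
  if [ "$uid" -eq 65534 ] || [ "$uid" -eq 65535 ]; then
    echo "avoid: $uid is a reserved 16-bit UID"
    return 1
  fi
  echo "ok: $uid"
}

check_proxy_uid 65535   # the UID that broke redirects on Container-Optimized OS
check_proxy_uid 70000   # the replacement that worked here
```

Any UID above the 16-bit range that is also permitted by the cluster's PodSecurityPolicies (as 70000 was here) avoids the collision.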
@pankajmt thanks for the update on GitHub. This was a tough, but fun issue to track down. Really appreciate your patience and great communication during the process. 😄
Bug Report
What is the issue?
Linkerd is injected, but when we check the communications with edges and tap, they are not encrypted.
How can it be reproduced?
CNI install:
linkerd install-cni --dest-cni-bin-dir /home/kubernetes/bin --dest-cni-net-dir /etc/cni/net.d --cni-log-level debug --proxy-uid 65535 | kubectl apply -f -
Linkerd install:
linkerd install --linkerd-cni-enabled --proxy-log-level=warn,linkerd=info,linkerd2_proxy=debug --proxy-uid=65535 | kubectl apply -f -
Open the firewall from master to nodes as per https://linkerd.io/2/reference/cluster-configuration/#private-clusters
Install the emojivoto app as per https://linkerd.io/2/getting-started/#step-5-install-the-demo-app
Logs, error output, etc
(If the output is long, please create a gist and paste the link here.)
Will attach:
Screenshots of our cluster and node pool configuration
Linkerd and emojivoto startup logs
Linkerd identity and Linkerd proxy logs from the web deployment
Tap for web and emoji with -o json
Edges for the emojivoto namespace
linkerd check output
Environment
Possible solution
Additional context