Closed tomastigera closed 1 month ago
However, I encountered a connection timeout when the client pod was using host networking (e.g., a calico-controller node) and the destination was a service with an external IP (e.g., an API server installed outside the pod network).
You mean you were not able to connect to the external ep via a service with Istio? Or you were just not able to connect to an external ep via the service w/o Istio?
The other cases (host-network -> svc, pod -> svc of pod) are OK.
You mean that with Istio (or without?) you can make a connection from host-net -> svc to a pod endpoint? And you can also connect with Istio from a pod to another pod via a service?
AFAICT Istio is not really meant for host-networked clients, is it? And I cannot quite see a difference (from the point of view of the dataplane) between connecting via a service to a pod ep or to an external ep. In either case, we translate the dest IP and let Linux route it.
Would you be able to share an iptables dump (to see what rules Istio injected, if any) and the routing table from the host?
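One way to gather the requested data in a single paste is sketched below. It is only a sketch: it assumes `iptables-save` and `ip route show` are the relevant commands on the affected host (both usually need root), and the `collect` helper and `COMMANDS` table are hypothetical names, not part of Calico or Istio.

```python
import subprocess

# Assumed diagnostics; adjust to whatever the maintainers actually ask for.
COMMANDS = {
    "iptables": ["iptables-save"],
    "routes": ["ip", "route", "show"],
}


def collect(commands=COMMANDS):
    """Run each command and return {name: output or error description}."""
    results = {}
    for name, argv in commands.items():
        try:
            proc = subprocess.run(argv, capture_output=True, text=True, timeout=30)
        except (OSError, subprocess.TimeoutExpired) as exc:
            # Binary missing, not permitted, or hung.
            results[name] = f"could not run {argv[0]}: {exc}"
            continue
        if proc.returncode == 0:
            results[name] = proc.stdout
        else:
            results[name] = f"exit {proc.returncode}: {proc.stderr}"
    return results


if __name__ == "__main__":
    for name, output in collect().items():
        print(f"===== {name} =====")
        print(output)
```

Running it as root on the host that shows the timeout and attaching the output keeps the rule dump and the routes together.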
I might have a similar issue, trying the same configuration from #4509 with bpfConnectTimeLoadBalancing=Disabled and bpfHostNetworkedNATWithoutCTLB=Enabled.
Since then, I have some hosts and pods in the hostNetwork which cannot reach the clusterIP of the apiserver anymore (100.72.0.1). But this is unrelated to Istio (we do have Istio, but it is enabled neither for the hostNetworked pod nor for the apiserver). Any data I can share? Detail: This does not affect all hosts. In fact, it affects exactly those which have a pod that is an endpoint for the clusterIP, i.e. my master nodes with apiservers on them.
edit: using 3.27.2
edit2: setting bpfConnectTimeLoadBalancing to TCP solves this problem functionally.
before trying the new config, we were using the feature gate approach BPFConnectTimeLoadBalancingWorkaround=udp
These two things are equivalent. bpfConnectTimeLoadBalancing=Disabled turns off the connect-time LB for TCP as well. So there seems to be an issue which would probably still manifest itself with UDP.
Is the kube-apiserver host-networked or not?
Our apiserver pods are host-networked static pods.
@sfudeus that is a real issue and I will track it separately as it looks different to the original issue of this ticket.
@tomastigera I seem to be dealing with this error as well (neither hostnetwork pods nor nodes can reach the apiserver on the Service (10.96.0.1:443), but have no trouble reaching it on the actual master node IPs:6443).
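The symptom described here (Service address times out, node address works) can be checked from a host-networked context with a plain TCP probe. The following is a minimal sketch, not a Calico tool: `can_connect` is a hypothetical helper, and the targets are passed on the command line as `host:port`.

```python
import socket
import sys


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP handshake to host:port completes within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, connection refused, and unreachable networks.
        return False


if __name__ == "__main__":
    # Example: python probe.py 10.96.0.1:443 <master-node-ip>:6443
    for target in sys.argv[1:]:
        host, port = target.rsplit(":", 1)
        status = "ok" if can_connect(host, int(port)) else "FAILED"
        print(f"{target}: {status}")
```

If the Service address fails while the node address succeeds from the same host, that matches the behaviour reported above; it does not by itself show where the packet is dropped.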
Setup:
The hostnetwork program with problems is the istio CNI plugin, which simply goes to 10.96.0.1:443 in its kubeconfig and has no way to permanently set it to anything else like the HA apiserver address. This results in new pods hanging forever as the istio CNI cannot obtain the information it needs to configure the pod network.
If I set bpfConnectTimeLoadBalancing back to TCP and bpfHostNetworkedNATWithoutCTLB to Disabled, it works again. However, I need the new settings because I'm running into Istio sidecar issues: it appears that the client-side sidecar does not recognize that traffic is going to another pod with an Istio sidecar, so it doesn't send a client certificate. The docs suggest these two parameters can fix that by letting Istio do the Service balancing instead of eBPF.
FelixConfig:
spec:
  bpfConnectTimeLoadBalancing: Disabled
  bpfExternalServiceMode: DSR
  bpfHostNetworkedNATWithoutCTLB: Enabled
  bpfLogLevel: ""
  failsafeInboundHostPorts:
  - net: ""
    port: 22
    protocol: tcp
  - net: ""
    port: 6443
    protocol: tcp
  floatingIPs: Disabled
  healthPort: 9099
  logSeverityFile: Warning
  logSeverityScreen: Warning
  logSeveritySys: Warning
  prometheusMetricsEnabled: true
  reportingInterval: 0s
  vxlanVNI: 4096
  wireguardEnabled: true
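For anyone switching between the two settings combinations mentioned in this thread, the change can be expressed as a merge patch. The sketch below only builds the patch and prints a `kubectl patch` command; it assumes the FelixConfiguration resource is named `default` and is reachable as `felixconfiguration` through kubectl, which may differ per install. `felix_patch` and `kubectl_command` are hypothetical helper names.

```python
import json
import shlex


def felix_patch(ctlb_mode, host_nat_without_ctlb):
    """Build a merge patch touching only the two fields discussed in the thread."""
    return {
        "spec": {
            "bpfConnectTimeLoadBalancing": ctlb_mode,
            "bpfHostNetworkedNATWithoutCTLB": host_nat_without_ctlb,
        }
    }


def kubectl_command(patch, name="default"):
    """Render a shell-safe kubectl command; the resource name is an assumption."""
    argv = [
        "kubectl", "patch", "felixconfiguration", name,
        "--type", "merge", "-p", json.dumps(patch),
    ]
    return shlex.join(argv)


if __name__ == "__main__":
    # The combination reported to work again for host-networked clients.
    print(kubectl_command(felix_patch("TCP", "Disabled")))
    # The combination that showed the problem.
    print(kubectl_command(felix_patch("Disabled", "Enabled")))
```

Printing rather than executing the command leaves the decision to apply it with the operator.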
@tomastigera I tested version 3.27 to see if the commit could resolve the Istio compatibility issue.
with this configuration.
However, I encountered a connection timeout when the client pod was using host networking (e.g., a calico-controller node) and the destination was a service with an external IP (e.g., an API server installed outside the pod network).
This is the endpoint that has the problem.
The other cases (host-network -> svc, pod -> svc of pod) are OK.
Originally posted by @zoftdev in https://github.com/projectcalico/calico/issues/4509#issuecomment-1953439604