Open tjorri opened 2 weeks ago
Actually ended up looking into this a bit more, and I'm guessing that the endpoint watchers (e.g. https://github.com/linkerd/linkerd2/blob/main/controller/api/destination/watcher/endpoints_watcher.go#L917, https://github.com/linkerd/linkerd2/blob/main/controller/api/destination/watcher/endpoints_watcher.go#L1025) are what decide to omit unready endpoints.
https://linkerd.io/2-edge/features/protocol-detection/#marking-ports-as-opaque already supports flagging opaque ports via an annotation on the Service itself, so one approach could be to support a Service-level annotation that communicates whether unready Endpoints should still be included as destinations.
I'm not immediately sure whether this would bring any adverse effects, or whether it would be considered acceptable to expand the scope in this way.
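To illustrate what I mean, something along these lines on the Service; the opaque-ports annotation exists today, while the second annotation is purely hypothetical, just a sketch of the idea (all names and ports are placeholders):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: example-svc
  annotations:
    # Existing, documented behaviour: mark ports as opaque at the Service level.
    config.linkerd.io/opaque-ports: "7000"
    # Hypothetical annotation (does not exist today) sketching the idea:
    # keep unready Endpoints in the set of destinations returned for this Service.
    config.linkerd.io/publish-not-ready-endpoints: "true"
spec:
  selector:
    app: example
  ports:
    - name: clustering
      port: 7000
```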
@tjorri, as I recall from Slack, your use case is pretty unique -- can I trouble you to summarize that here, as well? I feel like just "have a way to ignore unready endpoints" is likely to be a bit overly blunt, so the details of your use case might be helpful too.
Sure, I'm not sure about the uniqueness per se, but the issue is generally present in any kind of clustered system whose nodes have to build up state directly with the other nodes before being ready to accept client connections (readiness, as we understand it, relates to that rather than to being ready to accept any connections at all).
Specifically, in our case we currently have a statically sharded actor framework: effectively we have an arbitrary number of different shard types, which get spun out to StatefulSets and then Pods. Each Pod knows the desired topology, including the names of the shard types and the desired replica counts, and thanks to the stable network IDs it can work out the DNS names of all other Pods across all the shard types. In our case this relies on the headless Service publishing the addresses of unready Pods.
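For concreteness, a trimmed-down sketch of that kind of setup (all names, ports, and images here are just placeholders, not our actual manifests):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: shard-a-headless
spec:
  clusterIP: None                    # headless: DNS returns the Pod IPs directly
  publishNotReadyAddresses: true     # unready Pods are still resolvable by name
  selector:
    app: shard-a
  ports:
    - name: clustering
      port: 7000
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: shard-a
spec:
  serviceName: shard-a-headless      # gives Pods stable DNS names, e.g. shard-a-0.shard-a-headless
  replicas: 3
  selector:
    matchLabels:
      app: shard-a
  template:
    metadata:
      labels:
        app: shard-a
    spec:
      containers:
        - name: shard
          image: example/shard:latest   # placeholder image
          ports:
            - name: clustering
              containerPort: 7000
```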
While we're building up the entire state of the system, we can't let clients connect yet, so we have to report unready so that the primary Services keep withholding access. If we allowed readiness probes to succeed earlier, we would have to build an additional layer of readiness handling along the lines of "LB, we report ready but we really aren't, so give us traffic and we'll tell clients ourselves that we're not actually ready", which in our mind defeats the purpose of using the readiness check to gate client connections in the first place.
Conversely, we might have a Pod in an unready state for whatever reason, e.g. because it is under too much load, but we would still need that Pod to have communication channels to the other Pods, for example to allow moving state off of it. Losing intra-cluster connectivity at that point would be problematic.
I'm sorry if this is still a bit abstract, but I'd be happy to answer any more specific questions if I can. :)
Also, I spent a short time digging into Linkerd more today and realized that what I think I'm after is the Pod readiness checks in the workload watcher, and I was thinking of testing this out with some lightweight, Pod-level annotations for skipping the readiness requirement, just for fun. It might indeed be a bit blunt, and I'm not sure whether relaxing the readiness requirement might impact other parts of the system (service discovery/load balancing?), so I'll take your word for it if you feel this is a dead end.
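Roughly what I had in mind to experiment with, just to be concrete; this annotation does not exist, the name is entirely made up:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: shard-a-0                  # illustrative; in practice this would go on the StatefulSet's pod template
  annotations:
    # Hypothetical, non-existent annotation: ask the destination controller's
    # workload watcher to return this Pod as a destination even while it is unready.
    config.linkerd.io/skip-readiness-check: "true"
spec:
  containers:
    - name: shard
      image: example/shard:latest  # placeholder
```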
Proxies do not block inbound traffic on container readiness. This would be a chicken-and-egg problem: the proxy needs to proxy readiness probes!
The details you investigated about the destination controller's endpoint watcher aren't related to this--that controls how outbound proxies resolve Services; but I don't think that's the problem you're describing.
When we enable linkerd-proxy sidecars, we observe that direct Pod-to-Pod communication is rejected by linkerd-proxy.
Can you elaborate on this? Do the proxies log anything in this case?
If this is a non-HTTP protocol, it may be that you must mark this port as opaque.
Ok, so if the outbound proxies rely on the endpoint watcher details, would it then be that the outbound proxies don't forward traffic while the Service is not ready? And because headless Services are being used for discovering the other Pods, is that the issue here?
When I was looking at this a few weeks ago, I built a custom image which relaxed the readiness requirements for the watchers and got things working with it, which I guess would be in line with what you described.
I had to put this item on the back burner for a while, but I'll try to return to it next week, build some form of repro to explore this further, and follow up with more details.
would it then be that the outbound proxies don't forward traffic while the Service is not ready? And because headless Services are being used for discovering the other Pods, is that the issue here?
When an application resolves a headless service, it gets back pod IPs, and so Linkerd's outbound resolution should be in terms of the target pod and not the Service IP.
This goes through the workload watcher, which does appear to ignore pods before they are marked ready. If I'm reading it correctly, my sense is that this filtering logic may be inappropriate: we should allow direct connections as soon as they are requested.
Next time you get to look at this, can you inspect the output of
:; linkerd diagnostics profile $POD_IP:80
This will show the control plane's endpoint response for these connections.
If that contains a proper endpoint section, then the watcher logic probably isn't at play, and the next step will be to set the config.linkerd.io/proxy-log-level=linkerd=debug,info annotation to get a clearer picture of what behavior the client is observing.
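For reference, that annotation goes on the workload's pod template, roughly like this (reusing the illustrative StatefulSet names from earlier in the thread; they are placeholders):

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: shard-a
spec:
  serviceName: shard-a-headless
  replicas: 3
  selector:
    matchLabels:
      app: shard-a
  template:
    metadata:
      labels:
        app: shard-a
      annotations:
        # Turn up the proxy's log level so the client-side behavior shows up in its logs.
        config.linkerd.io/proxy-log-level: linkerd=debug,info
    spec:
      containers:
        - name: shard
          image: example/shard:latest  # placeholder
```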
What problem are you trying to solve?
We have a clustered service which, at boot time, requires all the cluster nodes to establish connections with each other before the service (or any of its Pods) can pass its readinessProbes. Specifically, we run StatefulSets with a headless Service set to publishNotReadyAddresses: true, which allows nodes to discover the other nodes and then communicate directly on the clustering port of each Pod. We run a separate, regular Service as the interface through which other services talk to this one; it refuses traffic until the clustered service is ready, as reported by the readinessProbes.
When we enable linkerd-proxy sidecars, we observe that direct Pod-to-Pod communication is rejected by linkerd-proxy. If we deactivate the readinessProbes for the Pods (or set config.linkerd.io/skip-inbound-ports: [clustering port]), the cluster setup works, so our belief is that the unreadiness of the Pod causes the proxy to reject traffic.
In our situation we are attempting to use Linkerd mainly for convenient encryption of traffic as well as mTLS-based authentication and authorization.
How should the problem be solved?
It would be ideal if we could flag certain ports so that linkerd-proxy accepts traffic on them even if a readiness check fails.
I'm not fully familiar with the internals of Linkerd, so an initial, naive end-user view would be to think of it as similar to the config.linkerd.io/skip-inbound-ports annotation/options, but I understand those are mainly used for the linkerd-proxy-init and iptables setup, so perhaps this would require some further coordination with other resources (e.g. allowing configuration of a Server resource to state that the target should receive traffic irrespective of readiness checks).
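Purely to illustrate the idea: the Server resource itself exists, but the last field below does not exist today, its name is made up here, and the selector and port are placeholders:

```yaml
apiVersion: policy.linkerd.io/v1beta1
kind: Server
metadata:
  name: clustering
spec:
  podSelector:
    matchLabels:
      app: shard-a                 # illustrative selector
  port: clustering
  proxyProtocol: opaque
  # Hypothetical knob sketching the feature request: accept traffic on this
  # port even while the Pod's readiness checks are failing.
  ignoreReadiness: true
```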
Any alternatives you've considered?
Right now our less-than-elegant workaround is to omit the clustering ports from linkerd-proxy through the config.linkerd.io/skip-inbound-ports / --inbound-ports-to-ignore configs. As the clustering ports are important from a security standpoint, we then overlay a NetworkPolicy that forbids access to those ports. This is a bit ugly but works in a single-cluster context; it would fail in multi-cluster setups, where multi-cluster-capable NetworkPolicies or similar constructs would be needed, whereas having this type of functionality in Linkerd would keep the toolset smaller and more elegant.
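Roughly, the workaround looks like this: config.linkerd.io/skip-inbound-ports: "7000" on the pod template, plus a NetworkPolicy on top (port number and labels are placeholders, not our actual manifests):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: restrict-clustering-port
spec:
  podSelector:
    matchLabels:
      app: shard-a
  policyTypes:
    - Ingress
  ingress:
    # The now-unmeshed clustering port is only reachable from peer shard Pods.
    - from:
        - podSelector:
            matchLabels:
              app: shard-a
      ports:
        - protocol: TCP
          port: 7000
    # NetworkPolicy is allow-list based, so the regular serving port
    # (placeholder 8080) has to be allowed explicitly from anywhere.
    - ports:
        - protocol: TCP
          port: 8080
```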
How would users interact with this feature?
No response
Would you like to work on this feature?
None