You'll need to create authorizations that permit unauthenticated connections on the ports that require connections from kubelet. Unfortunately, we don't have any mechanism to discover the source IPs for kubelet probes.
@olix0r it was the first thing that I did. I've added this ServerAuthorization to allow kubelet requests to pods:
```yaml
---
apiVersion: policy.linkerd.io/v1beta1
kind: ServerAuthorization
metadata:
  namespace: coolns
  name: allow-healthchecks
spec:
  server:
    selector:
      matchLabels:
        anna.money/server-type: common
  client:
    unauthenticated: true
    networks:
    # kubelet CIDRs here, for example with templating
    {% for n in range(x, y) %}
    {% for m in range(0, 256) %}
    - cidr: 10.{{ n }}.{{ m }}.1/32
    {% endfor %}
    {% endfor %}
```
It solved the problem with healthchecks but added a new one. Pods started to reject any request from other pods:
{"timestamp":"[ 378.100787s]","level":"INFO","fields":{"message":"Request denied","server":"serving-port","tls":"Some(Established { client_id: Some(ClientId(\"nginx-ingress.serviceaccount.identity.linkerd.cluster.local\")), negotiated_protocol: None })","client":"<other_pod_address>"},"target":"linkerd_app_inbound::policy::authorize::http","spans":[{"name":"inbound"},{"port":80,"name":"server"}],"threadId":"ThreadId(1)"}
{"timestamp":"[ 378.100852s]","level":"INFO","fields":{"message":"Request failed","error":"unauthorized connection on server serving-port"},"target":"linkerd_app_core::errors::respond","spans":[{"name":"inbound"},{"port":80,"name":"server"},{"client.addr":"<other_pod_address>","name":"rescue"}],"threadId":"ThreadId(1)"}
Then I added another authorization to explicitly allow meshed connections:
```yaml
apiVersion: policy.linkerd.io/v1beta1
kind: ServerAuthorization
metadata:
  namespace: coolns
  name: allow-authenticated
spec:
  server:
    selector:
      matchLabels:
        anna.money/server-type: common
  client:
    meshTLS:
      identities:
      - "*"
```
It works. But it means that (at least in my case) `all-authenticated` mode is useless, because it is clearer to use `deny` mode and set up policies manually.
Moreover, this workaround relies on the internal implementation of GKE and some magic knowledge. It leads to potential risks if GKE changes in ways that we cannot control.
I see that Istio tries to fix the healthchecks problem another way: https://istio.io/latest/docs/ops/configuration/mesh/app-health-check/. What do you think about this approach? To me it is more transparent for an end user.
@aatarasoff Looking at the Servers and ServerAuthorizations that you created, I think there is an easier solution here.
`srv/serving-port` is selected by `saz/allow-healthchecks` because its selector `anna.money/server-type: common` selects both of the Servers that you created. This means that `saz/allow-healthchecks` is only allowing traffic from the kubelet CIDRs to both of these Servers.
To fix this, you should be okay omitting `srv/serving-port` and only creating `srv/linkerd-admin-port` and `saz/allow-healthchecks`. This means that ports besides `linkerd-admin` will not have a selecting Server, and they should fall back to the cluster default of `all-authenticated`.
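For reference, a minimal sketch of what that remaining Server could look like, assuming the proxy's default admin port name `linkerd-admin` (4191); the pod selector label here is an assumption:

```yaml
# Sketch only: a Server that selects just the proxy admin port, so that
# saz/allow-healthchecks (which matches anna.money/server-type: common)
# authorizes kubelet probes without touching the application port.
apiVersion: policy.linkerd.io/v1beta1
kind: Server
metadata:
  namespace: coolns
  name: linkerd-admin-port
  labels:
    anna.money/server-type: common
spec:
  podSelector:
    matchLabels:
      app: api              # assumed workload label
  port: linkerd-admin       # the injected proxy's admin port (4191)
  proxyProtocol: HTTP/1
```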
Let me know if this solves your issue.
@kleimkuhler Hi!
No, it is not working. If we omit `serving-port`, then healthchecks do not pass for the main container; they pass only for the Linkerd proxy.
As you can see on the screenshots, the api container restarts because it cannot pass its readiness and liveness probes:
This issue hit me too. AKS (with Azure CNI) unfortunately forces me to host nodes and pods within the same IP range - so I am not able to make an exception for some IP range which has been reserved for nodes.
The possible/proposed solution of Istio seems nice. Have you considered implementing something similar?
@Ziphone it seems like a reasonable approach, but would take some design and work to implement something like that. Such a feature isn't currently scoped or planned.
This feature would be great - also on AKS, so I'm going to try the workaround of listing all the possible .1 addresses in the CIDR block, but this means we need to make assumptions about / be aware of the CIDR ranges and the IP address of the kubelet. If they ever change (however unlikely that might be), that could cause health checks to fail again, or possibly give other workloads access that they shouldn't have.
Edit: well, I haven't gotten the workaround to work yet. It's still rejecting the health checks even if I list every .1 address in the pod CIDR range (also tried the node CIDR range).
@adleong @kleimkuhler Hi again. I've checked how Istio does it and I think we can do something similar. I don't know the Linkerd implementation specifics well, but I've created some high-level diagrams with comments.
First, let's look at the as-is diagram:
We have two containers in the pod: the main container with healthchecks, and the linkerd-proxy alongside it. At the moment, we have to add an implementation-specific authorization policy for both of them.
To fix this, we can rewrite the main container's spec during injection to reroute its healthchecks to the linkerd-proxy, which would process them accordingly; for example, the proxy would call the real healthcheck on the main container. To keep the information about the real healthchecks, we can use an environment variable.
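A very rough sketch of what a rewritten pod spec could look like (purely illustrative; the path, port, and environment variable are hypothetical and not an existing Linkerd feature):

```yaml
# Illustrative sketch only: the kubelet still probes the api container's
# declared probe, but the probe now targets the proxy admin port; the proxy
# would forward the check to the real endpoint recorded in an env variable.
apiVersion: v1
kind: Pod
metadata:
  name: api
spec:
  containers:
  - name: api
    image: example/api:latest                  # assumed image
    readinessProbe:
      httpGet:
        path: /app-health/api/ready            # hypothetical proxy-owned path
        port: 4191                             # linkerd-proxy admin port
  - name: linkerd-proxy
    image: cr.l5d.io/linkerd/proxy:stable-2.11.1
    env:
    - name: LINKERD2_PROXY_PROBE_TARGETS       # hypothetical variable holding the real probe
      value: "api=http://localhost:8080/ready"
```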
After that, the problem of requiring a wide-scope unauthenticated policy will still exist (but only for the linkerd-proxy admin port). To eliminate it, we should allow non-mTLS connections to the proxy admin port (maybe only for specific routes) or introduce a new port for this particular purpose.
So, what do you think about this view from 30 000 feet?
I like the idea of a separate port for probes. I don't know what the admin port in linkerd gives access to, but giving unauthenticated access to it sounds scary (and will probably not sit well with our security team).
Am I understanding correctly that as things stand today, any ports that are used for healthchecking must be completely opened for unauthenticated requests from outside of the mesh? For most off-the-shelf applications, the K8s probes are hitting the same endpoints as served by the app for the actual business logic, which means that ServerAuthorizations are basically useless. Unless dedicated ports are used for probes, you are not able to guarantee that traffic to that service is coming from inside of the mesh. This seems like a nonstarter for any real world applications where you don't control the source code of every single component deployed on your clusters (does this situation even exist in K8s...?).
I noticed that this was added to stable-2.12.0, but I don't see any other linked issues that outline how this problem will be solved in stable-2.12.0
To my mind, probes should be a first-class citizen and auto-proxied, like in Istio and Consul. I think the attempt to fit them into the generic framework of "oh, this is just someone hitting this port, this path" is understandable, but it leads to a huge mess of replicating the same exceptions. I'd much rather, @mateiidavid, have an annotation that "magically" reads the existing port/path definition and rewrites them to the proxy.
@mikebell90 I'm sympathetic to the pains around authorizing probes.
I think it's dangerous, however, to have implicit/"magical" authorization policies. For authorizations to be auditable & debuggable, we really want to tie all authorized requests to a resource in the cluster. That said, I think there's a path to making this automated.
In 2.12, we are introducing route policies: that is, we'll have the ability to say that the `/ready` endpoint does not require authentication. Once this is in place, it should be possible to write a controller that automates the creation of authorizations for probe routes.
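As a rough sketch of what that could look like with the pre-release policy CRDs (the API versions and the Server name `serving-port` here are assumptions; the final 2.12 shapes may differ):

```yaml
# Sketch: mark /ready as its own route on the Server...
apiVersion: policy.linkerd.io/v1alpha1
kind: HTTPRoute
metadata:
  namespace: coolns
  name: ready-route
spec:
  parentRefs:
  - group: policy.linkerd.io
    kind: Server
    name: serving-port
  rules:
  - matches:
    - path:
        value: /ready
        type: Exact
---
# ...and authorize that route for any client network, authenticated or not.
apiVersion: policy.linkerd.io/v1alpha1
kind: AuthorizationPolicy
metadata:
  namespace: coolns
  name: ready-unauthenticated
spec:
  targetRef:
    group: policy.linkerd.io
    kind: HTTPRoute
    name: ready-route
  requiredAuthenticationRefs:
  - group: policy.linkerd.io
    kind: NetworkAuthentication
    name: all-networks
---
apiVersion: policy.linkerd.io/v1alpha1
kind: NetworkAuthentication
metadata:
  namespace: coolns
  name: all-networks
spec:
  networks:
  - cidr: 0.0.0.0/0
  - cidr: ::/0
```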
I get what you are saying, and as an architect myself I sympathize. However, purity here leads to either repeated boilerplate or people writing a controller (or using something like Kyverno, I guess). This is a lot of extra machinery for something that nearly every deployment needs. That would be my counter-argument, @olix0r. Probes ain't exactly uncommon, this is at the heart of deployment, and it will likely lead to product-engineer frustration whenever this is not set up properly.
@mikebell90 Yeah, those are good points. In any case, the fully automated solution is probably out of scope for 2.12.0, but we'll look more closely at a better solution as this stuff lands on main.
Yeah, all I'm suggesting is keeping an open mind and an eye on user feedback.
In particular, composable policies and ways to trace them in the CLI and viz - I know Buoyant has some of these.
Is there anything upstream in K8s planned that would enable the kubelet to have a service identity, so we can authorize the health probes only for the kubelet?
I've been fiddling with this back and forth, and two options COULD help in specific scenarios.
With the first setting you could exclude the node subnet range and allow kubelet health checks to work, if you have different subnets for the pod/service CIDRs. The only thing is that pods using the host network would be bypassed as well.
With the second setting you could achieve the same and not have to specify all the ports that are serving something in the same namespace. It's annoying if you have a) the same ports for the actual service + health check, or b) multiple services that have different ports for the service / health check, and don't mind the node network being able to connect to those unauthenticated.
Of course, the best would be scraping & mutating the liveness & readiness probes to go through the proxy's linkerd-admin port and allowing that for the node network or so.
Last week's edge release (or the prior week's?) included the ability to explicitly set per-route authorization policies on inbound traffic. The upcoming edge release will include automatic authorizations for probe endpoints (when no routes are explicitly configured on a server). We're still working through documentation and deeper testing in preparation for the stable-2.12 release--we'll share this documentation on this issue as it's available.
> The upcoming edge release will include automatic authorizations for probe endpoints (when no routes are explicitly configured on a server)
Did this make it into the stable 2.12 release? I'm currently using a ServerAuthorization with our node CIDRs to allow kubelet probes in an `all-authenticated` cluster. Wondering if I'm able to delete that.
@chriskim06 yep. We have already deleted them and everything works good 👍
@aatarasoff that's awesome 👍, glad i can get rid of that then
> when no routes are explicitly configured on a server

If I were to configure an `HTTPRoute` for a `Server`, would that mean there wouldn't be automatic authorization for these probes? If so, is there an option besides configuring a `NetworkAuthentication` with the node CIDRs like we did pre-2.12?
> If I were to configure an `HTTPRoute` for a `Server`, would that mean there wouldn't be automatic authorization for these probes?
That's correct. You would need to create a NetworkAuthentication for Pods selected by that Server now, but unlike before you can at least authenticate only the paths `/live` and `/ready`, for example. If you take a look at creating per-route policy resources, you'll see how to create those resources for authenticating specific paths.
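For instance, a rough sketch along those lines, assuming the Server is named `serving-port` and the node network is `10.0.0.0/8` (adjust to your cluster; API versions may differ from your installed CRDs):

```yaml
# Sketch: only the probe paths are reachable from the node network;
# everything else on srv/serving-port still requires mesh identity.
apiVersion: policy.linkerd.io/v1alpha1
kind: HTTPRoute
metadata:
  namespace: coolns
  name: probe-routes
spec:
  parentRefs:
  - group: policy.linkerd.io
    kind: Server
    name: serving-port
  rules:
  - matches:
    - path: { value: /live, type: Exact }
    - path: { value: /ready, type: Exact }
---
apiVersion: policy.linkerd.io/v1alpha1
kind: NetworkAuthentication
metadata:
  namespace: coolns
  name: node-network
spec:
  networks:
  - cidr: 10.0.0.0/8          # assumed node CIDR
---
apiVersion: policy.linkerd.io/v1alpha1
kind: AuthorizationPolicy
metadata:
  namespace: coolns
  name: allow-probes-from-nodes
spec:
  targetRef:
    group: policy.linkerd.io
    kind: HTTPRoute
    name: probe-routes
  requiredAuthenticationRefs:
  - group: policy.linkerd.io
    kind: NetworkAuthentication
    name: node-network
```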
The Default Authorizations section of the docs covers how healthchecks are now authorized by default.
Bug Report
What is the issue?
After turning on the `all-authenticated` policy mode, affected pods cannot start because of 403 errors.
How can it be reproduced?
Just add the `config.linkerd.io/default-inbound-policy: all-authenticated` annotation on a namespace.
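For example, a minimal namespace manifest that reproduces it (the namespace name is arbitrary; namespace-level proxy injection is assumed):

```yaml
# Minimal reproduction sketch: any meshed workload in this namespace gets the
# all-authenticated inbound policy by default.
apiVersion: v1
kind: Namespace
metadata:
  name: coolns                                  # arbitrary example name
  annotations:
    linkerd.io/inject: enabled                  # assumes namespace-level injection
    config.linkerd.io/default-inbound-policy: all-authenticated
```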
Logs, error output, etc
`linkerd check` output
Environment
Possible solution
Something like this (https://istio.io/latest/docs/ops/configuration/mesh/app-health-check/) would be nice.
Workaround:
Additional context
Original thread from official slack: https://linkerd.slack.com/archives/C89RTCWJF/p1633592434126000