linkerd / linkerd2

Ultralight, security-first service mesh for Kubernetes. Main repo for Linkerd 2.x.
https://linkerd.io
Apache License 2.0
10.62k stars 1.27k forks source link

MeshTLSAuthentication requires at least one identity, but CRD doesn't correctly document it #10782

Closed whiskeysierra closed 1 year ago

whiskeysierra commented 1 year ago

What is the issue?

The MeshTLSAuthentication CRD requires one of either identities or identityRefs. But it does allow empty arrays, while the implementation doesn't.

How can it be reproduced?

Create a MeshTLSAuthentication with an empty list of identities:

apiVersion: policy.linkerd.io/v1alpha1
kind: MeshTLSAuthentication
metadata:
  name: "test"
spec:
  identities: []

Based on the docs:

[..] this means that clients must be in the mesh and must have one of the specified identities in order to be authorized to send to the target.

https://linkerd.io/2.12/reference/authorization-policy/#meshtlsauthentication

I'd assume that an empty list is the equivalent of a deny policy, because no client can have an identity that is part of that empty list.

Logs, error output, etc

linkerd-destination logs:

linkerd-destination-57fd79bbbb-672b2 policy 2023-04-19T14:47:45.428158Z  WARN meshtlsauthentications: linkerd_policy_controller_k8s_index::inbound::index: Invalid MeshTLSAuthentication ns=test name=test error=No identities configured

Relevant source code can be found here:

https://github.com/linkerd/linkerd2/blob/9dce7e65190a97ec542c6fe03797ae7f0e583734/policy-controller/k8s/index/src/inbound/meshtls_authentication.rs#L44

output of linkerd check -o short

linkerd check -o short
control-plane-version
---------------------
‼ control plane is up-to-date
    is running version 23.3.4 but the latest edge version is 23.4.1
    see https://linkerd.io/2.13/checks/#l5d-version-control for hints
‼ control plane and cli versions match
    control plane running edge-23.3.4 but cli running stable-2.13.1
    see https://linkerd.io/2.13/checks/#l5d-version-control for hints

linkerd-control-plane-proxy
---------------------------
‼ control plane proxies are up-to-date
    some proxies are not running the current version:
    * linkerd-destination-57fd79bbbb-672b2 (edge-23.3.4)
    * linkerd-destination-57fd79bbbb-srwjt (edge-23.3.4)
    * linkerd-identity-56fb6c656f-949vv (edge-23.3.4)
    * linkerd-identity-56fb6c656f-s5z9v (edge-23.3.4)
    * linkerd-proxy-injector-54484d7dfd-8szdd (edge-23.3.4)
    * linkerd-proxy-injector-54484d7dfd-c4zzw (edge-23.3.4)
    see https://linkerd.io/2.13/checks/#l5d-cp-proxy-version for hints
‼ control plane proxies and cli versions match
    linkerd-destination-57fd79bbbb-672b2 running edge-23.3.4 but cli running stable-2.13.1
    see https://linkerd.io/2.13/checks/#l5d-cp-proxy-cli-version for hints

linkerd-ha-checks
-----------------
‼ pod injection disabled on kube-system
    kube-system namespace needs to have the label config.linkerd.io/admission-webhooks: disabled if injector webhook failure policy is Fail
    see https://linkerd.io/2.13/checks/#l5d-injection-disabled for hints

linkerd-viz
-----------
‼ viz extension proxies are up-to-date
    some proxies are not running the current version:
    * metrics-api-79f5fb9478-vkxbw (edge-23.3.4)
    * tap-74b6c8b95-6qzwk (edge-23.3.4)
    * tap-74b6c8b95-wnrxg (edge-23.3.4)
    * tap-injector-776cb6c64-85bkh (edge-23.3.4)
    * web-74f8bb4c67-vwzg6 (edge-23.3.4)
    see https://linkerd.io/2.13/checks/#l5d-viz-proxy-cp-version for hints
‼ viz extension proxies and cli versions match
    metrics-api-79f5fb9478-vkxbw running edge-23.3.4 but cli running stable-2.13.1
    see https://linkerd.io/2.13/checks/#l5d-viz-proxy-cli-version for hints
‼ prometheus is installed and configured correctly
    missing ClusterRoles: linkerd-linkerd-viz-prometheus
    see https://linkerd.io/2.13/checks/#l5d-viz-prometheus for hints

Status check results are √

Environment

Possible solution

I see two possible options:

  1. Either allow empty identities and treat them the same way as an explicit deny policy.
  2. Add minItems: 1 two both properties: identities and identityRefs here https://github.com/linkerd/linkerd2/blob/9dce7e65190a97ec542c6fe03797ae7f0e583734/charts/linkerd-crds/templates/policy/meshtls-authentication.yaml#L39-L86

Additional context

https://swagger.io/docs/specification/data-models/data-types/#array-length

Would you like to work on fixing this bug?

yes

whiskeysierra commented 1 year ago

A little follow-up, the issue seems to be slightly bigger (or two separate issues, not sure?!). With a broken MeshTLSAuthentication the linkerd-destination seems to completely choke and actually stop correctly processing others.

We saw tons of logs like this:

linkerd-destination-c769d8694-g4lkp policy 2023-04-21T12:12:09.613527Z INFO meshtlsauthentications:apply{ns=linkerd-viz name=metrics-api-web}:reindex{ns=services-booking-platform}:pod{pod=account-oauth2-session-store-node-2}: linkerd_policy_controller_k8s_index::inbound::index: Illegal AuthorizationPolicy; ignoring server=account-oauth2-session-store-sentinel authorizationpolicy=account-oauth2-session-store-sentinel error=could not find MeshTLSAuthentication account-oauth2-session-store in namespace services-booking-platform

But those actually exists. As soon as we deleted the faulty MeshTLSAuthentication the others were processed correctly.

adleong commented 1 year ago

Nice find, @whiskeysierra! I suspect that https://github.com/linkerd/linkerd2/blob/main/policy-controller/k8s/index/src/inbound/index.rs#L628 may be the problem here.

As for the issue with empty identity lists, an empty identity list should be equivalent to not having the Authentication in the first place so I don't think it's ever useful to be able to specify an empty list here. Updating the CRD to add minItems: 1 sounds more correct to me.

whiskeysierra commented 1 year ago

Nice find, @whiskeysierra! I suspect that https://github.com/linkerd/linkerd2/blob/main/policy-controller/k8s/index/src/inbound/index.rs#L628 may be the problem here.

Yeah, I don't know Rust so I wasn't quite sure about the scope of the that return statement. If it breaks the loop/sequence early, then yes, that's most likely the problem for the cascading issue with the policies.

As for the issue with empty identity lists, an empty identity list should be equivalent to not having the Authentication in the first place so I don't think it's ever useful to be able to specify an empty list here. Updating the CRD to add minItems: 1 sounds more correct to me.

Yeah, that would work for me.

Do you consider those to be two separate issues? (Shall I create a second one?)