pixie-io / pixie

Instant Kubernetes-Native Application Observability
https://px.dev
Apache License 2.0

Support custom SSL certificates for cloud communication #710

Open clemenskol opened 1 year ago

clemenskol commented 1 year ago

Is your feature request related to a problem? Please describe.

We are using the Pixie cloud in a self-hosted, multi-tenant environment. In some scenarios, including (but not limited to) dev/testing, we would like to deploy Vizier connected to the cloud using a self-signed certificate, or at least a certificate that is not signed by a globally trusted CA.

For this purpose, we want to be able to configure Vizier to use a CA certificate of the cloud backend stored in a k8s secret.

This does not seem to be possible currently, because the certificate loading logic and config are shared between the cloud gRPC and NATS pieces, so I cannot change the certificate for gRPC alone without also affecting NATS.

Furthermore, while technically unrelated, it would be practical to also allow disabling SSL independently for gRPC and NATS. If we split the SSL config, we might as well split that related option too. While we are at it (although this could really be a separate story), it would help to support insecure SSL for cloud connections for testing, meaning that SSL stays enabled but certificate validation is disabled.

Describe the solution you'd like

SSL certificates and SSL config are split between NATS and cloud gRPC so that they can be configured independently of each other. Loading CA certificates for gRPC from a custom secret is supported (the secret is created by the user and doesn't need to be deployed by Pixie).

Describe alternatives you've considered

Disabling SSL validation for testing purposes, but this is not supported and is a worse situation. Relying on public CAs for our test/dev environments is not an option.

Additional context

clemenskol commented 1 year ago

FYI, there is a way to work around this without native support in Pixie: since the cloud-connector proxy uses the standard golang gRPC libraries, we can override the root CA used by the process. For example, we can extend the deployment/vizier-cloud-connector to have the following volume and volumeMount:

apiVersion: apps/v1
kind: Deployment
metadata:
  ...
  name: vizier-cloud-connector
  ...
spec:
  ...
  template:
    ...
    spec:
      ...
      containers:
      - env:
        ...
        name: app
        volumeMounts:
        ...
        # NEW ENTRY: vvvvvvvvv
        - mountPath: /etc/ssl/certs/ca-certificates.crt
          name: control-plane-certs
          subPath: "ca.crt"
        # NEW ENTRY: ^^^^^^^^^^
      ...
      serviceAccountName: cloud-conn-service-account
      volumes:
      ...
      # NEW ENTRY: vvvvvvvvv
      - name: control-plane-certs
        secret:
          secretName: control-plane-tls-certs
      # NEW ENTRY: ^^^^^^^^^^

and put the CA cert of the cloud control-plane into a secret in the pl namespace:

apiVersion: v1
kind: Secret
metadata:
  name: control-plane-tls-certs
  namespace: pl
type: Opaque
data:
  ca.crt: <base64-of-certificate>

which can be found in the cloud using

kubectl --namespace plc get secret cloud-proxy-tls-certs -o jsonpath='{.data.tls\.crt}'

Conceptually it's of course still cleaner to split the NATS and gRPC configs, but at least the above allows connectivity without changing the containers/code.

NOTE: The above replaces the root CAs (instead of adding to them). This means that only the local cluster cert generated by Vizier and the installed cloud cert are trusted, and nothing else. Be warned of this side effect, but in most scenarios that should be fine, as the cloud-connector proxy shouldn't talk to anything else.
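
The difference between replacing and extending the root CAs in the note above can be illustrated with the Go standard library. This is only a minimal sketch and not Pixie's actual code: `newTLSConfig` and `selfSignedCAPEM` are hypothetical names introduced here, and the sketch assumes the client builds its own `tls.Config` rather than relying on the bundle file mounted over /etc/ssl/certs/ca-certificates.crt.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/tls"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/pem"
	"errors"
	"math/big"
	"time"
)

// newTLSConfig (hypothetical helper) returns a client TLS config that trusts
// the system roots plus the extra CA certificate(s) found in caPEM.
func newTLSConfig(caPEM []byte) (*tls.Config, error) {
	pool, err := x509.SystemCertPool()
	if err != nil || pool == nil {
		// No usable system store: start from an empty pool instead.
		pool = x509.NewCertPool()
	}
	if !pool.AppendCertsFromPEM(caPEM) {
		return nil, errors.New("no valid PEM certificate found")
	}
	return &tls.Config{RootCAs: pool, MinVersion: tls.VersionTLS12}, nil
}

// selfSignedCAPEM (hypothetical helper) creates a throwaway self-signed CA
// certificate so that the sketch can be exercised without real secrets.
func selfSignedCAPEM(commonName string) ([]byte, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber:          big.NewInt(1),
		Subject:               pkix.Name{CommonName: commonName},
		NotBefore:             time.Now().Add(-time.Hour),
		NotAfter:              time.Now().Add(24 * time.Hour),
		IsCA:                  true,
		BasicConstraintsValid: true,
		KeyUsage:              x509.KeyUsageCertSign,
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, err
	}
	return pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: der}), nil
}
```

Passing the contents of ca.crt to `newTLSConfig` would keep the public CAs trusted in addition to the cloud CA, whereas the volume mount above trusts only what is in the mounted file.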

MrAta commented 1 year ago

@clemenskol I wonder how you work around the self-signed service certs in a self-hosted environment? I am asking because the installation guide uses a local CA to generate those certs, which are not trusted on the cluster.

clemenskol commented 1 year ago

Not sure I understand the question. You mean trust it on the server-side? I extract the server certificate and mark it as trusted on the client, and I use the certificate (and the self-signed CA) on the server for making the connection trusted.

MrAta commented 1 year ago

@clemenskol I am creating the service-tls-certs through this script from Pixie, but the PEM pods fail to connect to NATS with a TLS handshake error, and the NATS logs show:

cid:447 - TLS handshake error: remote error: tls: unknown certificate authority

I thought it's because the certificate is signed by a local CA and that's why the NATS pods don't trust it. I was wondering whether you have faced this error? Also, what do you mean by "extract the server certificate and mark it as trusted on the client"? Do you not use the Pixie secret generation script? Thanks!

clemenskol commented 1 year ago

If you use the file ca.crt (generated in the script you linked) and mount it into the client pod, the certificate check should work. Are you sure you are using the correct hostname to connect? Make sure you are using the right hostname associated with the CA (see https://github.com/pixie-io/pixie/blob/main/scripts/create_cloud_secrets.sh#L60)
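
One way to narrow down whether the failure comes from the CA or from the hostname is to run both checks offline. The following is a rough sketch using only the Go standard library, not anything shipped with Pixie: `checkServerCert` and `demoCerts` are hypothetical names, and `demoCerts` only generates throwaway certificates so the check can be tried without the real secrets.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/pem"
	"errors"
	"math/big"
	"time"
)

// checkServerCert (hypothetical helper) reports whether the certificate in
// certPEM chains to a CA in caPEM and is valid for hostname.
func checkServerCert(caPEM, certPEM []byte, hostname string) error {
	roots := x509.NewCertPool()
	if !roots.AppendCertsFromPEM(caPEM) {
		return errors.New("ca: no valid PEM certificate found")
	}
	block, _ := pem.Decode(certPEM)
	if block == nil {
		return errors.New("cert: no PEM block found")
	}
	cert, err := x509.ParseCertificate(block.Bytes)
	if err != nil {
		return err
	}
	// Fails with an unknown-authority error for a wrong CA, or with a
	// hostname error when hostname is not covered by the certificate.
	_, err = cert.Verify(x509.VerifyOptions{Roots: roots, DNSName: hostname})
	return err
}

// demoCerts (hypothetical helper) builds a throwaway CA and a server
// certificate for dnsName signed by that CA.
func demoCerts(dnsName string) (caPEM, certPEM []byte, err error) {
	notBefore, notAfter := time.Now().Add(-time.Hour), time.Now().Add(24*time.Hour)
	caKey, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, nil, err
	}
	caTmpl := &x509.Certificate{
		SerialNumber: big.NewInt(1), Subject: pkix.Name{CommonName: "demo-ca"},
		NotBefore: notBefore, NotAfter: notAfter,
		IsCA: true, BasicConstraintsValid: true, KeyUsage: x509.KeyUsageCertSign,
	}
	caDER, err := x509.CreateCertificate(rand.Reader, caTmpl, caTmpl, &caKey.PublicKey, caKey)
	if err != nil {
		return nil, nil, err
	}
	caCert, err := x509.ParseCertificate(caDER)
	if err != nil {
		return nil, nil, err
	}
	leafKey, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, nil, err
	}
	leafTmpl := &x509.Certificate{
		SerialNumber: big.NewInt(2), Subject: pkix.Name{CommonName: dnsName},
		DNSNames:  []string{dnsName},
		NotBefore: notBefore, NotAfter: notAfter,
		KeyUsage:    x509.KeyUsageDigitalSignature,
		ExtKeyUsage: []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth},
	}
	leafDER, err := x509.CreateCertificate(rand.Reader, leafTmpl, caCert, &leafKey.PublicKey, caKey)
	if err != nil {
		return nil, nil, err
	}
	caPEM = pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: caDER})
	certPEM = pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: leafDER})
	return caPEM, certPEM, nil
}
```

With the real files, `checkServerCert` would be given the contents of ca.crt, the server certificate, and the hostname the client actually dials. An error at that point indicates which of the two checks is failing.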

MrAta commented 1 year ago

Yea, the ca.crt file is already being mounted by that secret. As for alt names, I believe that should work out of the box, though I've also added any other possible alt names in addition to those. Still no luck.