ondat / charts

Ondat Helm charts.
https://ondat.github.io/charts
MIT License

TLS client cert with wrong SAN #18

Open Arau opened 2 years ago

Arau commented 2 years ago

The Ondat cluster can't connect to etcd. The etcd logs show the following error:

{... msg":"rejected connection","remote-addr":"192.168.7.69:50060","server-name":"storageos-etcd.storageos-etcd","error":"remote error: tls: bad certificate"}

This is happening because the certificate in the storageos-etcd-secret has the following SAN definition:

        X509v3 extensions:
            X509v3 Key Usage: critical
                Digital Signature, Key Encipherment, Data Encipherment
            X509v3 Extended Key Usage:
                TLS Web Server Authentication, TLS Web Client Authentication
            X509v3 Basic Constraints: critical
                CA:FALSE
            X509v3 Subject Alternative Name:
                DNS:storageos-etcd-secret

The DNS field storageos-etcd-secret should instead match the DNS name: *.storageos-etcd.storageos-etcd
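To illustrate the mismatch, here is a minimal Go sketch that uses the standard library's hostname matching. It assumes a per-pod DNS name of the form storageos-etcd-0.storageos-etcd.storageos-etcd, which is a hypothetical name used only for illustration, and the exact check etcd performs on client certificates may differ from this one.

```go
package main

import (
	"crypto/x509"
	"fmt"
)

// sanMatches reports whether a certificate carrying the given DNS SANs
// would pass Go's standard hostname verification for host.
func sanMatches(dnsNames []string, host string) bool {
	cert := &x509.Certificate{DNSNames: dnsNames}
	return cert.VerifyHostname(host) == nil
}

func main() {
	// Hypothetical per-pod DNS name, used only for illustration.
	host := "storageos-etcd-0.storageos-etcd.storageos-etcd"

	// SAN currently found in storageos-etcd-secret.
	fmt.Println(sanMatches([]string{"storageos-etcd-secret"}, host))

	// Wildcard SAN that the certificate should carry instead.
	fmt.Println(sanMatches([]string{"*.storageos-etcd.storageos-etcd"}, host))
}
```

With the current SAN the check fails, and with the wildcard SAN it passes.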

angelos-p commented 2 years ago

Yeah, I noticed that too while going through the etcd controller code. The SAN is set to the secret name here: https://github.com/storageos/etcd-cluster-operator/blob/main/controllers/etcdcluster_controller.go#L240

I found that odd, but my cluster worked fine even with the SAN looking wrong.

It should be something along the lines of:

fmt.Sprintf("*.%s.%s", cluster.Name, cluster.Namespace)
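As a rough, self-contained sketch of that suggestion (not the operator's actual code), the helper below builds the wildcard name from a cluster name and namespace. The function name wildcardSAN is hypothetical, and the two arguments stand in for cluster.Name and cluster.Namespace.

```go
package main

import "fmt"

// wildcardSAN builds a wildcard DNS name covering every pod behind the
// etcd service, given the cluster name and its namespace.
// The function name is hypothetical and not taken from the operator.
func wildcardSAN(name, namespace string) string {
	return fmt.Sprintf("*.%s.%s", name, namespace)
}

func main() {
	// Values taken from the address used in this issue.
	fmt.Println(wildcardSAN("storageos-etcd", "storageos-etcd"))
	// prints "*.storageos-etcd.storageos-etcd"
}
```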

cvlc commented 2 years ago

Hey @Arau -

I can see the symptoms you describe (log lines seem to lead to this code) and I can confirm that the certificate SAN is storageos-etcd-secret for the client certificate, but this doesn't seem to impact functionality of the Ondat cluster. It works regardless, as @aeroniero33 notes.

I'm curious about the Ondat cluster not being able to connect - do you see any other log lines, perhaps in the API manager or scheduler? Are any pods NotReady?

Arau commented 2 years ago

Hi,

I installed the charts using the umbrella chart and hit the issue: the Ondat pods can't connect to etcd. The etcd logs show "error":"remote error: tls: bad certificate", and the node pods cannot start at all.

Then I installed the etcd chart first and then the ondat-operator. The result is the same. I fixed it in the cluster by copying the contents of the secret storageos-etcd-client into storageos-etcd-secret, while keeping the file names in storageos-etcd-secret as expected by the CP. The secret storageos-etcd-client has the right alternative names. After that, and a restart of the node pods, the cluster started successfully.
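The copy step can be sketched in Go as a pure key remapping over the secret data. This is only an illustration: the key names in the example (tls.crt, tls.key, client.crt, client.key) are placeholders and not the real file names used by either secret, and the helper remapKeys is hypothetical.

```go
package main

import "fmt"

// remapKeys copies values from src into a new map, storing each value
// under the key name the consumer expects. keyMap maps source key names
// to destination key names. It fails if a source key is missing.
func remapKeys(src map[string][]byte, keyMap map[string]string) (map[string][]byte, error) {
	dst := make(map[string][]byte, len(keyMap))
	for from, to := range keyMap {
		val, ok := src[from]
		if !ok {
			return nil, fmt.Errorf("source secret has no key %q", from)
		}
		// Copy the bytes so later changes to src do not affect dst.
		dst[to] = append([]byte(nil), val...)
	}
	return dst, nil
}

func main() {
	// Placeholder data standing in for the storageos-etcd-client secret.
	client := map[string][]byte{
		"tls.crt": []byte("CERT"),
		"tls.key": []byte("KEY"),
	}
	// Placeholder key names; the real ones must match what the CP expects.
	keyMap := map[string]string{
		"tls.crt": "client.crt",
		"tls.key": "client.key",
	}

	out, err := remapKeys(client, keyMap)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(len(out), string(out["client.crt"]))
}
```

As noted above, the node pods still need a restart afterwards to pick up the new contents.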

In my values I set:

  kvBackend:
    address: 'https://storageos-etcd.storageos-etcd:2379'

I wonder whether the tests were run without the https:// prefix.

cvlc commented 2 years ago

I've been using etcd without the https:// prefix and it's been working fine for me!

Another thought - did you uninstall/reinstall on the same cluster? I've noticed that the storageos namespace with associated storageos-etcd-secret is not necessarily deleted when helm uninstall is run. If that secret persists through an uninstall-reinstall, or the associated pods on the etcd or storageos side do, there'll be a mismatch as:

  1. Mounted secrets do not automatically trigger a k8s pod/ds refresh
  2. The operator doesn't seem to want to overwrite its own etcd secret

cvlc commented 2 years ago

We've pushed a new version, please re-test and let me know whether that resolves the issue!

cannischan commented 2 years ago

@Arau could you please review this issue? Thank you :)