neo4j / helm-charts

[Bug]: Neo4j deployed via Helm chart doesn't pick up renewed SSL Certificate automatically, so I have to manually scale down to 0 and up to 1 for it to pick up the renewed one. #247

Open vinnytwice opened 1 year ago

vinnytwice commented 1 year ago

Contact Details

No response

What happened?

I have a Node.js server that uses MongoDB and Neo4j in an AKS cluster, all deployed via Helm charts (standalone for Neo4j, though I'm switching to the neo4j-reverse-proxy chart). A certificate issued by Let's Encrypt is used both for the server and for the Neo4j Bolt connection. I deployed the cluster in February and everything worked fine, but now writes to Neo4j fail with the "Failed to connect to server" error, with the reason "Socket responded with: CERT_HAS_EXPIRED", and Neo4j Browser cannot connect to the database.

Neo4jError: Failed to connect to server. Please ensure that your database is listening on the correct host and port and that you have compatible encryption settings both on Neo4j server and driver. Note that the default encryption setting has changed in Neo4j 4.0. Caused by: Server certificate is not trusted. If you trust the database you are connecting to, use TRUST_CUSTOM_CA_SIGNED_CERTIFICATES and add the signing certificate, or the server certificate, to the list of certificates trusted by this driver using `neo4j.driver(.., { trustedCertificates:['path/to/certificate.crt']}). This  is a security measure to protect against man-in-the-middle attacks. If you are just trying  Neo4j out and are not concerned about encryption, simply disable it using `encrypted="ENCRYPTION_OFF"` in the driver options. Socket responded with: CERT_HAS_EXPIRED
0|server  |     at new Neo4jError (/usr/app/node_modules/neo4j-driver-core/lib/error.js:77:16)
0|server  |     at newError (/usr/app/node_modules/neo4j-driver-core/lib/error.js:113:12)
0|server  |     at NodeChannel._handleConnectionError (/usr/app/node_modules/neo4j-driver-bolt-connection/lib/channel/node/node-channel.js:227:56)
0|server  |     at TLSSocket.<anonymous> (/usr/app/node_modules/neo4j-driver-bolt-connection/lib/channel/node/node-channel.js:69:17)
0|server  |     at Object.onceWrapper (node:events:641:28)
0|server  |     at TLSSocket.emit (node:events:527:28)
0|server  |     at TLSSocket.onConnectSecure (node:_tls_wrap:1564:10)
0|server  |     at TLSSocket.emit (node:events:527:28)
0|server  |     at TLSSocket._finishInit (node:_tls_wrap:945:8)
0|server  |     at ssl.onhandshakedone (node:_tls_wrap:726:12) {
0|server  |   constructor: [Function: Neo4jError] { isRetriable: [Function (anonymous)] },
0|server  |   code: 'ServiceUnavailable',
0|server  |   retriable: true
0|server  | }

The certificate was renewed automatically in April, so my guess is that Neo4j is still using the first certificate. Is that possible?

This is the Certificate:

Name:         tls-certificate
Namespace:    default
Labels:       app.kubernetes.io/managed-by=Helm
Annotations:  meta.helm.sh/release-name: cluster
              meta.helm.sh/release-namespace: default
API Version:  cert-manager.io/v1
Kind:         Certificate
Metadata:
  Creation Timestamp:  2023-02-15T15:25:52Z
  Generation:          1
  Managed Fields:
    API Version:  cert-manager.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:meta.helm.sh/release-name:
          f:meta.helm.sh/release-namespace:
        f:labels:
          .:
          f:app.kubernetes.io/managed-by:
      f:spec:
        .:
        f:dnsNames:
        f:issuerRef:
          .:
          f:kind:
          f:name:
        f:secretName:
    Manager:      helm
    Operation:    Update
    Time:         2023-02-15T15:25:52Z
    API Version:  cert-manager.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:status:
        f:revision:
    Manager:      cert-manager-certificates-issuing
    Operation:    Update
    Subresource:  status
    Time:         2023-04-16T14:27:13Z
    API Version:  cert-manager.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:status:
        .:
        f:conditions:
          .:
          k:{"type":"Ready"}:
            .:
            f:lastTransitionTime:
            f:message:
            f:observedGeneration:
            f:reason:
            f:status:
            f:type:
        f:notAfter:
        f:notBefore:
        f:renewalTime:
    Manager:         cert-manager-certificates-readiness
    Operation:       Update
    Subresource:     status
    Time:            2023-04-16T14:27:13Z
  Resource Version:  20023818
  UID:               9edc761c-9382-4597-8048-ec5e85d0871d
Spec:
  Dns Names:
    xxx.westeurope.cloudapp.azure.com
  Issuer Ref:
    Kind:       ClusterIssuer
    Name:       letsencrypt-issuer
  Secret Name:  tls-secret
Status:
  Conditions:
    Last Transition Time:  2023-02-15T15:26:48Z
    Message:               Certificate is up to date and has not expired
    Observed Generation:   1
    Reason:                Ready
    Status:                True
    Type:                  Ready
  Not After:               2023-07-15T13:27:11Z
  Not Before:              2023-04-16T13:27:12Z
  Renewal Time:            2023-06-15T13:27:11Z
  Revision:                2
Events:                    <none>
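
The Status block above can be checked mechanically. The following is a minimal Python sketch, not part of the original report, that classifies a certificate by the Not Before and Not After timestamps copied from that output. The names `parse_utc` and `certificate_state` are illustrative only. If the certificate stored in the Secret is valid while clients still see CERT_HAS_EXPIRED, the running server is most likely still presenting the earlier certificate.

```python
from datetime import datetime, timezone

# Timestamp layout used in the `kubectl describe certificate` output above.
TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"

# Values copied from the Status block of the Certificate.
NOT_BEFORE = "2023-04-16T13:27:12Z"
NOT_AFTER = "2023-07-15T13:27:11Z"


def parse_utc(timestamp: str) -> datetime:
    """Parse a UTC timestamp such as 2023-07-15T13:27:11Z."""
    return datetime.strptime(timestamp, TIME_FORMAT).replace(tzinfo=timezone.utc)


def certificate_state(not_before: str, not_after: str, now: datetime) -> str:
    """Return 'not yet valid', 'valid' or 'expired' for the given moment."""
    start, end = parse_utc(not_before), parse_utc(not_after)
    if now < start:
        return "not yet valid"
    if now > end:
        return "expired"
    return "valid"


if __name__ == "__main__":
    # Evaluate the renewed certificate against the current time.
    print(certificate_state(NOT_BEFORE, NOT_AFTER, datetime.now(timezone.utc)))
```

This only inspects what cert-manager reports for the Secret; it says nothing about which certificate the Neo4j process has loaded in memory.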

Neo4j SSL settings in the Neo4j chart's values:

  ssl:
    # setting per "connector" matching neo4j config
    bolt:
      privateKey:
        secretName: tls-secret
        subPath: tls.key
      publicCertificate:
        secretName: tls-secret
        subPath: tls.crt
      trustedCerts:
        sources: []
      revokedCerts:
        sources: []

After scaling the Neo4j cluster down to 0 replicas and back up to 1 replica, everything works again as expected, because Neo4j starts with the renewed certificate. Is there a way to make this automatic?

Many thanks.

Chart Name

Standalone

Chart Version

4.4.2

Environment

Microsoft Azure

Relevant log output

No response

harshitsinghvi22 commented 1 year ago

@vinnytwice the reason for this is the use of subPath in the Secret's volumeMount.

As per the Kubernetes documentation, Secrets mounted via subPath do not receive updates:

https://kubernetes.io/docs/concepts/configuration/secret/

Note: A container using a Secret as a subPath volume mount does not receive automated Secret updates.
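
To see which mounts are affected by this, one option is to inspect the pod spec. This is a minimal Python sketch, assuming the pod was exported with `kubectl get pod <pod-name> -o json`. The helper name `subpath_mounts` and the container name and paths in `SAMPLE_POD` are hypothetical and are not taken from the chart.

```python
def subpath_mounts(pod: dict) -> list:
    """List (container, mountPath, subPath) for every mount that uses subPath."""
    found = []
    for container in pod.get("spec", {}).get("containers", []):
        for mount in container.get("volumeMounts", []):
            if mount.get("subPath"):
                found.append(
                    (container["name"], mount["mountPath"], mount["subPath"])
                )
    return found


# Hypothetical, trimmed-down pod manifest used only to exercise the helper.
SAMPLE_POD = {
    "spec": {
        "containers": [
            {
                "name": "neo4j",
                "volumeMounts": [
                    {"name": "data", "mountPath": "/data"},
                    {
                        "name": "bolt-key",
                        "mountPath": "/certs/bolt/private.key",
                        "subPath": "tls.key",
                    },
                    {
                        "name": "bolt-cert",
                        "mountPath": "/certs/bolt/public.crt",
                        "subPath": "tls.crt",
                    },
                ],
            }
        ]
    }
}

if __name__ == "__main__":
    for name, path, sub_path in subpath_mounts(SAMPLE_POD):
        print(f"{name}: {path} (subPath={sub_path})")
```

Any mount reported here keeps the file content it had when the container started, regardless of later updates to the Secret.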

vinnytwice commented 1 year ago

@harshitsinghvi22 Hi, and thanks for answering so quickly.

> As per the Kubernetes documentation, Secrets mounted via subPath do not receive updates

Should I use the trustedCerts array instead? Something like:

  ssl:
    # setting per "connector" matching neo4j config
    bolt:
      #privateKey:
        #secretName: tls-secret
        #subPath: tls.key
      #publicCertificate:
        #secretName: tls-secret
        #subPath: tls.crt
      trustedCerts:
        sources:
          - secret:
              name: tls-secret
              items:
                - key: tls.crt
                  path: tls.crt
                - key: tls.key
                  path: tls.key
      revokedCerts:
        sources: []

Would I use it as in the example above, or in addition to privateKey and publicCertificate but omitting both subPath parameters?

Thank you very much again.

harshitsinghvi22 commented 1 year ago

@vinnytwice I am looking into this; however, my previous observation seems to be incomplete. Even if subPath were not used and Kubernetes updated the mounted certificates, Neo4j as a product would still need a restart: as of now, Neo4j has to be restarted to pick up new certificates. I am checking more on this with our internal teams and will get back to you once I have some update on this.

To be honest, this might require some engineering effort from our internal product team, as this has been requested by other customers as well, and the Helm charts will be able to support it only when it is supported by the product itself.

vinnytwice commented 1 year ago

@harshitsinghvi22 Hi, I see. So it won't pick up the renewed cert even if it's referenced in the trustedCerts array, correct?

> To be honest, this might require some engineering effort from our internal product team, as this has been requested by other customers as well, and the Helm charts will be able to support it only when it is supported by the product itself.

Yes, like other customers, I was expecting Neo4j to pick up renewed certificates automatically, as having to restart it manually is a bit tedious. It's a much-needed feature, so I guess it will get fixed soon. You guys are very responsive and I'm glad for that.

> I am checking more on this with our internal teams and will get back to you once I have some update on this.

Yes please, keep me in the loop on this.

Thank you very much again. Cheers

harshitsinghvi22 commented 1 year ago

> @harshitsinghvi22 Hi, I see. So it won't pick up the renewed cert even if it's referenced in the trustedCerts array, correct?

trustedCerts won't help here; that's a separate attribute. privateKey and publicCertificate are mandatory, and those locations need to be updated with the renewed certificate. At the moment, a Neo4j restart is then required so that the new certs get picked up.

vinnytwice commented 1 year ago

@harshitsinghvi22 Oh, I see. So I'll just keep deploying the chart as currently set up and restart the pods until automatic pickup of renewed certs gets sorted. Thank you very much again. Cheers
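
Until the product supports reloading, the manual restart can be scripted. This is a minimal Python sketch of one possible approach, not something provided by the chart: it fingerprints `tls.crt` from the Secret (as exported with `kubectl get secret tls-secret -o json`) and, when the fingerprint changes, builds a `kubectl rollout restart` command. The StatefulSet name `neo4j` and the helper names are assumptions for illustration; the sketch only builds the command and leaves running it to the caller.

```python
import base64
import hashlib


def certificate_fingerprint(secret: dict) -> str:
    """SHA-256 of the decoded tls.crt entry of a Secret exported as JSON."""
    pem = base64.b64decode(secret["data"]["tls.crt"])
    return hashlib.sha256(pem).hexdigest()


def restart_command(statefulset: str, namespace: str = "default") -> list:
    """Build the kubectl command that restarts the pods of a StatefulSet."""
    return [
        "kubectl", "rollout", "restart",
        f"statefulset/{statefulset}",
        "--namespace", namespace,
    ]


def plan_restart(previous_fingerprint, secret, statefulset, namespace="default"):
    """Return (current fingerprint, command or None if nothing changed)."""
    current = certificate_fingerprint(secret)
    if current == previous_fingerprint:
        return current, None
    return current, restart_command(statefulset, namespace)


if __name__ == "__main__":
    # Placeholder Secret contents; real values come from the kubectl export.
    old = {"data": {"tls.crt": base64.b64encode(b"first certificate").decode()}}
    new = {"data": {"tls.crt": base64.b64encode(b"renewed certificate").decode()}}
    seen = certificate_fingerprint(old)
    _, command = plan_restart(seen, new, "neo4j")  # "neo4j" is an assumed name
    print(command)
```

Run periodically, for example from a CronJob, this restarts Neo4j only when the certificate has actually changed. A restart still interrupts open Bolt connections, so it remains a workaround and does not replace certificate reloading in the product.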

vinnytwice commented 10 months ago

@harshitsinghvi22 Hi, and happy new year! Do you have any news about the automatic pickup of renewed certificates?

harshitsinghvi22 commented 10 months ago

@vinnytwice happy new year to you too! I checked with the respective team and unfortunately the feature is not yet scheduled. I will keep this thread updated with the latest info.

adriantr commented 5 months ago

hey! any news on this issue?

benrj commented 3 weeks ago

We'd be interested in this feature as well!

ryanmcafee commented 3 weeks ago

Per the docs here, internally a Netty server is being used to handle the SSL certificate termination and the TCP connection. Would an approach like this work to allow for the ssl cert to be watched and periodically refreshed without requiring a disruptive cluster member restart?