piraeusdatastore / piraeus-operator

The Piraeus Operator manages LINSTOR clusters in Kubernetes.
https://piraeus.io/
Apache License 2.0

Problem with backups to s3 with untrusted CA #453

Open andlf opened 1 year ago

andlf commented 1 year ago

Hello! I need to back up VolumeSnapshots to S3 storage over HTTPS, where the endpoint's certificate is signed by my company's CA, and I get this error:

Application:                        LINBIT® LINSTOR
Module:                             Controller
Version:                            1.20.3
Build ID:                           8d19a891df018f6e3d40538d809904f024bfe361
Build time:                         2023-01-26T08:40:26+00:00
Error time:                         2023-04-06 13:50:42
Node:                               linstor-controller-7b4d64bbf9-v4zxj
Peer:                               RestClient(10.111.1.177; 'linstor-csi/v0.22.1-8c8ead5a56ed812dcac8e89a67e51c2c89c30787')

============================================================

Reported error:
===============

Category:                           RuntimeException
Class name:                         SdkClientException
Class canonical name:               com.amazonaws.SdkClientException
Generated at:                       Method 'handleRetryableException', Source file 'AmazonHttpClient.java', Line #1216

Error message:                      Unable to execute HTTP request: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target

How can I get the controller to trust the CA certificate? I assume I need to mount a secret with the CA certificate somewhere, and I expect to run into problems with Java security :( Thanks!

WanzenBug commented 1 year ago

It's currently a little bit cumbersome, but if you use Operator v2, you can try something like this:

---
apiVersion: piraeus.io/v1
kind: LinstorCluster
metadata:
  name: linstorcluster
spec:
  patches:
    - target:
        kind: Deployment
        name: linstor-controller
      patch: |
        apiVersion: apps/v1
        kind: Deployment
        metadata:
          name: linstor-controller
        spec:
          template:
            spec:
              initContainers:
              - name: ca-cert-patcher
                image: linstor-controller
                command:
                - sh
                - -exc
                - |
                  cp $(dirname $(readlink -f $(command -v java)))/../lib/security/cacerts /ca-certs/cacerts
                  keytool -importcert -noprompt -deststorepass changeit -keypass changeit -file /extra-ca/ca.crt -alias extra-ca -destkeystore /ca-certs/cacerts
                volumeMounts:
                  - name: ca-certs
                    mountPath: /ca-certs
                  - name: extra-ca
                    mountPath: /extra-ca
                    readOnly: true
              containers:
              - name: linstor-controller
                env:
                  - name: JAVA_OPTS
                    value: -Djdk.tls.acknowledgeCloseNotify=true -Djavax.net.ssl.trustStore=/ca-certs/cacerts
                volumeMounts:
                  - name: ca-certs
                    mountPath: /ca-certs
                    readOnly: true
              volumes:
                - name: ca-certs
                  emptyDir: {}
                - name: extra-ca
                  secret:
                    secretName: extra-ca-tls

You need your company CA certificate in a secret named extra-ca-tls, stored under the key ca.crt.
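
A minimal sketch of such a secret, in case it helps. The secret name and key come from the patch above; the namespace `piraeus-datastore` is an assumption (use whichever namespace your LINSTOR controller runs in), and the certificate body is a placeholder for your own PEM-encoded CA certificate:

```yaml
apiVersion: v1
kind: Secret
metadata:
  # Must match secretName in the "extra-ca" volume of the patch
  name: extra-ca-tls
  # Assumption: namespace of the linstor-controller deployment
  namespace: piraeus-datastore
type: Opaque
stringData:
  # The key must be ca.crt, because the init container reads /extra-ca/ca.crt
  ca.crt: |
    -----BEGIN CERTIFICATE-----
    <PEM body of your company CA certificate>
    -----END CERTIFICATE-----
```

The init container copies the default Java trust store into the shared emptyDir and imports this certificate into the copy, so the controller trusts both the public CAs and your company CA.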

andlf commented 1 year ago

Many thanks! Your patch solved the problem with the CA certificate! I now have some other errors with the backup; I will report the result once backups work fine.

andlf commented 1 year ago

Hi. It looks like we need a patch for the satellite pods as well, not only for the linstor-controller deployment...

| 64402A2F-A94B3-000008 | 2023-04-20 07:20:07 | S|k03                                 | SystemServiceStartException: Unable to daemon for SnapshotShipping              |
| 64402A2F-A94B3-000009 | 2023-04-20 07:20:24 | S|k03                                 | SystemServiceStartException: Unable to daemon for SnapshotShipping              |
| 64402A64-6CEFF-000004 | 2023-04-20 07:20:38 | S|linstor01                           | SystemServiceStartException: Unable to daemon for SnapshotShipping              |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
~$linstor error-reports s 64402A2F-A94B3-000008
ERROR REPORT 64402A2F-A94B3-000008

============================================================

Application:                        LINBIT® LINSTOR
Module:                             Satellite
Version:                            1.20.3
Build ID:                           8d19a891df018f6e3d40538d809904f024bfe361
Build time:                         2023-01-26T08:40:26+00:00
Error time:                         2023-04-20 07:20:07
Node:                               k03

============================================================

Reported error:
===============

Description:
    Amazon exception attempting to start '[setsid, -w, bash, -c, set -o pipefail; trap 'kill -HUP 0' SIGTERM; (thin_send vg0/pvc-8386d8f2-bc7f-4887-9546-048e7c5d230b_00000_snapshot-5f7af852-5536-4580-b13b-4c12707e17c4 | zstd;)&\wait $!]'
Cause:
    Unable to execute HTTP request: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target

andlf commented 1 year ago

Many thanks again! I have applied:

apiVersion: piraeus.io/v1
kind: LinstorSatelliteConfiguration
metadata:
  name: ca-patch
spec:
  patches:
    - target:
        kind: Pod
        name: satellite
      patch: |
        apiVersion: v1
        kind: Pod
        metadata:
          name: satellite
        spec:
          initContainers:
          - name: ca-cert-patcher
            image: quay.io/piraeusdatastore/piraeus-server:v1.20.3
            command:
... etc

and it works. I sometimes get Velero timeout errors like

msg="fail to wait VolumeSnapshot change to Ready: timed out waiting for the condition"

but that looks like a Velero misconfiguration.