konpyutaika / nifikop

The NiFiKop NiFi Kubernetes operator makes it easy to run Apache NiFi on Kubernetes. Apache NiFI is a free, open-source solution that support powerful and scalable directed graphs of data routing, transformation, and system mediation logic.
https://konpyutaika.github.io/nifikop/
Apache License 2.0
131 stars 45 forks source link

Add ability to retain and re-use pod PVCs instead of always recreating them #325

Closed mh013370 closed 9 months ago

mh013370 commented 10 months ago
Q A
Bug fix? no
New feature? yes
API breaks? no
Deprecations? no
Related tickets fixes #277
License Apache 2.0

What's in this PR?

Adds a new optional field called ReclaimPolicy to nifikop's volume StorageConfig. This instructs nifikop whether or not to Delete or Retain PVCs on NifiCluster deletion.

There's a caveat here where nifikop will only re-attach the PVC to the node it was previously attached to. It will not re-attach the PVC to a random node in the case that a NifiCluster is created and then deleted with PVCs configured with Retain reclaim policies. If a PVC is created for node 2, for example, it will only ever belong to node 2.

Example NifiCluster NodeConfig:

default_group:
      # provenanceStorage allow to specify the maximum amount of data provenance information to store at a time
      # https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#write-ahead-provenance-repository-properties
      provenanceStorage: "10 GB"
      runAsUser: 1000
      # Set this to true if the instance is a node in a cluster.
      # https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#basic-cluster-setup
      isNode: true
      # Additional metadata to merge to the pod associated
      podMetadata:
        annotations:
          node-annotation: "node-annotation-value"
        labels:
          node-label: "node-label-value"
      # imagePullPolicy define the pull policy for NiFi cluster docker image
      imagePullPolicy: IfNotPresent
      # priorityClassName define the name of the priority class to be applied to these nodes
      priorityClassName: "example-priority-class-name"
      # externalVolumeConfigs specifies a list of volume to mount into the main container.
      externalVolumeConfigs:
        - name: example-volume
          mountPath: "/opt/nifi/example"
          secret:
            secretName: "raw-controller"
      # storageConfigs specifies the node related configs
      storageConfigs:
        # Name of the storage config, used to name PV to reuse into sidecars for example.
        - name: provenance-repository
          # Retain this PVC throughout NifiCluster deletions.
          reclaimPolicy: Retain
          # Path where the volume will be mount into the main nifi container inside the pod.
          mountPath: "/opt/nifi/provenance_repository"
          # Metadata to attach to the PVC that gets created
          metadata:
            labels:
              my-label: my-value
            annotations:
              my-annotation: my-value
          # Kubernetes PVC spec
          # https://kubernetes.io/docs/tasks/configure-pod-container/configure-persistent-volume-storage/#create-a-persistentvolumeclaim
          pvcSpec:
            accessModes:
              - ReadWriteOnce
            storageClassName: "standard"
            resources:
              requests:
                storage: 10Gi
        - mountPath: "/opt/nifi/nifi-current/logs"
          name: logs
          reclaimPolicy: Delete
          pvcSpec:
            accessModes:
              - ReadWriteOnce
            storageClassName: "standard"
            resources:
              requests:
                storage: 10Gi

Why?

To prevent data loss when a NifiCluster is accidentally deleted. I think we ought to recommend that the reclaim policies be left as the default Delete policy to avoid the potential for leaving PVCs around indefinitely. If a PVC is configured as Retain, nifikop will never delete it. it must be deleted manually or reconfigured as Delete and then delete the NifiCluster.

Checklist

mh013370 commented 10 months ago

This implementation isn't quite complete. All of the objects created under NifiCluster are configured to be owned by the NifiCluster CRD. When that object gets deleted, every object it owns is garbage collected including the PVCs.

To prevent this, we have to release ownership of the Retain PVCs prior to finalizing deletion of the NifiCluster. There are some edge cases here we need to handle when reclaim policies are changed during the NifiCluster lifecycle that i must handle.

Will push changes to address this.

mh013370 commented 10 months ago

I've verified these changes by running the following tests:

  1. Create NifiCluster with Retain PVCs & delete the cluster. Assert that PVCs remain after NifiCluster is deleted. Re-create NifiCluster and assert that PVCs are re-bound to new cluster.

  2. Create NifiCluster with Delete PVCs & delete the cluster. Assert that PVCs are deleted when NifiCluster is deleted.

  3. Create NifiCluster with Retain PVCs, change reclaim policy to Delete, then delete the cluster. Assert that PVCs are deleted when cluster is deleted.

  4. Create NifiCluster with Delete PVCs, change reclaim policy to Retain, then delete the cluster. Assert that PVCs remain after NifiCluster is deleted. Re-create NifiCluster and assert that PVCs are re-bound to new cluster.

  5. Create NifiCluster with Retain PVCs. Scale cluster up by 1 node & then scale cluster down by 1. Ensure that new node's PVCs are retained. Scale cluster back up by 1 and ensure that PVC is re-bound to new pod.

  6. Create NifiCluster with Delete PVCs. Scale cluster up by 1 & then scale cluster down by 1. Ensure that the new node's PVCs are deleted.

All test cases were successful.