konpyutaika / nifikop

The NiFiKop NiFi Kubernetes operator makes it easy to run Apache NiFi on Kubernetes. Apache NiFi is a free, open-source solution that supports powerful and scalable directed graphs of data routing, transformation, and system mediation logic.
https://konpyutaika.github.io/nifikop/
Apache License 2.0

Flows do not persist pod restart #70

Closed: juldrixx closed this issue 2 years ago

juldrixx commented 2 years ago

From nifikop created by andrew-musoke: Orange-OpenSource/nifikop#201

Type of question

Are you asking about community best practices, how to implement a specific feature, or about general context and help around nifikop? General help with nifikop.

Question

What did you do? I deployed NiFi with 2 pods via NiFiKop. After creating a flow in the UI, I also exported the process groups to a NiFi Registry. The cluster ran for days. I then deleted the cluster pods to test resilience. This is the CR I used:

apiVersion: nifi.orange.com/v1alpha1
kind: NifiCluster
metadata:
  name: simplenifi
  namespace: dataops
spec:
  service:
    headlessEnabled: true
  zkAddress: "zookeeper.dataops.svc.cluster.local.:2181"
  zkPath: "/simplenifi"
  clusterImage: "apache/nifi:1.12.1"
  oneNifiNodePerNode: false
  nodeConfigGroups:
    default_group:
      isNode: true
      imagePullPolicy: IfNotPresent
      storageConfigs:
        - mountPath: "/opt/nifi/nifi-current/logs"
          name: logs
          pvcSpec:
            accessModes:
              - ReadWriteOnce
            storageClassName: "gp2"
            resources:
              requests:
                storage: 10Gi
      serviceAccountName: "default"
      resourcesRequirements:
        limits:
          cpu: "0.5"
          memory: 2Gi
        requests:
          cpu: "0.5"
          memory: 2Gi
  clientType: "basic"
  nodes:
    - id: 1
      nodeConfigGroup: "default_group"
    - id: 2
      nodeConfigGroup: "default_group"
  propagateLabels: true
  nifiClusterTaskSpec:
    retryDurationMinutes: 10
  listenersConfig:
    internalListeners:
      - type: "http"
        name: "http"
        containerPort: 8080
      - type: "cluster"
        name: "cluster"
        containerPort: 6007
      - type: "s2s"
        name: "s2s"
        containerPort: 10000

What did you expect to see? I expected the cluster to run properly and survive restarts since PVs are created. I expected to see the pipelines continue running after the pods started up.

What did you see instead? Under which circumstances? When the pods came back up and were healthy, the UI had no flows or process groups. The registry configuration had also disappeared. I have to manually re-register the nifi-registry, re-import the process groups, add the secrets and restart the pipelines.

  1. Why would this happen when Nifi has persistent volumes?
  2. How can this behaviour be stopped?
  3. How can I persist the flows, or at least automate re-importing and restarting the pipelines from nifi-registry?

Environment

apache/nifi:1.12.1

juldrixx commented 2 years ago

I got this response from one of the alternate communication channels. But I cannot make sense of it. Could this be an issue?

It sounds like the flow.xml.gz is perhaps not saved on a persistent volume? The ideal behavior would be to have several different persistent volumes:

  • One for content repo
  • One for flowfile repo
  • One for provenance repo
  • One for logs
  • One for conf/ directory, any additional configuration resources. (this could easily be combined with the logs/ volume)
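
The volume layout suggested above could be sketched as extra `storageConfigs` entries in the node config group of the CR from the question. This is only a sketch: the mount paths are assumptions about where the `apache/nifi` image and nifikop keep each repository, and the volume names and sizes are illustrative. As noted later in this thread, persisting these directories may still not retain the flow on operator versions that remove flow.xml.gz at pod startup.

```yaml
# Fragment of spec.nodeConfigGroups.default_group in the NifiCluster CR.
# Mount paths, names and sizes are assumptions; check them against your image.
storageConfigs:
  - mountPath: "/opt/nifi/nifi-current/logs"       # logs (already in the original CR)
    name: logs
    pvcSpec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: "gp2"
      resources:
        requests:
          storage: 10Gi
  - mountPath: "/opt/nifi/data"                    # assumed location of flow.xml.gz under nifikop
    name: data
    pvcSpec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: "gp2"
      resources:
        requests:
          storage: 10Gi
  - mountPath: "/opt/nifi/flowfile_repository"     # flowfile repo (assumed path)
    name: flowfile-repository
    pvcSpec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: "gp2"
      resources:
        requests:
          storage: 10Gi
  - mountPath: "/opt/nifi/content_repository"      # content repo (assumed path)
    name: content-repository
    pvcSpec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: "gp2"
      resources:
        requests:
          storage: 10Gi
  - mountPath: "/opt/nifi/provenance_repository"   # provenance repo (assumed path)
    name: provenance-repository
    pvcSpec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: "gp2"
      resources:
        requests:
          storage: 10Gi
  - mountPath: "/opt/nifi/nifi-current/conf"       # conf/ directory (assumed path)
    name: conf
    pvcSpec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: "gp2"
      resources:
        requests:
          storage: 10Gi
```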

juldrixx commented 2 years ago

I recently opened a PR that adds an option to NifiClusterSpec which, when specified, prevents the flow.xml.gz file from being removed on pod startup.

In the current implementation, even though the flow.xml.gz file is persisted, it is removed every time the pod starts. https://github.com/Orange-OpenSource/nifikop/blob/master/pkg/resources/nifi/pod.go#L418

juldrixx commented 2 years ago

You should deploy a NifiDataflow so that NiFiKop re-deploys the versioned dataflow from NiFi Registry.

https://orange-opensource.github.io/nifikop/docs/5_references/5_nifi_dataflow

I could be wrong, but I suppose you could also make sure the flow.xml.gz is persisted on a persistent volume. That isn't necessary if you deploy a NifiDataflow, though, since nifikop will just put the flow back once the pod comes up.
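
As a rough illustration of that suggestion, the two resources below declare a registry client and a versioned dataflow for the `simplenifi` cluster from the question, so the operator can recreate both after a restart. This is a hedged sketch based on the reference page linked above, not a tested manifest: the resource names, the registry URI, and every value in angle brackets are placeholders to replace with your own, and the chosen `syncMode` and `updateStrategy` are only one possible setting.

```yaml
# Registry client: lets nifikop re-register NiFi Registry in the cluster.
apiVersion: nifi.orange.com/v1alpha1
kind: NifiRegistryClient
metadata:
  name: registry-client          # hypothetical name
  namespace: dataops
spec:
  clusterRef:
    name: simplenifi
    namespace: dataops
  description: "NiFi Registry for simplenifi"
  uri: "http://<nifi-registry-host>:18080"   # placeholder URI
---
# Dataflow: the versioned flow nifikop should deploy and keep in sync.
apiVersion: nifi.orange.com/v1alpha1
kind: NifiDataflow
metadata:
  name: my-dataflow              # hypothetical name
  namespace: dataops
spec:
  bucketId: "<registry-bucket-uuid>"      # placeholder
  flowId: "<registry-flow-uuid>"          # placeholder
  flowVersion: 1                          # version to deploy (illustrative)
  syncMode: always                        # keep re-applying the registry version
  updateStrategy: drain                   # drain flowfiles before updating
  clusterRef:
    name: simplenifi
    namespace: dataops
  registryClientRef:
    name: registry-client
    namespace: dataops
```

Sensitive values such as the secrets mentioned in the question are not covered by this sketch; the nifikop documentation describes a separate parameter context resource for those.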

mh013370 commented 2 years ago

For production clusters to which you've configured nifikop to deploy flows, this isn't really a problem. However, I do think this would be a useful feature for the following reason:

If you use a single cluster deployment as a place to create flows and version control them, then you wouldn't be configuring flows to be deployed to it. Since nifikop wipes the flow.xml.gz on each pod restart, you have to manually re-import all of the flows you are working on to be deployed to other clusters.

I personally feel that the PR previously mentioned, raised by @genehynson, would be a useful feature and should be re-opened in this repo.

genehynson commented 2 years ago

After upgrading to NiFi 1.16 we are no longer running into this issue. I believe this is because NiFi migrated to a new file, flow.json.gz, which is not deleted by the NiFi pod startup script provided by nifikop.

Also with NiFi 1.16 we've been able to do clean, rolling upgrades by creating a PodDisruptionBudget and only allowing 1 NiFi node to be updated by k8s at a time. NiFi 1.16 introduced a new "flow negotiation" system that allows for each node in the NiFi cluster to have slightly different versions of the flow.json.gz file (like different processor versions, for example).
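
A PodDisruptionBudget along those lines might look like the sketch below. The object name and the label selector are assumptions (they presume the operator labels NiFi pods with `app: nifi` and `nifi_cr: <cluster name>`), so verify the actual labels with `kubectl get pods --show-labels` before relying on it. On Kubernetes versions older than 1.21, the API group would be `policy/v1beta1` instead.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: simplenifi-pdb           # hypothetical name
  namespace: dataops
spec:
  maxUnavailable: 1              # at most one NiFi node may be evicted at a time
  selector:
    matchLabels:                 # assumed pod labels; check your own pods
      app: nifi
      nifi_cr: simplenifi
```

Note that a PodDisruptionBudget only limits voluntary evictions such as node drains; it does not by itself order pod restarts triggered in other ways.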

So even if nifikop does start deleting the flow.json.gz file I think we'll be fine because when a NiFi pod rolls it will get the contents for the flow.json.gz from the primary NiFi node that has not rolled yet (or has already rolled).

That being said, the use case for the PR mentioned only applies if you're running a single NiFi node or an older version of NiFi.

mh013370 commented 2 years ago

Good to know! Thanks for the follow up. I do think that NiFi is writing both the flow.xml.gz and the flow.json.gz temporarily as they transition to the json variant. But it's good to know that with 1.16+ and the changes around flow negotiation that it's a minor issue.

Maybe we can resolve this issue then?

genehynson commented 2 years ago

"I do think that NiFi is writing both the flow.xml.gz and the flow.json.gz temporarily as they transition to the json variant"

Correct, but it only uses one of them: whichever you have defined in nifi.flow.configuration.file (flow.xml.gz is the default). To get the benefits of the new flow negotiation you have to switch to the flow.json.gz file.

That being said, I'm also fine with resolving this issue.

erdrix commented 2 years ago

The flow.xml.gz is no longer removed at pod restart!