konpyutaika / nifikop

The NiFiKop NiFi Kubernetes operator makes it easy to run Apache NiFi on Kubernetes. Apache NiFi is a free, open-source solution that supports powerful and scalable directed graphs of data routing, transformation, and system mediation logic.
https://konpyutaika.github.io/nifikop/
Apache License 2.0

Add a new resource to deploy extensions (custom processor) into nifi #396

Closed ggerla closed 1 month ago

ggerla commented 4 months ago

Is your feature request related to a problem?

NiFi allows adding custom processors by copying nar files into /opt/nifi/nifi-current/extensions. Of course, this is not possible if NiFi is deployed in a container/cluster.

Describe the solution you'd like to see

I wrote a script that uses the Kubernetes API to copy the nar file into all pods of the cluster. Of course, the file is lost when a pod restarts. One possible solution is to fetch the file from some resource while the init container runs and copy it into the pod. Another is to mount a ConfigMap in the desired folder, but this limits the nar file size (a ConfigMap cannot hold more than 1 MiB of data).

Describe alternatives you've considered

No response

Additional context

No response

mh013370 commented 4 months ago

This is an interesting idea, but I hesitate to make it a custom resource given the number of ways & places nars can be hosted.

Nars could be hosted via a standard web service, S3, a Maven repository, a database, etc., each of which has its own access semantics. It's easy enough to add an init container that copies the nars to a directory on a PVC that survives pod restarts, so they only need to be copied once. Does this need a new resource to accomplish?
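
As a rough sketch of that init-container approach for a nar served by a plain web server (the image, URL, and file name below are placeholders, and the `extensions` volume is assumed to be backed by a PVC that the container user can write to):

```yaml
initContainers:
  - name: pull-extensions
    image: curlimages/curl          # assumed image; any image with curl and sh works
    command:
      - "/bin/sh"
      - "-c"
      - |
        set -e
        # Skip the download when the nar is already on the persistent volume.
        if [ ! -f /extensions/my-nar.nar ]; then
          # Download to a temporary name so a failed transfer is never mistaken for a complete nar.
          curl -fsSL -o /extensions/my-nar.nar.tmp https://example.com/nars/my-nar.nar
          mv /extensions/my-nar.nar.tmp /extensions/my-nar.nar
        fi
    volumeMounts:
      - name: extensions
        mountPath: /extensions
```

The existence check is what makes the copy a one-time cost: on later restarts the file is already on the PVC and the download is skipped. Hosting the nar elsewhere would only change the client command inside the script.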

mh013370 commented 4 months ago

Another thing to consider is that NiFi will eventually support pulling extensions (nars) via NiFi Registry: https://nifi.apache.org/docs/nifi-registry-docs/html/administration-guide.html#bundle-persistence-providers

Stateless NiFi already behaves this way: https://github.com/apache/nifi/blob/main/nifi-stateless/nifi-stateless-assembly/README.md

When Stateless NiFi is started, it parses the provided dataflow and determines which bundles/extensions are necessary to run the dataflow. If an extension is not available, or the version referenced by the flow is not available, Stateless may attempt to download the extensions automatically.

This seems like a feature that should be implemented in NiFi rather than in this operator.

ggerla commented 4 months ago

Thank you for your answer. The two links you posted seem interesting; I will investigate this approach. In general, it is not clear to me whether this also solves the second issue I have, which is related to library drivers (e.g. the Postgres JDBC driver).

ggerla commented 4 months ago

I analyzed your suggestion and understood that NiFi Registry is a subproject that has to be installed in addition to NiFi. Does this operator also support installing the Registry?

mh013370 commented 4 months ago

> I analyzed your suggestion and understood that NiFi Registry is a subproject that has to be installed in addition to NiFi. Does this operator also support installing the Registry?

NiFiKop supports installing NiFi but not the NiFi Registry application, although Helm charts exist for that. To clarify: core NiFi does not currently support automatically pulling extensions via NiFi Registry, but it is meant to eventually.

In the meantime, you can solve this problem by using an init container. For example:

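              initContainers:
                # Runs to completion before the NiFi container starts.
                - name: pull-extensions
                  image: d3fk/s3cmd:stable
                  imagePullPolicy: Always
                  workingDir: /extensions
                  command:
                    - "/bin/sh"
                    - "-c"
                    - |
                      # Download the nar from S3 into the mounted volume.
                      s3cmd sync s3://my-bucket/my-nar.nar ./
                  volumeMounts:
                    # The PVC-backed volume that holds the nars.
                    - name: extensions
                      mountPath: /extensions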

And in this case, the extensions volume is a PVC so it will survive pod restarts. This is the current recommended way to solve this problem.
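
To make the volume wiring explicit, here is a minimal sketch in plain Kubernetes terms rather than NiFiKop configuration. The pod name, claim name, and NiFi image tag are placeholders, the PVC is assumed to exist already, and the S3 credentials that s3cmd needs are omitted:

```yaml
# Illustrative Pod only; this is not what the operator generates.
apiVersion: v1
kind: Pod
metadata:
  name: nifi-example                 # hypothetical name
spec:
  volumes:
    - name: extensions
      persistentVolumeClaim:
        claimName: nifi-extensions   # hypothetical, pre-existing PVC
  initContainers:
    - name: pull-extensions
      image: d3fk/s3cmd:stable
      workingDir: /extensions
      command: ["/bin/sh", "-c", "s3cmd sync s3://my-bucket/my-nar.nar ./"]
      volumeMounts:
        - name: extensions
          mountPath: /extensions
  containers:
    - name: nifi
      image: apache/nifi:latest      # placeholder tag
      volumeMounts:
        # Same claim as above, mounted where NiFi looks for extensions.
        - name: extensions
          mountPath: /opt/nifi/nifi-current/extensions
```

Because both containers mount the same claim, whatever the init container writes to /extensions shows up in NiFi's extensions directory, and it is still there after the pod restarts.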

ggerla commented 4 months ago

Thanks. Basically, you wrote exactly what I had in mind: an init container that uses an S3 client to download the nar and copy it into a volume of the NiFi pod. Now let's come back to my original question. The need has two phases:

  1. add a new extension at runtime
  2. remember all extensions added

The init container satisfies the second phase, but to enable the first I need a way to copy the nar file into the S3 bucket and into the volume without restarting the NiFi pods. This is why I asked for a new custom resource.

mh013370 commented 4 months ago

Okay, so this is all declarative configuration for a deployment. If you want a new nar in your deployment, you need to add it to an init container. Deployed pods are immutable and so if you want to change them, you must change the deployment configuration.

If I write a custom nar, then part of its release process is to push it to an S3 bucket. I'd then update my NiFi deployment to pull the new nar via the init container. At that point I'm free to use it in NiFi.
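
In practice that update can be as small as one more line in the init container's script. A sketch, where `my-new-nar.nar` is a hypothetical second artifact pushed to the same bucket:

```yaml
initContainers:
  - name: pull-extensions
    image: d3fk/s3cmd:stable
    imagePullPolicy: Always
    workingDir: /extensions
    command:
      - "/bin/sh"
      - "-c"
      - |
        set -e
        s3cmd sync s3://my-bucket/my-nar.nar ./
        # Added for the new release (hypothetical file name).
        s3cmd sync s3://my-bucket/my-new-nar.nar ./
    volumeMounts:
      - name: extensions
        mountPath: /extensions
```

Init containers only run when a pod starts, so the added line takes effect once the pods are recreated from the updated configuration.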

ggerla commented 4 months ago

Sorry, I'm not sure we are 100% aligned. Each NiFi pod has its own volume attached (as it is now, without change). In this volume there is a folder, /opt/nifi/nifi-current/extensions. Now suppose we write a new CRD called nifi-extension (just as an example). When this resource is deployed into k8s, the operator downloads the nar from the S3 object store and copies it into the /opt/nifi/nifi-current/extensions path of each NiFi pod. This enables runtime deployment of the nar. Then, after a pod restart, the init container can check the list of nifi-extension resources, download them from the S3 bucket, and copy them into the /opt/nifi/nifi-current/extensions path of its own NiFi pod.

Do you agree?

mh013370 commented 4 months ago

> Sorry, I'm not sure we are 100% aligned. Each NiFi pod has its own volume attached (as it is now, without change). In this volume there is a folder, /opt/nifi/nifi-current/extensions. Now suppose we write a new CRD called nifi-extension (just as an example). When this resource is deployed into k8s, the operator downloads the nar from the S3 object store and copies it into the /opt/nifi/nifi-current/extensions path of each NiFi pod. This enables runtime deployment of the nar. Then, after a pod restart, the init container can check the list of nifi-extension resources, download them from the S3 bucket, and copy them into the /opt/nifi/nifi-current/extensions path of its own NiFi pod.
>
> Do you agree?

You can accomplish this with an initContainer alone. initContainers run on every pod restart. You won't need a custom resource for that.
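
One way to lean on that behaviour, sketched under the assumption that every nar is published under a single bucket prefix (`nars/` here is hypothetical): have the init container sync the whole prefix instead of naming individual files.

```yaml
initContainers:
  - name: pull-extensions
    image: d3fk/s3cmd:stable
    imagePullPolicy: Always
    workingDir: /extensions
    command:
      - "/bin/sh"
      - "-c"
      - |
        # Sync everything under the (hypothetical) nars/ prefix into the volume.
        s3cmd sync s3://my-bucket/nars/ ./
    volumeMounts:
      - name: extensions
        mountPath: /extensions
```

With this variant the bucket prefix plays the role of the list of extensions: a restarted pod picks up whatever has been published since it last started, without any change to the deployment configuration.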

ggerla commented 4 months ago

OK, understood. Thanks for your support.