stackabletech / nifi-operator

A Kubernetes operator for Apache NiFi

Document how to include custom processors in a Stackable NiFi deployment #229

Closed lfrancke closed 1 year ago

lfrancke commented 2 years ago

Discussed in https://github.com/stackabletech/nifi-operator/discussions/225

Originally posted by **Jimvin** March 1, 2022:

> I would like a way to include custom assets, such as JDBC drivers, libraries and custom NiFi processors, in my cluster. I would like to do this in a way that ensures all of the instances in the cluster contain the same assets.
deebify commented 1 year ago

I want to do the same. I suggest extending the Docker image used by Stackable: https://github.com/stackabletech/docker-images/blob/main/nifi/Dockerfile

I would publish it to my local Docker registry and point the NiFi CRDs at that image instead.

Is that going to work? Any thoughts?
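A minimal sketch of that approach, assuming a placeholder NAR file name and base image tag, and assuming NARs dropped into `/stackable/nifi/lib` end up on NiFi's classpath (check the linked Dockerfile for the actual layout):

```Dockerfile
# Extend the Stackable NiFi image with a custom processor NAR.
# The base image tag is a placeholder; pick one matching your NiFi version.
FROM docker.stackable.tech/stackable/nifi:1.16.3-stackable0.1.0

# Copy the custom processor into NiFi's library directory so it is
# picked up at startup. The target path is an assumption about the
# image layout.
COPY my-custom-processor.nar /stackable/nifi/lib/
```

The resulting image would then be tagged and pushed to the local registry before being referenced from the NifiCluster resource.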

maltesander commented 1 year ago

> I want to do the same. I suggest extending the Docker image used by Stackable: https://github.com/stackabletech/docker-images/blob/main/nifi/Dockerfile
>
> I would publish it to my local Docker registry and point the NiFi CRDs at that image instead.
>
> Is that going to work? Any thoughts?

Yes, that should work. You could then use `image.custom` to reference the image and tag in your Docker registry (https://docs.stackable.tech/home/stable/concepts/product_image_selection.html#_custom_images).
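A sketch of what that could look like, assuming a hypothetical registry host and image tag:

```yaml
apiVersion: nifi.stackable.tech/v1alpha1
kind: NifiCluster
metadata:
  name: simple-nifi
spec:
  image:
    # Custom image built on top of the Stackable NiFi image;
    # the registry host and tag here are placeholders.
    custom: my-registry.example.com/nifi:1.16.3-custom
    # productVersion still tells the operator which NiFi version
    # the custom image contains.
    productVersion: 1.16.3
```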

lfrancke commented 1 year ago

This might come down to just docs but we should check if there's a more convenient way to do this.

soenkeliebau commented 1 year ago

Ideally I'd like to add something to the NiFi CRD that gives the option of adding custom processors with more convenience.

Something like:

```yaml
---
apiVersion: nifi.stackable.tech/v1alpha1
kind: NifiCluster
metadata:
  name: simple-nifi
spec:
  clusterConfig:
    customProcessors:
      - fetchUrl:
          url: http://my-company/repo/processor.nar
          checksum: 12345
      - persistentVolumeClaim:
          name: my-pvc-containing-processors
      - gitReference:
          git: git:// ....
```

None of this is fully thought through, but I think specifying external artifacts may come up more than once, so this might be something worth abstracting over, as it is basically just a way of saying "I need this on the classpath, please".

razvan commented 1 year ago

TL;DR: I believe the future-proof and more flexible solution is to use the NiFi Registry. But that is a much larger task, so for now just mount some volumes and their contents into the lib or extensions folders.

Sorry for being late to the party and rewinding a bit on past discussions.

I've spent some time reading the NiFi documentation and I now see this problem in a bigger context: dynamically provisioning different types of artifacts: JDBC drivers, processors (packaged as NAR files), templates and workflows.

It seems that the NiFi Registry (which the operator doesn't currently manage) is capable of hosting NAR bundles and workflows (1). It's not clear whether NiFi clusters can ingest NARs from the registry, but I suspect they can, or at least will in the future.

A Stackable-managed NiFi Registry would enable much more robust NiFi cluster upgrades and migrations, would increase cluster security, and would probably be a better bet on the future development of the Stackable operator.

That being said, adding support for managed NiFi registries is a much more complex task than what this issue initially requested.

As a compromise, I propose implementing the most basic functionality for provisioning extensions via PVCs. No URL fetching, no Git sync, etc. We already support the configuration of extraVolumes (2), and we should extend that with a "purpose" (for example, provisioning processor NARs or JDBC drivers).
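As an illustration of that direction, here is a sketch of what a purpose-tagged volume could look like; note that the `purpose` field is hypothetical and does not exist in the current CRD:

```yaml
spec:
  clusterConfig:
    extraVolumes:
      # Hypothetical extension of the existing extraVolumes mechanism:
      # a "purpose" would tell the operator where to mount the contents
      # (e.g. into NiFi's lib or extensions directory).
      - name: nifi-processors
        purpose: processors  # hypothetical field, not in the CRD
        persistentVolumeClaim:
          claimName: my-pvc-containing-processors
```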

Another compromise could be to add support for pod overrides. This would require even less code than the previous proposal, and it would be reusable across all operators. The downside is that provisioning extensions would be extremely verbose and error-prone for users.
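To illustrate why the pod-overrides route is verbose, here is a sketch using standard Kubernetes PodTemplateSpec fields; the role/role-group names, PVC name and mount path are assumptions:

```yaml
spec:
  nodes:
    roleGroups:
      default:
        podOverrides:
          spec:
            containers:
              - name: nifi
                volumeMounts:
                  # Mount path is an assumption about the image layout.
                  - name: custom-nars
                    mountPath: /stackable/nifi/extensions
            volumes:
              - name: custom-nars
                persistentVolumeClaim:
                  claimName: my-pvc-containing-processors
```

Every role group would need this boilerplate repeated, which is the verbosity concern mentioned above.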

razvan commented 1 year ago

After thinking more about it, I now tend to believe that no solution is as good as packaging the requirements directly into the NiFi image.

Incidentally, the exact same problem has been investigated for Spark applications.

razvan commented 1 year ago

These are the meeting minutes on the topic:

Protocol

- Sebastian: Just document how to build custom images.
- Andrew: PVCs are not good. Experience with Airflow and Git repositories was unsatisfactory.
- Sönke: Has written the issue. Doesn't like it if customers have to build their own images. Doesn't like PVCs either.
- Razvan: Our own custom images with the most popular/useful extensions?
- Sönke: The original customer has their own processor. For that, a "fetch URL" solution would be required. BUT the customer is now building a custom image anyway.
- Razvan: We would need a generic/framework solution for two things: putting artifacts into containers and (optionally) applying a task to them (copying to other dirs, updating configs, and so on).
- Sönke: A single artifact can be mounted on the existing classpath of NiFi. No sequential actions required.

Outcome

We all agreed on the following steps:

adwk67 commented 1 year ago

> Andrew: PVCs are not good. Experience with Airflow and Git repositories was unsatisfactory.

I probably didn't express this very clearly. Airflow + git-sync is fine for situations where the product already has a polling mechanism to find resources (as Airflow does).