lfrancke closed this issue 1 year ago.
I want to do the same. My plan is to extend the Docker image used by Stackable (https://github.com/stackabletech/docker-images/blob/main/nifi/Dockerfile), publish it to my local Docker registry, and configure my NiFi cluster resource to use the image from my registry instead.
Is that going to work? Any thoughts?
Yes, that should work. You can then use `image.custom` to reference the image and tag in your Docker registry (https://docs.stackable.tech/home/stable/concepts/product_image_selection.html#_custom_images).
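A minimal sketch of that approach follows. Only the image-related fields are shown, and a real cluster definition needs the rest of the spec. The registry host, image name and tag are placeholders, and the `productVersion` value is an assumption: it has to match the NiFi version actually packaged in the custom image.

```yaml
---
apiVersion: nifi.stackable.tech/v1alpha1
kind: NifiCluster
metadata:
  name: simple-nifi
spec:
  image:
    # Hypothetical image built FROM the Stackable NiFi image with the extra
    # processors added, then pushed to a private registry.
    custom: my-registry.example.com/nifi-with-processors:1.21.0-custom
    # Assumed to match the NiFi version inside the custom image.
    productVersion: 1.21.0
```

With `custom` set, the operator uses the given image reference as-is instead of deriving one from the Stackable registry, so the tag is entirely under the user's control.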
This might come down to documentation alone, but we should check whether there's a more convenient way to do this.
Ideally, I'd like to add an option to the NiFi CRD that makes adding custom processors more convenient.
Something like:
```yaml
---
apiVersion: nifi.stackable.tech/v1alpha1
kind: NifiCluster
metadata:
  name: simple-nifi
spec:
  clusterConfig:
    customProcessors:
      - fetchUrl:
          url: http://my-company/repo/processor.nar
          checksum: 12345
      - persistentVolumeClaim:
          name: my-pvc-containing-processors
      - gitReference:
          git: git:// ....
```
None of this is fully thought through, but I think the need to specify external artifacts may come up more than once. It might be worth abstracting over, since it is basically just a way of saying "I need this on the classpath, please".
TL;DR: I believe the future-proof and more flexible solution is to use the NiFi Registry. But that is a much larger task, so for now, just mount some volumes and their contents into the `lib` or `extensions` folders.
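A sketch of that interim approach, assuming the operator's `clusterConfig.extraVolumes` mounts each volume under `/stackable/userdata/<volume-name>`. Instead of mounting into `lib` directly, it registers the mounted directory as an additional NAR library directory through NiFi's `nifi.nar.library.directory.<suffix>` property, set via a `configOverrides` entry for `nifi.properties`. The PVC name `nifi-processors` and the `myCustomLibs` suffix are hypothetical.

```yaml
spec:
  clusterConfig:
    extraVolumes:
      # Hypothetical PVC that already contains the custom .nar files.
      - name: nifi-processors
        persistentVolumeClaim:
          claimName: nifi-processors
  nodes:
    configOverrides:
      nifi.properties:
        # NiFi also loads NARs from every nifi.nar.library.directory.* entry;
        # the path assumes the extraVolumes mount location described above.
        nifi.nar.library.directory.myCustomLibs: /stackable/userdata/nifi-processors/
```

Pointing NiFi at the mount leaves the image's own `lib` folder untouched. The PVC still has to be mountable by every NiFi pod (for example with `ReadOnlyMany` or `ReadWriteMany` access), which is one of the practical drawbacks of PVCs.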
Sorry for being late to the party and rewinding a bit on past discussions.
I've spent some time reading the NiFi documentation, and I now see this problem in a bigger context: dynamically provisioning different types of artifacts, such as JDBC drivers, processors (packaged as NAR files), templates and workflows.
It seems that the NiFi Registry (which the operator doesn't currently manage) is capable of hosting NAR bundles and workflows (1). It's not clear whether NiFi clusters can ingest NARs from the registry, but I suspect they can, or at least will in the future.
A Stackable-managed NiFi Registry would enable much more robust NiFi cluster upgrades and migrations, would improve cluster security, and would probably be a better bet on the future development of the Stackable operator.
That being said, adding support for managed NiFi registries is a much more complex task than the one originally requested in this issue.
As a compromise, I propose implementing only the most basic functionality: provisioning extensions via PVCs. No URL fetching, no Git sync, etc. We already support configuring `extraVolumes` (2), and we should extend that with a "purpose". Example purposes are:
Another compromise could be to add support for pod overrides. This would require even less code than the previous proposal, and it would be reusable across all operators. The downside is that provisioning extensions this way would be extremely verbose and error-prone for users.
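To illustrate the verbosity concern, this is roughly what provisioning a PVC of NAR files through pod overrides could look like. It assumes a `podOverrides` field on the role that accepts a Kubernetes PodTemplateSpec merged by container name, a NiFi container named `nifi`, and a NiFi home of `/stackable/nifi`. All of these are assumptions, as is the PVC name `custom-nars`.

```yaml
spec:
  nodes:
    podOverrides:
      spec:
        volumes:
          # Hypothetical PVC holding the custom .nar files.
          - name: custom-nars
            persistentVolumeClaim:
              claimName: custom-nars
        containers:
          # Must match the operator-generated container name (assumed "nifi").
          - name: nifi
            volumeMounts:
              # Assumed location of NiFi's autoload extensions folder.
              - name: custom-nars
                mountPath: /stackable/nifi/extensions
```

Every user has to know the generated container name and NiFi's internal paths, and nothing validates them against the product, which is where the error-proneness comes from.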
After thinking more about it, I now tend to believe that no solution is as good as packaging the requirements directly into the NiFi image.
Incidentally, the exact same problem has been investigated for Spark applications.
These are the meeting minutes on the topic:
- Sebastian: Just document how to build custom images.
- Andrew: PVCs are not good. Experience with Airflow and Git repositories unsatisfactory.
- Sönke: Wrote the issue. Doesn't like customers having to build their own images. Doesn't like PVCs either.
- Razvan: Provide our own custom images with the most popular/useful extensions?
- Sönke: The original customer has their own processor; for that, a "fetch URL" solution would be required. BUT the customer is now building a custom image anyway.
- Razvan: We would need a generic/framework solution for two things: putting artifacts into containers and (optionally) applying a task to them (copying to other directories, updating configs and so on).
- Sönke: A single artifact can be mounted into NiFi's existing classpath. No sequential actions required.
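The "fetch URL" variant from the minutes can be approximated with an init container that downloads the NAR into a shared `emptyDir` before NiFi starts. This is a sketch assuming a `podOverrides` field that accepts a Kubernetes PodTemplateSpec merged by container name, a NiFi container named `nifi`, and NiFi's autoload folder at `/stackable/nifi/extensions`. It uses the public `curlimages/curl` image (tag chosen for illustration) and the example URL from the proposed `fetchUrl` field, and it does not verify a checksum.

```yaml
spec:
  nodes:
    podOverrides:
      spec:
        volumes:
          # Scratch volume shared between the init container and NiFi.
          - name: fetched-nars
            emptyDir: {}
        initContainers:
          - name: fetch-processor
            image: curlimages/curl:8.5.0
            # --fail makes the init container fail on HTTP errors instead of
            # saving an error page as processor.nar.
            command:
              - curl
              - --fail
              - --location
              - --output
              - /fetched/processor.nar
              - http://my-company/repo/processor.nar
            volumeMounts:
              - name: fetched-nars
                mountPath: /fetched
        containers:
          # Assumed name of the operator-generated NiFi container.
          - name: nifi
            volumeMounts:
              # Assumed location of NiFi's autoload extensions folder.
              - name: fetched-nars
                mountPath: /stackable/nifi/extensions
```

This covers Razvan's first step (putting an artifact into the container) but not the optional follow-up task, and the artifact is re-downloaded on every pod start.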
We all agreed on the following steps:
> Andrew: PVCs are not good. Experience with Airflow and Git repositories unsatisfactory.
I probably didn't express this very clearly. Airflow + gitsync is fine in situations where the product already has a polling mechanism to discover resources (as Airflow does).
Discussed in https://github.com/stackabletech/nifi-operator/discussions/225