konpyutaika / nifikop

The NiFiKop NiFi Kubernetes operator makes it easy to run Apache NiFi on Kubernetes. Apache NiFi is a free, open-source solution that supports powerful and scalable directed graphs of data routing, transformation, and system mediation logic.
https://konpyutaika.github.io/nifikop/
Apache License 2.0
128 stars, 44 forks

Improve the pod health checks to monitor cluster status #167

Closed: r65535 closed this issue 1 year ago

r65535 commented 1 year ago

What steps will reproduce the bug?

If a node disconnects from a NiFi cluster, it stays disconnected until I manually delete the pod. I get this error regularly:

Action cannot be performed because there is currently no Cluster Coordinator elected. The request should be tried again after a moment, after a Cluster Coordinator has been automatically elected.

What is the expected behavior?

The pod should be restarted so that the node rejoins the cluster.

What do you see instead?

Disconnected nodes that don't recover

Possible solution

Change the pod readiness check to hit /nifi-api/flow/cluster/summary?
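A minimal Go sketch of what such a check could evaluate. It assumes the endpoint returns a JSON body with a clusterSummary object carrying clustered, connectedToCluster, connectedNodeCount and totalNodeCount fields, and that the node's API is reachable without authentication; nodeIsReady, checkNode and the struct names are hypothetical and not part of NiFiKop. Note that in Kubernetes a failing readiness probe only removes the pod from Service endpoints; the container is restarted only when its liveness probe fails, so the restart asked for above would need the check wired into a liveness probe.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

// clusterSummary mirrors the subset of fields we ASSUME NiFi returns from
// GET /nifi-api/flow/cluster/summary (nested under "clusterSummary").
type clusterSummary struct {
	ConnectedNodeCount int  `json:"connectedNodeCount"`
	TotalNodeCount     int  `json:"totalNodeCount"`
	ConnectedToCluster bool `json:"connectedToCluster"`
	Clustered          bool `json:"clustered"`
}

type clusterSummaryEntity struct {
	ClusterSummary clusterSummary `json:"clusterSummary"`
}

// nodeIsReady (hypothetical) returns nil when the response body reports a
// clustered node that is currently connected, and an error otherwise.
func nodeIsReady(body []byte) error {
	var entity clusterSummaryEntity
	if err := json.Unmarshal(body, &entity); err != nil {
		return fmt.Errorf("decoding cluster summary: %w", err)
	}
	s := entity.ClusterSummary
	if !s.Clustered {
		return fmt.Errorf("node is not clustered")
	}
	if !s.ConnectedToCluster {
		return fmt.Errorf("node is disconnected (%d/%d nodes connected)",
			s.ConnectedNodeCount, s.TotalNodeCount)
	}
	return nil
}

// checkNode (hypothetical) fetches the summary from an unauthenticated
// NiFi API at baseURL, e.g. "http://nifi-0:8080", and evaluates it.
func checkNode(client *http.Client, baseURL string) error {
	resp, err := client.Get(baseURL + "/nifi-api/flow/cluster/summary")
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("unexpected status %s", resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return err
	}
	return nodeIsReady(body)
}
```

A probe built on this would fail for as long as the node reports itself disconnected, instead of only when the web server stops answering.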

NiFiKop version

v0.14.0-release

Golang version

1.19

Kubernetes version

v1.23.6-rke2r2

NiFi version

1.16.0

Additional context

No response

mh013370 commented 1 year ago

How does the /nifi-api/flow/cluster/summary endpoint respond for standalone nodes?

r65535 commented 1 year ago

I hadn't thought of that! The cluster summary endpoint isn't available on standalone (non-clustered) instances, so maybe an if statement here that uses that URI when the node is clustered and leaves the check as-is when it isn't?
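The branching suggested here can be sketched in Go as follows. probeURL and its fallbackPath parameter are hypothetical; /nifi-api/system-diagnostics appears in the usage below only as a placeholder for whatever the existing check targets, not as NiFiKop's actual default.

```go
package main

import (
	"net"
	"net/url"
	"strconv"
)

// clusterSummaryPath is the endpoint proposed in this issue for clustered nodes.
const clusterSummaryPath = "/nifi-api/flow/cluster/summary"

// probeURL (hypothetical) returns the URL a health check should hit.
// Clustered nodes use the cluster summary endpoint; standalone nodes keep
// the existing check, represented by fallbackPath.
func probeURL(scheme, host string, port int, clustered bool, fallbackPath string) string {
	path := fallbackPath
	if clustered {
		path = clusterSummaryPath
	}
	u := url.URL{
		Scheme: scheme,
		Host:   net.JoinHostPort(host, strconv.Itoa(port)),
		Path:   path,
	}
	return u.String()
}
```

For example, probeURL("http", "nifi-0", 8080, true, "/nifi-api/system-diagnostics") yields http://nifi-0:8080/nifi-api/flow/cluster/summary. Choosing the path once, when the probe is built, means standalone nodes never call an endpoint they don't serve.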

mh013370 commented 1 year ago

That sounds reasonable!

mh013370 commented 1 year ago

I've opened #219, as I believe that's a more general solution that can be tailored to specific use cases.

r65535 commented 1 year ago

Closing as #219 is a better solution IMO!