konpyutaika / nifikop

The NiFiKop NiFi Kubernetes operator makes it easy to run Apache NiFi on Kubernetes. Apache NiFi is a free, open-source solution that supports powerful and scalable directed graphs of data routing, transformation, and system mediation logic.
https://konpyutaika.github.io/nifikop/
Apache License 2.0

nifikop crashes with HPA Autoscaler Enabled #224

Closed skpathak2 closed 1 year ago

skpathak2 commented 1 year ago

What steps will reproduce the bug?

  1. Environment: Google Kubernetes Engine 1.24.7-gke.900, NiFiKop 1.0.0, NifiCluster (NiFi) 1.17.0

  2. Assign node labels in NifiCluster.spec.nodes:

    nodes: 
    - id: 1
      labels:
        default-scale-group: "true"
      nodeConfigGroup: "default-group"
    - id: 2
      labels:
        default-scale-group: "true"
      nodeConfigGroup: "default-group"
  3. Enable the HPA autoscaler:

    nodeGroupAutoscalers:
    - name: default-group-autoscaler
      enabled: true
      nodeConfigGroupId: default-group
      readOnlyConfig: {}
      nodeConfig: {}
      nodeLabelsSelector:
        matchLabels:
          default-scale-group: "true"
      upscaleStrategy: simple
      downscaleStrategy: lifo
      horizontalAutoscaler:
        maxReplicas: 6
        minReplicas: 1
        replicas: 1
        scaleTargetRef:
          apiVersion: nifi.konpyutaika.com/v1alpha1
          kind: NifiNodeGroupAutoscaler
          name: default-group-autoscaler
        metrics:
        - type: Resource
          resource:
            name: cpu
            target:
              type: Utilization
              averageUtilization: 1
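
For context on why this configuration scales so readily: the Kubernetes HPA computes its target as ceil(currentReplicas * currentMetric / targetMetric) and clamps it to the min/max bounds. The Go sketch below illustrates that rule with the values from the config above. The function name and the utilization figures are made up for illustration, and the sketch ignores the HPA's tolerance and stabilization window, so treat it as a rough model rather than the controller's actual code.

```go
package main

import (
	"fmt"
	"math"
)

// desiredReplicas is a hypothetical helper that applies the documented HPA rule
// ceil(current * currentUtil / targetUtil) and clamps the result to
// [minReplicas, maxReplicas]. Tolerance and stabilization are not modeled.
func desiredReplicas(current int, currentUtil, targetUtil float64, minReplicas, maxReplicas int) int {
	want := int(math.Ceil(float64(current) * currentUtil / targetUtil))
	if want < minReplicas {
		return minReplicas
	}
	if want > maxReplicas {
		return maxReplicas
	}
	return want
}

func main() {
	// Two nodes at an assumed 5% CPU against the 1% target: clamps to maxReplicas.
	fmt.Println(desiredReplicas(2, 5, 1, 1, 6))
	// The same two nodes at an assumed 0.4% CPU: scales down to minReplicas.
	fmt.Println(desiredReplicas(2, 0.4, 1, 1, 6))
}
```

With averageUtilization set to 1, almost any real load pushes the group to maxReplicas, and an idle group drops back to minReplicas, which is a quick way to exercise both the upscale and downscale paths.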

What is the expected behavior?

NiFi nodes should autoscale seamlessly. PS: I have tried scaling both up and down; both yield the same error.

What do you see instead?

The nifikop operator fails with "msg":"Observed a panic in reconciler: runtime error: invalid memory address or nil pointer dereference"

{"level":"info","time":"2023-01-13T02:48:16.888Z","logger":"controllers.NifiNodeGroupAutoscaler","caller":"controllers/nifinodegroupautoscaler_controller.go:148","msg":"Removing 2 nodes from cluster nifi-cluster spec.nodes configuration for node group default-group"}
{"level":"info","time":"2023-01-13T02:48:16.888Z","logger":"controllers.NifiNodeGroupAutoscaler","caller":"controllers/nifinodegroupautoscaler_controller.go:208","msg":"Using LIFO downscale strategy for cluster nifi-cluster node group default-group"}
{"level":"info","time":"2023-01-13T02:48:16.888Z","caller":"controller/controller.go:117","msg":"Observed a panic in reconciler: runtime error: invalid memory address or nil pointer dereference","controller":"nifinodegroupautoscaler","controllerGroup":"nifi.konpyutaika.com","controllerKind":"NifiNodeGroupAutoscaler","nifiNodeGroupAutoscaler":{"name":"nifi-cluster-default-group","namespace":"nifi"},"namespace":"nifi","name":"nifi-cluster-default-group","reconcileID":"1645bb85-5ff9-47eb-99de-2ff3de7c2898"}
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
 panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0xb65306]
goroutine 914 [running]:
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile.func1()
 /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.13.0/pkg/internal/controller/controller.go:118 +0x1f4
panic({0x1716ce0, 0x274ec30})
 /usr/local/go/src/runtime/panic.go:884 +0x212
github.com/konpyutaika/nifikop/api/v1.(*NifiCluster).GetCreationTimeOrderedNodes(0xc0001a1400)
 /workspace/api/v1/nificluster_types.go:829 +0x126
github.com/konpyutaika/nifikop/pkg/autoscale.(*LIFOHorizontalDownscaleStrategy).ScaleDown(0xc0008a93c0, 0x2)
 /workspace/pkg/autoscale/strategy.go:36 +0x2c
github.com/konpyutaika/nifikop/controllers.(*NifiNodeGroupAutoscalerReconciler).scaleDown(0xc000a28540, 0xc000901ba0, 0xc0001a1400, 0x0?)
 /workspace/controllers/nifinodegroupautoscaler_controller.go:214 +0x139
github.com/konpyutaika/nifikop/controllers.(*NifiNodeGroupAutoscalerReconciler).Reconcile(0xc000a28540, {0x1bc2718, 0xc00090d380}, {{{0xc0006be980?, 0x10?}, {0xc0005ba440?, 0x40dae7?}}})
 /workspace/controllers/nifinodegroupautoscaler_controller.go:150 +0x785
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile(0x1bc2670?, {0x1bc2718?, 0xc00090d380?}, {{{0xc0006be980?, 0x18525c0?}, {0xc0005ba440?, 0x4045d4?}}})
 /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.13.0/pkg/internal/controller/controller.go:121 +0xc8
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc000592be0, {0x1bc2670, 0xc00076cbc0}, {0x177e640?, 0xc000b45580?})
 /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.13.0/pkg/internal/controller/controller.go:320 +0x33c
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc000592be0, {0x1bc2670, 0xc00076cbc0})
 /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.13.0/pkg/internal/controller/controller.go:273 +0x1d9
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2()
 /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.13.0/pkg/internal/controller/controller.go:234 +0x85
created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2
 /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.13.0/pkg/internal/controller/controller.go:230 +0x333

Possible solution

The error appears to stem from the GetCreationTimeOrderedNodes() method (api/v1/nificluster_types.go:829 in the stack trace), which the LIFO downscale strategy calls.

Error

{"level":"info","time":"2023-01-13T05:42:44.135Z","logger":"controllers.NifiNodeGroupAutoscaler","caller":"controllers/nifinodegroupautoscaler_controller.go:208","msg":"Using LIFO downscale strategy for cluster nifi-cluster node group nifi-cluster-default-group"}
{"level":"info","time":"2023-01-13T05:42:44.135Z","caller":"controller/controller.go:117","msg":"Observed a panic in reconciler: runtime error: invalid memory address or nil pointer dereference","controller":"nifinodegroupautoscaler","controllerGroup":"nifi.konpyutaika.com","controllerKind":"NifiNodeGroupAutoscaler","nifiNodeGroupAutoscaler":{"name":"nifi-cluster-nifi-cluster-default-group","namespace":"nifi"},"namespace":"nifi","name":"nifi-cluster-nifi-cluster-default-group","reconcileID":"9667fe67-66b4-4dbd-be09-6605eba0c947"}
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
 panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0xb65306]
goroutine 835 [running]:
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile.func1()
 /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.13.0/pkg/internal/controller/controller.go:118 +0x1f4
panic({0x1716ce0, 0x274ec30})
 /usr/local/go/src/runtime/panic.go:884 +0x212
github.com/konpyutaika/nifikop/api/v1.(*NifiCluster).GetCreationTimeOrderedNodes(0xc00092ca00)
 /workspace/api/v1/nificluster_types.go:829 +0x126
github.com/konpyutaika/nifikop/pkg/autoscale.(*LIFOHorizontalDownscaleStrategy).ScaleDown(0xc000d0f3c0, 0x4)
 /workspace/pkg/autoscale/strategy.go:36 +0x2c
github.com/konpyutaika/nifikop/controllers.(*NifiNodeGroupAutoscalerReconciler).scaleDown(0xc0002a0540, 0xc000bb0b60, 0xc00092ca00, 0x0?)
 /workspace/controllers/nifinodegroupautoscaler_controller.go:214 +0x139
github.com/konpyutaika/nifikop/controllers.(*NifiNodeGroupAutoscalerReconciler).Reconcile(0xc0002a0540, {0x1bc2718, 0xc000a793b0}, {{{0xc000ac7da8?, 0x10?}, {0xc000bb2b10?, 0x40dae7?}}})
 /workspace/controllers/nifinodegroupautoscaler_controller.go:150 +0x785
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile(0x1bc2670?, {0x1bc2718?, 0xc000a793b0?}, {{{0xc000ac7da8?, 0x18525c0?}, {0xc000bb2b10?, 0x4045d4?}}})
 /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.13.0/pkg/internal/controller/controller.go:121 +0xc8
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc00015ad20, {0x1bc2670, 0xc000133e00}, {0x177e640?, 0xc0008615e0?})
 /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.13.0/pkg/internal/controller/controller.go:320 +0x33c
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc00015ad20, {0x1bc2670, 0xc000133e00})
 /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.13.0/pkg/internal/controller/controller.go:273 +0x1d9
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2()
 /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.13.0/pkg/internal/controller/controller.go:234 +0x85
created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2
 /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.13.0/pkg/internal/controller/controller.go:230 +0x333

NiFiKop version

v.1.0.0

Golang version

go.1.19

Kubernetes version

1.24

NiFi version

1.17.0

Additional context

NA

skpathak2 commented 1 year ago

The issue seems to be with the Helm chart, since it is not possible to deploy a NifiCluster with only autoscaled node groups. The NifiCluster CRD requires that you specify at least one node in the spec.nodes list.

I disabled autoscaling in the Helm chart and set it up manually using a separate deployment. It worked flawlessly; I will orchestrate this using TF.

mh013370 commented 1 year ago

It just occurred to me that I mentioned this as a constraint when I raised the PR to contribute this feature: #89

It's not possible to deploy a NifiCluster with only autoscaled node groups. The NifiCluster CRD requires that you specify at least one node in the spec.nodes list. Do we want to support this? If so, we may need to adjust the cluster initialization logic in the NifiCluster controller.

This could be something we evaluate changing. At the very least this constraint needs to be in the documentation.
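
Until the constraint is lifted or documented, a guard that rejects an empty spec.nodes list up front would surface the problem as a clear error instead of a panic. The Go sketch below is only an illustration of that idea. `clusterSpec`, `validateNodes`, and the error value are hypothetical names, not part of the NiFiKop API.

```go
package main

import (
	"errors"
	"fmt"
)

// clusterSpec is a hypothetical, trimmed-down stand-in for NifiCluster.spec.
type clusterSpec struct {
	Nodes []int32 // IDs of the statically declared nodes in spec.nodes
}

// errNoStaticNodes is a hypothetical error describing the constraint quoted above.
var errNoStaticNodes = errors.New("spec.nodes must contain at least one node; autoscaled node groups alone are not supported")

// validateNodes rejects a spec that declares no static nodes.
func validateNodes(spec clusterSpec) error {
	if len(spec.Nodes) == 0 {
		return errNoStaticNodes
	}
	return nil
}

func main() {
	fmt.Println(validateNodes(clusterSpec{}))
	fmt.Println(validateNodes(clusterSpec{Nodes: []int32{1}}))
}
```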