konpyutaika / nifikop

The NiFiKop NiFi Kubernetes operator makes it easy to run Apache NiFi on Kubernetes. Apache NiFi is a free, open-source solution that supports powerful and scalable directed graphs of data routing, transformation, and system mediation logic.
https://konpyutaika.github.io/nifikop/
Apache License 2.0

Removed hardcoded default-scheduler #217

Closed mickeder closed 1 year ago

mickeder commented 1 year ago

Signed-off-by: Michal Keder michalkeder@cogniflare.io

| Q | A |
|---|---|
| Bug fix? | yes |
| New feature? | no |
| API breaks? | no |
| Deprecations? | no |
| Related tickets | N/A |
| License | Apache 2.0 |

What's in this PR?

I removed the hardcoded SchedulerName from the pod definition, since when no scheduler name is supplied, the pod is automatically scheduled by the default-scheduler.

Why?

When trying to deploy a NifiCluster to a GKE Autopilot cluster, or to a GKE cluster with autoscaling and the optimize-utilization profile, the scheduler is changed to gke.io/optimize-utilization-scheduler via a mutating webhook (https://cloud.google.com/kubernetes-engine/docs/concepts/cluster-autoscaler#autoscaling_profiles). This leads to endless recreation of NiFi pods, because the computed patch is never empty:

https://github.com/konpyutaika/nifikop/blob/389bab8cce20fa57136e567333e07ae25947a71f/pkg/resources/nifi/nifi.go#L654
https://github.com/konpyutaika/nifikop/blob/389bab8cce20fa57136e567333e07ae25947a71f/pkg/resources/nifi/nifi.go#L682-L688
https://github.com/konpyutaika/nifikop/blob/389bab8cce20fa57136e567333e07ae25947a71f/pkg/resources/nifi/nifi.go#L730
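The feedback loop can be sketched in plain Go (simplified structs and hypothetical helper names, not the real operator or k8s types): the operator's desired spec pins a scheduler, the admission webhook rewrites it on the live pod, so the drift check fires on every reconcile.

```go
package main

import "fmt"

// podSpec is a stand-in for the scheduler-relevant part of corev1.PodSpec.
type podSpec struct {
	SchedulerName string
}

// webhookMutate mimics GKE's mutating admission webhook, which rewrites
// the scheduler on the live pod regardless of what the operator asked for.
func webhookMutate(p podSpec) podSpec {
	p.SchedulerName = "gke.io/optimize-utilization-scheduler"
	return p
}

// needsRecreate mimics the operator's drift check: the pod is recreated
// whenever the live spec differs from the desired one; an empty desired
// field is treated as "no opinion" and therefore never drifts.
func needsRecreate(desired, live podSpec) bool {
	return desired.SchedulerName != "" && desired.SchedulerName != live.SchedulerName
}

func main() {
	// Before this PR: SchedulerName hardcoded to "default-scheduler".
	hardcoded := podSpec{SchedulerName: "default-scheduler"}
	fmt.Println(needsRecreate(hardcoded, webhookMutate(hardcoded))) // true: endless recreation

	// After this PR: SchedulerName left unset.
	unset := podSpec{}
	fmt.Println(needsRecreate(unset, webhookMutate(unset))) // false: pod left alone
}
```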

Additional context

To test, I used a GKE Autopilot cluster with cert-manager, zookeeper and nifikop installed using the commands below:

helm install cert-manager jetstack/cert-manager \
    --create-namespace --namespace cert-manager \
    --set installCRDs=true \
    --set global.leaderElection.namespace=cert-manager 
helm install zookeeper bitnami/zookeeper \
    --create-namespace --namespace zookeeper \
    --set resources.requests.memory=256Mi \
    --set resources.requests.cpu=250m \
    --set resources.limits.memory=256Mi \
    --set resources.limits.cpu=250m \
    --set global.storageClass=standard-rwo \
    --set networkPolicy.enabled=true
helm install nifikop oci://ghcr.io/konpyutaika/helm-charts/nifikop \
    --create-namespace --namespace nifi \
    --set image.repository=my.gcr.repo \
    --set image.tag=mytag \
    --set resources.requests.memory=256Mi \
    --set resources.requests.cpu=250m \
    --set resources.limits.memory=256Mi \
    --set resources.limits.cpu=250m \
    --set namespaces={"nifi"}

Then I applied the NifiCluster from [samples](https://github.com/konpyutaika/nifikop/blob/389bab8cce20fa57136e567333e07ae25947a71f/config/samples/simplenificluster.yaml) with StorageClass set to standard-rwo and additional requests and limits for ephemeral-storage (required by GKE Autopilot restrictions):

apiVersion: nifi.konpyutaika.com/v1
kind: NifiCluster
metadata:
  name: simplenifi
spec:
  service:
    headlessEnabled: true
    labels:
      cluster-name: simplenifi
  zkAddress: "zookeeper.zookeeper:2181"
  zkPath: /simplenifi
  externalServices:
    - metadata:
        labels:
          cluster-name: driver-simplenifi
      name: driver-ip
      spec:
        portConfigs:
          - internalListenerName: http
            port: 8080
        type: LoadBalancer
  clusterImage: "apache/nifi:1.15.3"
  initContainerImage: "bash:5.2.2"
  oneNifiNodePerNode: true
  readOnlyConfig:
    nifiProperties:
      overrideConfigs: |
        nifi.sensitive.props.key=thisIsABadSensitiveKeyPassword
  pod:
    labels:
      cluster-name: simplenifi
  nodeConfigGroups:
    default_group:
      imagePullPolicy: IfNotPresent
      isNode: true
      serviceAccountName: default
      storageConfigs:
        - mountPath: "/opt/nifi/nifi-current/logs"
          name: logs
          pvcSpec:
            accessModes:
              - ReadWriteOnce
            storageClassName: "standard-rwo"
            resources:
              requests:
                storage: 1Gi
      resourcesRequirements:
        limits:
          cpu: "0.5"
          memory: 2Gi
          ephemeral-storage: 4Gi
        requests:
          cpu: "0.5"
          memory: 2Gi
          ephemeral-storage: 4Gi
  nodes:
    - id: 1
      nodeConfigGroup: "default_group"
    - id: 2
      nodeConfigGroup: "default_group"
  propagateLabels: true
  nifiClusterTaskSpec:
    retryDurationMinutes: 10
  listenersConfig:
    internalListeners:
      - containerPort: 8080
        type: http
        name: http
      - containerPort: 6007
        type: cluster
        name: cluster
      - containerPort: 10000
        type: s2s
        name: s2s
      - containerPort: 9090
        type: prometheus
        name: prometheus
      - containerPort: 6342
        type: load-balance
        name: load-balance

mh013370 commented 1 year ago

This LGTM. Should we make the scheduler assigned to the pod configurable? If so, I'm happy to raise a follow-on PR to do this.

mh013370 commented 1 year ago

default-scheduler will be the default if it is not specified, per the docs, so this LGTM.
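That defaulting can be illustrated with a small plain-Go sketch (hypothetical helper name; the real defaulting happens server-side in the API server when the pod is admitted, as the core/v1 docs describe):

```go
package main

import "fmt"

// applyDefaults mimics, in a simplified way, the API server's defaulting
// of PodSpec.schedulerName: an empty value is filled in with
// "default-scheduler" on admission, an explicit value is kept as-is.
func applyDefaults(schedulerName string) string {
	if schedulerName == "" {
		return "default-scheduler"
	}
	return schedulerName
}

func main() {
	fmt.Println(applyDefaults(""))                                      // default-scheduler
	fmt.Println(applyDefaults("gke.io/optimize-utilization-scheduler")) // unchanged
}
```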

https://pkg.go.dev/k8s.io/api/core/v1

mh013370 commented 1 year ago

@mickeder : If you allow edits by maintainers on this PR by checking the appropriate checkbox, then I can sign-off on the changes and merge it as this project requires signed commits by all authors.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/allowing-changes-to-a-pull-request-branch-created-from-a-fork

mickeder commented 1 year ago

@mh013370 I've just replaced my commit with a signed one.

mh013370 commented 1 year ago

Thank you for the contribution!