netscaler / netscaler-k8s-ingress-controller

NetScaler Ingress Controller for Kubernetes:
https://developer-docs.citrix.com/projects/citrix-k8s-ingress-controller/en/latest/

Service Group IP not updating after an application rollout when NS_SNIPS is used #635

Closed arasyor closed 6 months ago

arasyor commented 7 months ago

Service group IPs are not updated with the new pod IPs when an application is rolled out on an OpenShift cluster where the ingress controller is installed with the NS_SNIPS option.

OpenShift version 4.12.34 with OpenShift SDN CNI.

The netscaler-k8s-ingress-controller operator object is below. Ingress Controller image tag 1.39.6 is used.

kind: NetscalerIngressController
apiVersion: netscaler.com/v1
metadata:
  name: nsic
  namespace: netscaler-ingress-controller
spec:
  adcCredentialSecret: nslogin-local
  affinity: {}
  analyticsConfig:
    distributedTracing:
      enable: false
      samplingrate: 100
    endpoint:
      server: ''
      service: ''
    required: false
    timeseries:
      auditlogs:
        enable: false
      events:
        enable: false
      metrics:
        enable: false
        enableNativeScrape: false
        exportFrequency: 30
        mode: avro
        schemaFile: schema.json
      port: 30002
    transactions:
      enable: false
      port: 30001
  clusterName: ocptstinf01
  crds:
    install: true
    retainOnDelete: false
  defaultSSLCertSecret: nsic-tst-cert
  disableAPIServerCertVerify: false
  disableOpenshiftRoutes: false
  entityPrefix: openshift
  exporter:
    extraVolumeMounts: []
    image: >-
      {{ .Values.exporter.imageRegistry }}/{{ .Values.exporter.imageRepository
      }}:{{ .Values.exporter.imageTag }}
    imageRegistry: quay.io
    imageRepository: netscaler/netscaler-adc-metrics-exporter
    imageTag: 1.4.9
    ports:
      containerPort: 8888
    pullPolicy: IfNotPresent
    required: false
    resources: {}
    serviceMonitorExtraLabels: {}
  extraVolumeMounts: []
  extraVolumes: []
  fullnameOverride: ''
  ignoreNodeExternalIP: false
  image: >-
    {{ .Values.imageRegistry }}/{{ .Values.imageRepository }}:{{
    .Values.imageTag }}
  imagePullSecrets: []
  imageRegistry: quay.io
  imageRepository: netscaler/netscaler-k8s-ingress-controller
  imageTag: 1.39.6
  ingressClass:
    - netscaler
  ipam: false
  jsonLog: false
  kubernetesURL: ''
  license:
    accept: 'yes'
  logLevel: INFO
  logProxy: ''
  nameOverride: ''
  namespaceLabels: ''
  nitroReadTimeout: 20
  nodeSelector:
    key: node-role.kubernetes.io/infra
    value: ''
  nodeWatch: true
  nsConfigDnsRec: false
  nsCookieVersion: '0'
  nsDnsNameserver: ''
  nsEnableLabel: true
  nsHTTP2ServerSide: 'OFF'
  nsIP: 10.81.22.10
  nsLbHashAlgo:
    hashAlgorithm: DEFAULT
    hashFingers: 256
    required: false
  nsPort: 443
  nsProtocol: HTTPS
  nsSNIPS: '["10.79.94.56"]'
  nsSvcLbDnsRec: false
  nsVIP: 10.79.94.55
  nsncPbr: false
  openshift: true
  optimizeEndpointBinding: false
  podAnnotations: {}
  podIPsforServiceGroupMembers: false
  profileHttpFrontend: {}
  profileSslFrontend: {}
  profileTcpFrontend: {}
  pullPolicy: IfNotPresent
  rbacRole: false
  resources:
    limits: {}
    requests:
      cpu: 32m
      memory: 128Mi
  routeLabels: 'netscaler-ingress-controller=true'
  secretStore:
    enabled: false
    password: {}
    username: {}
  serviceAccount:
    create: true
  serviceClass: []
  setAsDefaultIngressClass: false
  tolerations:
  - effect: NoSchedule
    key: node-role.kubernetes.io/infra
    operator: Exists
  updateIngressStatus: true

The OpenShift route object is below.

kind: Route
metadata:
  labels:
    netscaler-ingress-controller: "true"
  name: python-test-app-nsic
  namespace: opensol-dev-infra
spec:
  host: python-test-app.nsic.ocptst.local
  port:
    targetPort: 8080
  to:
    kind: Service
    name: python-test-app
    weight: 100
  wildcardPolicy: None

The NetScaler LB vserver after the route object was created is below.

> show lb vserver openshift-python-test-app_8080_lbv_57fkxi7vr25aap47oq2vsb4jo2dm3yvr
        openshift-python-test-app_8080_lbv_57fkxi7vr25aap47oq2vsb4jo2dm3yvr (0.0.0.0:0) - HTTP  Type: ADDRESS 
        State: UP
        Last state change was at Mon Feb 26 14:11:08 2024
        Time since last state change: 0 days, 00:00:04.130
        Effective State: UP  ARP:DISABLED
        Client Idle Timeout: 180 sec
        Down state flush: ENABLED
        Disable Primary Vserver On Down : DISABLED
        Comment: "rv:2116328244,ing:python-test-app-nsic,ingport:80,ns:opensol-dev-infra,svc:python-test-app,svcport:8080"
        Appflow logging: ENABLED
        Port Rewrite : DISABLED
        No. of Bound Services :  2 (Total)       1 (Active)
        Configured Method: LEASTCONNECTION
        Current Method: Round Robin, Reason: Bound service's state changed to UP        BackupMethod: ROUNDROBIN
        Mode: IP
        Persistence: NONE
        Vserver IP and Port insertion: OFF 
        Push: DISABLED  Push VServer: 
        Push Multi Clients: NO
        Push Label Rule: none
        L2Conn: OFF
        Skip Persistency: None
        Listen Policy: NONE
        IcmpResponse: PASSIVE
        RHIstate: PASSIVE
        New Service Startup Request Rate: 0 PER_SECOND, Increment Interval: 0
        Mac mode Retain Vlan: DISABLED
        DBS_LB: DISABLED
        Process Local: DISABLED
        Traffic Domain: 0
        TROFS Persistence honored: ENABLED
        Retain Connections on Cluster: NO
        Order Sequence: ASCENDING
        Current Active Order: None

Bound Service Groups:
1)      Group Name: openshift-python-test-app_8080_sgp_57fkxi7vr25aap47oq2vsb4jo2dm3yvr

                1) openshift-python-test-app_8080_sgp_57fkxi7vr25aap47oq2vsb4jo2dm3yvr (10.90.19.155: 8080) - HTTP State: UP    Weight: 1 Order: None
                2) openshift-python-test-app_8080_sgp_57fkxi7vr25aap47oq2vsb4jo2dm3yvr (10.90.19.22: 8080) - HTTP State: DOWN   Weight: 1 Order: None
 Done

The pod IPs of the deployment are below.

NAME                               READY   STATUS    RESTARTS   AGE   IP          
python-test-app-6556f98896-dg98z   1/1     Running   0          40m   10.90.19.155

After the deployment is rolled out, new pods are created with new IPs, but the service group IPs are not changed and the vserver goes down.

NAME                               READY   STATUS        RESTARTS   AGE   IP          
python-test-app-6556f98896-dg98z   1/1     Terminating   0          41m   10.90.19.155
python-test-app-7cc45d498b-vdgc7   1/1     Running       0          3s    10.90.19.157

> show lb vserver openshift-python-test-app_8080_lbv_57fkxi7vr25aap47oq2vsb4jo2dm3yvr
        openshift-python-test-app_8080_lbv_57fkxi7vr25aap47oq2vsb4jo2dm3yvr (0.0.0.0:0) - HTTP  Type: ADDRESS 
        State: DOWN
        Last state change was at Mon Feb 26 14:12:26 2024
        Time since last state change: 0 days, 00:02:41.770
        Effective State: DOWN  ARP:DISABLED
        Client Idle Timeout: 180 sec
        Down state flush: ENABLED
        Disable Primary Vserver On Down : DISABLED
        Comment: "rv:2116328244,ing:python-test-app-nsic,ingport:80,ns:opensol-dev-infra,svc:python-test-app,svcport:8080"
        Appflow logging: ENABLED
        Port Rewrite : DISABLED
        No. of Bound Services :  2 (Total)       0 (Active)
        Configured Method: LEASTCONNECTION
        Current Method: Round Robin, Reason: Bound service's state changed to UP        BackupMethod: ROUNDROBIN
        Mode: IP
        Persistence: NONE
        Vserver IP and Port insertion: OFF 
        Push: DISABLED  Push VServer: 
        Push Multi Clients: NO
        Push Label Rule: none
        L2Conn: OFF
        Skip Persistency: None
        Listen Policy: NONE
        IcmpResponse: PASSIVE
        RHIstate: PASSIVE
        New Service Startup Request Rate: 0 PER_SECOND, Increment Interval: 0
        Mac mode Retain Vlan: DISABLED
        DBS_LB: DISABLED
        Process Local: DISABLED
        Traffic Domain: 0
        TROFS Persistence honored: ENABLED
        Retain Connections on Cluster: NO
        Order Sequence: ASCENDING
        Current Active Order: None

Bound Service Groups:
1)      Group Name: openshift-python-test-app_8080_sgp_57fkxi7vr25aap47oq2vsb4jo2dm3yvr

                1) openshift-python-test-app_8080_sgp_57fkxi7vr25aap47oq2vsb4jo2dm3yvr (10.90.19.155: 8080) - HTTP State: DOWN  Weight: 1 Order: None
                2) openshift-python-test-app_8080_sgp_57fkxi7vr25aap47oq2vsb4jo2dm3yvr (10.90.19.22: 8080) - HTTP State: DOWN   Weight: 1 Order: None
 Done
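
The mismatch between the bound service group members and the running pods can be checked mechanically. The following is a minimal Python sketch, assuming the `show lb vserver` output has been saved as text and the pod IPs have been collected separately (for example from the pod listing above). The names `parse_members` and `diff_members` are hypothetical helpers for illustration; they are not part of the ingress controller or the NetScaler CLI.

```python
import re

# Matches member lines such as:
#   1) <servicegroup> (10.90.19.155: 8080) - HTTP State: UP    Weight: 1 Order: None
# The vserver header line "(0.0.0.0:0) - HTTP  Type: ADDRESS" does not match,
# because "State:" must follow the protocol name directly.
MEMBER_RE = re.compile(
    r"\((?P<ip>\d{1,3}(?:\.\d{1,3}){3}):\s*(?P<port>\d+)\)"
    r"\s*-\s*\S+\s+State:\s*(?P<state>\S+)"
)


def parse_members(cli_output):
    """Return {ip: state} for every service group member found in the output."""
    return {m.group("ip"): m.group("state") for m in MEMBER_RE.finditer(cli_output)}


def diff_members(cli_output, pod_ips):
    """Compare bound member IPs with the IPs of the currently running pods."""
    members = set(parse_members(cli_output))
    pods = set(pod_ips)
    return {
        "stale": sorted(members - pods),    # bound on the ADC, no matching pod
        "missing": sorted(pods - members),  # running pod, not bound on the ADC
    }


# Member lines copied from the output captured after the rollout.
sample = """
Bound Service Groups:
1)      Group Name: openshift-python-test-app_8080_sgp_57fkxi7vr25aap47oq2vsb4jo2dm3yvr

                1) openshift-python-test-app_8080_sgp_57fkxi7vr25aap47oq2vsb4jo2dm3yvr (10.90.19.155: 8080) - HTTP State: DOWN  Weight: 1 Order: None
                2) openshift-python-test-app_8080_sgp_57fkxi7vr25aap47oq2vsb4jo2dm3yvr (10.90.19.22: 8080) - HTTP State: DOWN   Weight: 1 Order: None
"""

# 10.90.19.157 is the IP of the new pod after the rollout.
result = diff_members(sample, ["10.90.19.157"])
print(result)
```

With the data from this report, both bound members are reported as stale and the new pod IP 10.90.19.157 is reported as missing, which matches the vserver being DOWN.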

With previous images tagged 1.33.x, when the image was named Citrix Ingress Controller, this worked, but it no longer works with NetScaler Ingress Controller.

arijitr-citrix commented 7 months ago

@arasyor Can you please share the Ingress Controller logs during this activity?

arasyor commented 7 months ago

nsic-netscaler-ingress-controller-3-48wnb-nsic.log.zip

Hi,

I uploaded the log file. Logs contain the activity below.

The log output may differ from the command output above because I started from a fresh setup to simplify the logs, but the configuration is the same.

arijitr-citrix commented 7 months ago

Can you also please share your service yaml for svc python-test-app?

arasyor commented 7 months ago

You can find the service YAML below.

apiVersion: v1
kind: Service
metadata:
  creationTimestamp: "2024-02-09T15:18:26Z"
  name: python-test-app
  namespace: opensol-dev-infra
  resourceVersion: "2061657141"
  uid: 7b0b8c02-4362-467d-bba4-c0423d37ce7f
spec:
  clusterIP: 10.90.221.159
  clusterIPs:
  - 10.90.221.159
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - port: 8080
    protocol: TCP
    targetPort: 8080
  selector:
    app: python-test-app
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}

arijitr-citrix commented 7 months ago

While checking the logs, I can see that the running version is not the one you mentioned:

 User has accepted EULA. Starting Triton 

 The default port for Citrix ingress controller to communicate with Citrix ADC has been changed from 80 to 443 and the protocol has been changed from HTTP to HTTPS

Citrix Ingress Controller version: 1.37.6, build: Fri 17 Nov 17:16:51 UTC 2023 

It looks like you are using the operator, and the operator for version 1.39.6 has not been released yet. Can you please try installing with the Helm charts?
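
The version check described here can be repeated against any controller log. The following is a minimal Python sketch, assuming the startup banner keeps the format of the log line quoted above (`... Ingress Controller version: X.Y.Z, build: ...`). The helper names `running_version` and `matches_expected` are hypothetical, and the pattern is only a guess at how other releases word the banner.

```python
import re

# Assumption: the banner always contains "Ingress Controller version: <dotted number>".
# The product-name prefix (Citrix / NetScaler) is deliberately not matched.
VERSION_RE = re.compile(r"Ingress Controller version:\s*(\d+(?:\.\d+)+)")


def running_version(log_text):
    """Return the version string reported in the log, or None if no banner is found."""
    match = VERSION_RE.search(log_text)
    return match.group(1) if match else None


def matches_expected(log_text, expected_tag):
    """True only if the log reports exactly the image tag that was configured."""
    return running_version(log_text) == expected_tag


# Log line quoted in this thread; the configured image tag was 1.39.6.
log_line = "Citrix Ingress Controller version: 1.37.6, build: Fri 17 Nov 17:16:51 UTC 2023"
print(running_version(log_line))
print(matches_expected(log_line, "1.39.6"))
```

For the log line in this thread the reported version is 1.37.6, so the comparison against the configured tag 1.39.6 fails.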

arasyor commented 7 months ago

Hi,

I installed the controller with Helm, but nothing changed; the problem still exists. You can find the logs in the attachment.

helm-nsic-netscaler-ingress-controller-3-6h2gk-nsic.log.zip

I've executed the helm command below for installation.

helm upgrade --install nsic netscaler/netscaler-ingress-controller \
  --namespace netscaler-ingress-controller \
  --create-namespace \
  --set adcCredentialSecret=nslogin-local \
  --set clusterName=ocptstinf01 \
  --set crds.install=true \
  --set crds.retainOnDelete=false \
  --set defaultSSLCertSecret=nsic-tst-cert \
  --set entityPrefix=openshift \
  --set ingressClass[0]=netscaler \
  --set license.accept=yes \
  --set nodeSelector.key=node-role.kubernetes.io/infra \
  --set nodeSelector.value="" \
  --set nodeWatch=true \
  --set nsIP=10.81.22.10 \
  --set nsSNIPS='["10.79.94.56"]' \
  --set nsVIP=10.79.94.55 \
  --set openshift=true \
  --set optimizeEndpointBinding=true \
  --set routeLabels="netscaler-ingress-controller=true" \
  --set tolerations[0].effect=NoSchedule \
  --set tolerations[0].key=node-role.kubernetes.io/infra \
  --set tolerations[0].operator=Exists \
  --set tolerations[0].value=""

$ helm list
NAME    NAMESPACE                       REVISION        UPDATED                                 STATUS          CHART                                   APP VERSION
nsic    netscaler-ingress-controller    3               2024-02-27 20:41:37.703355013 +0300 +03 deployed        netscaler-ingress-controller-1.39.6     1.39.6 

arijitr-citrix commented 7 months ago

Can you please share the deployment yaml of the app: python-test-app? Can you also share the customer name?

arasyor commented 7 months ago

Hi,

The deployment YAML of python-test-app is below.

apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "19"
  creationTimestamp: "2024-02-09T15:17:56Z"
  generation: 35
  name: python-test-app
  namespace: opensol-dev-infra
  resourceVersion: "2120156661"
  uid: 5d651e46-1cef-4ce0-8a3a-d9226991580a
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: python-test-app
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      annotations:
        kubectl.kubernetes.io/restartedAt: "2024-02-27T20:48:28+03:00"
      creationTimestamp: null
      labels:
        app: python-test-app
    spec:
      containers:
      - image: repo.finansbank.com.tr/infra-docker/python-test-app:latest
        imagePullPolicy: Always
        name: python-test-app
        ports:
        - containerPort: 8080
          protocol: TCP
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30

The customer name is QNB Finansbank.

arijitr-citrix commented 7 months ago

Hi,

We would like to discuss this issue. Please share your email address so that we can set up a meeting.

BR!

arasyor commented 7 months ago

Hi,

You can reach me at aras.yorganci@ibtech.com.tr.

Regards.

subashd commented 6 months ago

Hi @arasyor, we have fixed this issue in version 1.40.12: https://github.com/netscaler/netscaler-k8s-ingress-controller/releases/tag/1.40.12
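
To check whether a given image tag is at or beyond the release named above, a minimal Python sketch follows. It assumes tags are purely numeric and dot-separated, and that releases after 1.40.12 keep the fix; neither assumption is confirmed in this thread. `parse_version` and `includes_fix` are hypothetical helper names.

```python
def parse_version(tag):
    """Turn a tag such as '1.40.12' into a tuple of integers for comparison."""
    return tuple(int(part) for part in tag.split("."))


# Release named in this thread as containing the fix.
FIX_RELEASE = parse_version("1.40.12")


def includes_fix(tag):
    """True if the tag is 1.40.12 or later (assumes later releases keep the fix)."""
    return parse_version(tag) >= FIX_RELEASE


# Tags that appear in this thread.
for tag in ("1.37.6", "1.39.6", "1.40.12"):
    print(tag, includes_fix(tag))
```

Comparing integer tuples avoids the pitfall of comparing version strings lexicographically, where "1.9.0" would sort after "1.40.12".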