zalando / postgres-operator

Postgres operator creates and manages PostgreSQL clusters running in Kubernetes
https://postgres-operator.readthedocs.io/
MIT License

Operator cannot move pod when using node_readiness_label #792

Open DeamonMV opened 4 years ago

DeamonMV commented 4 years ago

Hello.

What is the problem

The operator is not able to move a pod after the node_readiness_label was deleted from a k8s node and the node was cordoned.

time="2020-01-14T14:38:32Z" level=warning msg="failed to move master pods from the node \"test-k8s-worker-2\": timeout of 0s minutes expired" pkg=controller

What is my environment

Part of OperatorConfiguration:

apiVersion: "acid.zalan.do/v1"
kind: OperatorConfiguration
metadata:
  name: postgresql-operator-configuration
configuration:
  etcd_host: ""
  docker_image: registry.opensource.zalan.do/acid/spilo-11:1.6-p1
.....................................................................
    node_readiness_label:
      lifecycle-status: ready

Postgres Operator image: registry.opensource.zalan.do/acid/postgres-operator:v1.3.0

This is how the workers are configured; as you can see, one of them is cordoned and does not have the appropriate label:

test-k8s-worker-1   Ready                      <none>   92d   v1.13.5   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,ceph-role=worker,kubernetes.io/hostname=test-k8s-worker-1,lifecycle-status=ready
test-k8s-worker-2   Ready,SchedulingDisabled   <none>   92d   v1.13.5   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,ceph-role=worker,kubernetes.io/hostname=test-k8s-worker-2
test-k8s-worker-3   Ready                      <none>   92d   v1.13.5   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,ceph-role=worker,kubernetes.io/hostname=test-k8s-worker-3,lifecycle-status=ready
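
For reference, one quick way to see which nodes still carry the readiness label is the `-L` column flag:

```bash
# Show the lifecycle-status label as a column for every node
kubectl get nodes -L lifecycle-status
```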

Pods:


grafana-acid-postgres-0                     2/2     Running     0          176m    10.233.81.7    test-k8s-worker-3   <none>           <none>
grafana-acid-postgres-1                     2/2     Running     0          176m    10.233.64.10   test-k8s-worker-2   <none>           <none>

What I want to achieve

It would be good if the operator were able to reschedule the pod from the unlabeled and cordoned node to another k8s node.

FxKu commented 4 years ago

Is this "timeout of 0s minutes expired" the only message you see? Did you configure the 0s, i.e. an immediate timeout? :smiley:

Looking at the code makes me wonder if the move is even tried once, especially since you don't see the inner error message in the logs.

And btw, use the v1.3.1 image, please.

DeamonMV commented 4 years ago

@FxKu thank you for the quick response :)

> Is this "timeout of 0s minutes expired" the only message you see?

Yes, only this. As a first step I remove the label, wait 5 seconds, and then cordon the node; after that I see this message and that's all.
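
In kubectl terms, the steps were roughly this (node name taken from the listing above):

```bash
# Remove the readiness label (a trailing dash deletes a label) ...
kubectl label node test-k8s-worker-2 lifecycle-status-
sleep 5
# ... then cordon the node
kubectl cordon test-k8s-worker-2
```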

> Did you configure the 0s, i.e. an immediate timeout? 😃

I use the "default" configuration, copy-pasted from GitHub. The section with timeouts looks like this:

  timeouts:
    pod_label_wait_timeout: 10m
    pod_deletion_wait_timeout: 10m
    ready_wait_interval: 4s
    ready_wait_timeout: 30s
    resource_check_interval: 3s
    resource_check_timeout: 10m

Updating the operator to 1.3.1 didn't help either.

FxKu commented 4 years ago

Hm, strange. The timeout responsible here is master_pod_move_timeout, and if you don't define it, it's 20m by default. Maybe you can set a value then. I have to check if I can reproduce it being 0s (or maybe unset).
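
If it really resolves to 0s, one way to test setting it explicitly is a merge patch against the CR (resource and object names as used elsewhere in this thread; the key path mirrors the config dump below, and if the CRD actually expects it elsewhere, e.g. under timeouts, that mismatch could itself explain the 0s):

```bash
# Sketch: merge-patch the operator configuration to set the move timeout
kubectl patch operatorconfigurations.acid.zalan.do postgresql-operator-configuration \
  --type merge -p '{"configuration":{"master_pod_move_timeout":"20m"}}'
```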

DeamonMV commented 4 years ago

Just in case, this is the full configuration of the operator that I used. FYI: defining master_pod_move_timeout didn't help.

apiVersion: acid.zalan.do/v1
configuration:
  aws_or_gcp:
    aws_region: eu-central-1
  debug:
    debug_logging: true
    enable_database_access: true
  docker_image: registry.opensource.zalan.do/acid/spilo-11:1.6-p1
  etcd_host: ""
  kubernetes:
    cluster_domain: cluster.local
    cluster_labels:
      application: spilo
    cluster_name_label: cluster-name
    enable_pod_antiaffinity: false
    enable_pod_disruption_budget: true
    node_readiness_label:
      lifecycle-status: ready
    oauth_token_secret_name: postgresql-operator
    pdb_name_format: postgres-{cluster}-pdb
    pod_antiaffinity_topology_key: kubernetes.io/hostname
    pod_management_policy: ordered_ready
    pod_role_label: spilo-role
    pod_service_account_name: postgres
    pod_terminate_grace_period: 5m
    secret_name_template: '{username}.{cluster}.credentials.{tprkind}.{tprgroup}'
    spilo_privileged: false
    watched_namespace: '*'
  load_balancer:
    enable_master_load_balancer: false
    enable_replica_load_balancer: false
    master_dns_name_format: '{cluster}.{team}.{hostedzone}'
    replica_dns_name_format: '{cluster}-repl.{team}.{hostedzone}'
  logging_rest_api:
    api_port: 8008
    cluster_history_entries: 1000
    ring_log_lines: 100
  logical_backup:
    logical_backup_docker_image: registry.opensource.zalan.do/acid/logical-backup
    logical_backup_s3_bucket: ""
    logical_backup_schedule: 30 00 * * *
  master_pod_move_timeout: 5m
  max_instances: -1
  min_instances: -1
  postgres_pod_resources:
    default_cpu_limit: "3"
    default_cpu_request: 100m
    default_memory_limit: 2Gi
    default_memory_request: 100Mi
  repair_period: 5m
  resync_period: 30m
  scalyr:
    scalyr_cpu_limit: "1"
    scalyr_cpu_request: 100m
    scalyr_memory_limit: 1Gi
    scalyr_memory_request: 50Mi
  teams_api:
    enable_team_superuser: false
    enable_teams_api: false
    pam_role_name: zalandos
    protected_role_names:
    - admin
    team_admin_role: admin
    team_api_role_configuration:
      log_statement: all
  timeouts:
    pod_deletion_wait_timeout: 10m
    pod_label_wait_timeout: 10m
    ready_wait_interval: 4s
    ready_wait_timeout: 30s
    resource_check_interval: 3s
    resource_check_timeout: 10m
  users:
    replication_username: standby
    super_username: postgres
  workers: 4

FxKu commented 4 years ago

I think this might be solved with #816. Can you run a test with the latest operator image, @DeamonMV?
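
If the operator runs as a Deployment, switching to the latest image could look like this (the deployment and container names here are assumptions):

```bash
# Hypothetical names: point the operator Deployment at the :latest image
kubectl set image deployment/postgres-operator \
  postgres-operator=registry.opensource.zalan.do/acid/postgres-operator:latest
```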

DeamonMV commented 4 years ago

Ok. Will do.

DeamonMV commented 4 years ago

@FxKu I got the same thing

How I tested:

Configuration:

Containers:
  postgres-operator:
    Container ID:   docker://d3c14796110533341c53c01cce122622822e3b40cf03eaf286fc2fcd5f0a3caa
    Image:          registry.opensource.zalan.do/acid/postgres-operator:latest
    Image ID:       docker-pullable://registry.opensource.zalan.do/acid/postgres-operator@sha256:deb4d2b716467d5e1b75d8f1724686370f50a7374e4f31f32b33364b1deef139
    Port:           <none>
    Host Port:      <none>
    State:          Running
      Started:      Wed, 19 Feb 2020 16:29:29 +0200
    Ready:          True

# kubectl get operatorconfigurations.acid.zalan.do postgresql-operator-configuration -oyaml
apiVersion: acid.zalan.do/v1
configuration:
  aws_or_gcp:
    aws_region: eu-central-1
  debug:
    debug_logging: true
    enable_database_access: true
  docker_image: registry.opensource.zalan.do/acid/spilo-11:1.6-p1
  etcd_host: ""
  kubernetes:
    cluster_domain: cluster.local
    cluster_labels:
      application: spilo
    cluster_name_label: cluster-name
    enable_pod_antiaffinity: true
    enable_pod_disruption_budget: true
    node_readiness_label:
      lifecycle-status: ready
    oauth_token_secret_name: postgresql-operator
    pdb_name_format: postgres-{cluster}-pdb
    pod_antiaffinity_topology_key: kubernetes.io/hostname

Nodes (worker-1 is now unlabeled and cordoned):

test-k8s-worker-1   Ready,SchedulingDisabled   <none>   128d   v1.13.5   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,ceph-role=worker,kubernetes.io/hostname=test-k8s-worker-1
test-k8s-worker-2   Ready                      <none>   128d   v1.13.5   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,ceph-role=worker,kubernetes.io/hostname=test-k8s-worker-2,lifecycle-status=ready
test-k8s-worker-3   Ready                      <none>   128d   v1.13.5   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,ceph-role=worker,kubernetes.io/hostname=test-k8s-worker-3,lifecycle-status=ready
time="2020-02-19T14:31:04Z" level=info msg="cluster has been synced" cluster-name=default/grafana-acid-postgres pkg=controller worker=0
time="2020-02-19T14:31:04Z" level=debug msg="cluster already exists" cluster-name=default/grafana-acid-postgres pkg=controller worker=0

time="2020-02-19T14:36:36Z" level=warning msg="failed to move master pods from the node \"test-k8s-worker-1\": timeout of 0s minutes expired" pkg=controller

DeamonMV commented 4 years ago

Hello.

Any updates? I'm asking because I need to upgrade the Kubernetes cluster, and I use Kubespray, and without this feature it's a little bit harder :)

FxKu commented 4 years ago

I've added an e2e test for adding the node_readiness_label to test failover. The PR also fixes the logging behavior to show why the move does not work and the timeout is exceeded. One problem we found is that the pod move is only triggered once, on a node event. The operator does not retry moving the pod unless you restart it (because the ADD node events on an operator restart would trigger it again).
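
Until a retry exists, a crude workaround based on the above would be to restart the operator pod so that the ADD node events fire again (the label selector is an assumption and depends on how the operator is deployed):

```bash
# Hypothetical selector: delete the operator pod; its Deployment recreates it,
# and the resulting ADD node events re-trigger the move logic described above
kubectl delete pod -l name=postgres-operator
```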

I still wonder, though, why the configured timeout doesn't show up in your logs. I would expect to see something like "timeout of 20m0s minutes expired". I wonder if there is another marshalling issue, but I thought that was fixed with #816.
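
One way to check what value the CR actually holds after the round trip (resource and object names taken from this thread):

```bash
# Print the effective master_pod_move_timeout from the operator configuration CR
kubectl get operatorconfigurations.acid.zalan.do postgresql-operator-configuration \
  -o jsonpath='{.configuration.master_pod_move_timeout}'
```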