kontena / pharos-cluster

Pharos - The Kubernetes Distribution
https://k8spharos.dev/
Apache License 2.0

Unable to upgrade cluster on CentOS7 hosts to Pharos 2.4.* versions #1516

Closed: edita-timo closed this issue 4 years ago

edita-timo commented 4 years ago

What happened:

Upgrading our Pharos cluster gets stuck when using versions 2.4.*+oss. The debug output seems to indicate an OpenSSL issue:

I, [2019-12-11T11:59:55.459891 #30421]  INFO -- K8s::Transport: Using config with server=https://localhost:65534
    [nmprimakube01] Populating client cache
    [nmprimakube01] Error: SSL_connect SYSCALL returned=5 errno=0 state=SSLv3/TLS write client hello (OpenSSL::SSL::SSLError)
    [nmprimakube01]     /__enclose_io_memfs__/lib/ruby/gems/2.5.0/gems/excon-0.66.0/lib/excon/ssl_socket.rb:125:in `connect_nonblock'
    [nmprimakube01]     /__enclose_io_memfs__/lib/ruby/gems/2.5.0/gems/excon-0.66.0/lib/excon/ssl_socket.rb:125:in `initialize'
    [nmprimakube01]     /__enclose_io_memfs__/lib/ruby/gems/2.5.0/gems/excon-0.66.0/lib/excon/connection.rb:455:in `new'
    [nmprimakube01]     /__enclose_io_memfs__/lib/ruby/gems/2.5.0/gems/excon-0.66.0/lib/excon/connection.rb:455:in `socket'
    [nmprimakube01]     /__enclose_io_memfs__/lib/ruby/gems/2.5.0/gems/excon-0.66.0/lib/excon/connection.rb:116:in `request_call'
    [nmprimakube01]     /__enclose_io_memfs__/lib/ruby/gems/2.5.0/gems/excon-0.66.0/lib/excon/middlewares/mock.rb:57:in `request_call'
    [nmprimakube01]     /__enclose_io_memfs__/lib/ruby/gems/2.5.0/gems/excon-0.66.0/lib/excon/middlewares/instrumentor.rb:34:in `request_call'
    [nmprimakube01]     /__enclose_io_memfs__/lib/ruby/gems/2.5.0/gems/excon-0.66.0/lib/excon/middlewares/idempotent.rb:19:in `request_call'
    [nmprimakube01]     /__enclose_io_memfs__/lib/ruby/gems/2.5.0/gems/excon-0.66.0/lib/excon/middlewares/base.rb:22:in `request_call'
    [nmprimakube01]     /__enclose_io_memfs__/lib/ruby/gems/2.5.0/gems/excon-0.66.0/lib/excon/middlewares/base.rb:22:in `request_call'
    [nmprimakube01]     /__enclose_io_memfs__/lib/ruby/gems/2.5.0/gems/excon-0.66.0/lib/excon/middlewares/redirect_follower.rb:15:in `request_call'
    [nmprimakube01]     /__enclose_io_memfs__/lib/ruby/gems/2.5.0/gems/excon-0.66.0/lib/excon/connection.rb:270:in `request'
    [nmprimakube01]     /__enclose_io_memfs__/lib/ruby/gems/2.5.0/gems/k8s-client-0.10.4/lib/k8s/transport.rb:284:in `request'
    [nmprimakube01]     /__enclose_io_memfs__/lib/ruby/gems/2.5.0/gems/k8s-client-0.10.4/lib/k8s/transport.rb:363:in `get'
    [nmprimakube01]     /__enclose_io_memfs__/lib/ruby/gems/2.5.0/gems/k8s-client-0.10.4/lib/k8s/client.rb:134:in `block in api_groups!'
    [nmprimakube01]     /__enclose_io_memfs__/lib/ruby/2.5.0/monitor.rb:226:in `mon_synchronize'
    [nmprimakube01]     /__enclose_io_memfs__/lib/ruby/gems/2.5.0/gems/k8s-client-0.10.4/lib/k8s/client.rb:133:in `api_groups!'
    [nmprimakube01]     /__enclose_io_memfs__/lib/ruby/gems/2.5.0/gems/k8s-client-0.10.4/lib/k8s/client.rb:148:in `api_groups'
    [nmprimakube01]     /__enclose_io_memfs__/lib/ruby/gems/2.5.0/gems/k8s-client-0.10.4/lib/k8s/client.rb:156:in `apis'
    [nmprimakube01]     /__enclose_io_memfs__/lib/ruby/gems/2.5.0/gems/pharos-cluster-2.4.10/lib/pharos/phases/configure_client.rb:44:in `client_prefetch'
    [nmprimakube01]     /__enclose_io_memfs__/lib/ruby/gems/2.5.0/gems/pharos-cluster-2.4.10/lib/pharos/phases/configure_client.rb:20:in `call'
    [nmprimakube01]     /__enclose_io_memfs__/lib/ruby/gems/2.5.0/gems/pharos-cluster-2.4.10/lib/pharos/phase_manager.rb:98:in `block in apply'
    [nmprimakube01]     /__enclose_io_memfs__/lib/ruby/gems/2.5.0/gems/pharos-cluster-2.4.10/lib/pharos/retry.rb:20:in `perform'
    [nmprimakube01]     /__enclose_io_memfs__/lib/ruby/gems/2.5.0/gems/pharos-cluster-2.4.10/lib/pharos/phase_manager.rb:71:in `block in run_serial'
    [nmprimakube01]     /__enclose_io_memfs__/lib/ruby/gems/2.5.0/gems/pharos-cluster-2.4.10/lib/pharos/phase_manager.rb:70:in `map'
    [nmprimakube01]     /__enclose_io_memfs__/lib/ruby/gems/2.5.0/gems/pharos-cluster-2.4.10/lib/pharos/phase_manager.rb:70:in `run_serial'
    [nmprimakube01]     /__enclose_io_memfs__/lib/ruby/gems/2.5.0/gems/pharos-cluster-2.4.10/lib/pharos/phase_manager.rb:82:in `run'
    [nmprimakube01]     /__enclose_io_memfs__/lib/ruby/gems/2.5.0/gems/pharos-cluster-2.4.10/lib/pharos/phase_manager.rb:95:in `apply'
    [nmprimakube01]     /__enclose_io_memfs__/lib/ruby/gems/2.5.0/gems/pharos-cluster-2.4.10/lib/pharos/cluster_manager.rb:157:in `apply_phase'
    [nmprimakube01]     /__enclose_io_memfs__/lib/ruby/gems/2.5.0/gems/pharos-cluster-2.4.10/lib/pharos/cluster_manager.rb:61:in `gather_facts'
    [nmprimakube01]     /__enclose_io_memfs__/lib/ruby/gems/2.5.0/gems/pharos-cluster-2.4.10/lib/pharos/cluster_manager.rb:69:in `validate'
    [nmprimakube01]     /__enclose_io_memfs__/lib/ruby/gems/2.5.0/gems/pharos-cluster-2.4.10/lib/pharos/command_options/load_config.rb:27:in `block in cluster_manager'
    [nmprimakube01]     /__enclose_io_memfs__/lib/ruby/gems/2.5.0/gems/pharos-cluster-2.4.10/lib/pharos/command_options/load_config.rb:23:in `tap'
    [nmprimakube01]     /__enclose_io_memfs__/lib/ruby/gems/2.5.0/gems/pharos-cluster-2.4.10/lib/pharos/command_options/load_config.rb:23:in `cluster_manager'
    [nmprimakube01]     /__enclose_io_memfs__/lib/ruby/gems/2.5.0/gems/pharos-cluster-2.4.10/lib/pharos/up_command.rb:38:in `configure'
    [nmprimakube01]     /__enclose_io_memfs__/lib/ruby/gems/2.5.0/gems/pharos-cluster-2.4.10/lib/pharos/up_command.rb:29:in `block in execute'
    [nmprimakube01]     /__enclose_io_memfs__/lib/ruby/gems/2.5.0/gems/pharos-cluster-2.4.10/lib/pharos/up_command.rb:28:in `chdir'
    [nmprimakube01]     /__enclose_io_memfs__/lib/ruby/gems/2.5.0/gems/pharos-cluster-2.4.10/lib/pharos/up_command.rb:28:in `execute'
    [nmprimakube01]     /__enclose_io_memfs__/lib/ruby/gems/2.5.0/gems/clamp-1.2.1/lib/clamp/command.rb:63:in `run'
    [nmprimakube01]     /__enclose_io_memfs__/lib/ruby/gems/2.5.0/gems/pharos-cluster-2.4.10/lib/pharos/command.rb:25:in `run'
    [nmprimakube01]     /__enclose_io_memfs__/lib/ruby/gems/2.5.0/gems/clamp-1.2.1/lib/clamp/subcommand/execution.rb:11:in `execute'
    [nmprimakube01]     /__enclose_io_memfs__/lib/ruby/gems/2.5.0/gems/clamp-1.2.1/lib/clamp/command.rb:63:in `run'
    [nmprimakube01]     /__enclose_io_memfs__/lib/ruby/gems/2.5.0/gems/pharos-cluster-2.4.10/lib/pharos/command.rb:25:in `run'
    [nmprimakube01]     /__enclose_io_memfs__/lib/ruby/gems/2.5.0/gems/clamp-1.2.1/lib/clamp/command.rb:132:in `run'
    [nmprimakube01]     /__enclose_io_memfs__/lib/ruby/gems/2.5.0/gems/pharos-cluster-2.4.10/bin/pharos:12:in `<top (required)>'
    [nmprimakube01]     /__enclose_io_memfs__/lib/ruby/gems/2.5.0/bin/pharos:23:in `load'
    [nmprimakube01]     /__enclose_io_memfs__/lib/ruby/gems/2.5.0/bin/pharos:23:in `<main>'

Tested with a Debian 10 client and with Mac OS X; both gave the same error, so I'm guessing the issue is on the server side.

What you expected to happen: Expected successful cluster upgrade.

How to reproduce it (as minimally and precisely as possible): Install CentOS 7 hosts, then install a kube cluster with pharos 2.3.9+oss. Try to upgrade from there to a newer release.

Anything else we need to know?:

Environment:

- Pharos version (use `pharos --version`): 
Kontena Pharos:
  - pharos version 2.4.10+oss
Common:
  - calico-cni 3.6.2 (Apache License 2.0)
  - calico-kube-controllers 3.6.2 (Apache License 2.0)
  - calico-node 3.6.2 (Apache License 2.0)
  - coredns 1.3.1 (Apache License 2.0)
  - dns-node-cache 1.15.1 (Apache License 2.0)
  - etcd 3.3.10 (Apache License 2.0)
  - kubelet-rubber-stamp 0.1.0 (Apache License 2.0)
  - kubernetes 1.14.9 (Apache License 2.0)
  - kubernetes-cni 0.7.5 (Apache License 2.0)
  - metrics-server 0.3.2 (Apache License 2.0)
  - pharos-kubelet-proxy 0.3.7 (Apache License 2.0)
  - weave-flying-shuttle 0.3.1 (Apache License 2.0)
  - weave-net 2.5.2 (Apache License 2.0)
Debian 9:
  - cfssl 1.2 (MIT)
  - cri-o 1.14.11 (Apache License 2.0)
  - docker-ce 18.06.2 (Apache License 2.0)
Centos 7:
  - cfssl 1.2 (MIT)
  - cri-o 1.14.11 (Apache License 2.0)
  - docker 1.13.1 (Apache License 2.0)
Rhel 7.4:
  - cfssl 1.2 (MIT)
  - cri-o 1.14.11 (Apache License 2.0)
  - docker 1.13.1 (Apache License 2.0)
Rhel 7.5:
  - cfssl 1.2 (MIT)
  - cri-o 1.14.11 (Apache License 2.0)
  - docker 1.13.1 (Apache License 2.0)
Rhel 7.6:
  - cfssl 1.2 (MIT)
  - cri-o 1.14.11 (Apache License 2.0)
  - docker 1.13.1 (Apache License 2.0)
Rhel 7.7:
  - cfssl 1.2 (MIT)
  - cri-o 1.14.11 (Apache License 2.0)
  - docker 1.13.1 (Apache License 2.0)
Ubuntu 18.04:
  - cfssl 1.2 (MIT)
  - cri-o 1.14.11 (Apache License 2.0)
  - docker 18.09 (Apache License 2.0)
Ubuntu 16.04:
  - cfssl 1.2 (MIT)
  - cri-o 1.14.11 (Apache License 2.0)
  - docker 18.09 (Apache License 2.0)
Add-ons:
  - cert-manager 0.8.1 (Apache License 2.0)
  - helm 2.13.1 (Apache License 2.0)
  - host-upgrades 0.3.1 (Apache License 2.0)
  - ingress-nginx 0.25.1 (Apache License 2.0)

Hosts:

NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="Red Hat"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"

cluster.yml:

hosts:
  - address: "nmprimakube01.example.com"
    user: foo
    role: master
    container_runtime: cri-o
    private_address: xxx.xxx.xxx.xxx

  - address: "nmprimakube02.example.com"
    user: foo
    role: worker
    private_address: xxx.xxx.xxx.xxx
    container_runtime: cri-o

  - address: "nmprimakube03.example.com"
    user: foo
    role: worker
    private_address: xxx.xxx.xxx.xxx
    container_runtime: cri-o

network:
  provider: weave
  dns_replicas: 2
  service_cidr: 172.100.0.0/16
  pod_network_cidr: 172.200.0.0/16
  weave:
    trusted_subnets:
      - 10.xxx.xxx.0/25
      - 10.xxx.xxx.128/25

addons:
  cert-manager:
    enabled: true
    issuer:
      name: letsencrypt-staging
      server: https://acme-staging-v02.api.letsencrypt.org/directory
      email: admin@example.com

edita-timo commented 4 years ago

Got this figured out. I found the following error in journald on the kube master: "refused local port forward: originator 127.0.0.1 port 65531, target localhost port 6443"
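For anyone searching their own logs for the same symptom, here is a minimal sketch that picks this message out of log text and extracts the originator and target. It assumes the message has exactly the shape quoted above; the function name `parse_refused_forward` is hypothetical and not part of pharos or OpenSSH.

```python
import re
from typing import Dict, Optional

# Matches the sshd message quoted in this issue (format assumed from that quote).
REFUSED_FORWARD = re.compile(
    r"refused local port forward: "
    r"originator (?P<originator_host>\S+) port (?P<originator_port>\d+), "
    r"target (?P<target_host>\S+) port (?P<target_port>\d+)"
)


def parse_refused_forward(line: str) -> Optional[Dict[str, object]]:
    """Return originator/target details if the line reports a refused forward."""
    match = REFUSED_FORWARD.search(line)
    if match is None:
        return None
    return {
        "originator_host": match.group("originator_host"),
        "originator_port": int(match.group("originator_port")),
        "target_host": match.group("target_host"),
        "target_port": int(match.group("target_port")),
    }


if __name__ == "__main__":
    sample = (
        "refused local port forward: originator 127.0.0.1 port 65531, "
        "target localhost port 6443"
    )
    print(parse_refused_forward(sample))
```

A target port of 6443 in the result points at the forwarded Kubernetes API connection, which matches the `server=https://localhost:65534` line in the debug output.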

After enabling TCP port forwarding on the SSH server, pharos got past this issue.
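A rough way to check a host before running the upgrade is to look at the `AllowTcpForwarding` keyword in the sshd configuration. The sketch below is an assumption-laden illustration: it only reads the global section (everything before the first `Match` block), treats the first occurrence of the keyword as the effective one, and falls back to the OpenSSH default of `yes`. The function name `local_forwarding_allowed` is hypothetical.

```python
import re


def local_forwarding_allowed(sshd_config_text: str) -> bool:
    """Return True if the global sshd_config section permits local TCP forwarding."""
    for raw_line in sshd_config_text.splitlines():
        # Drop comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue
        # Keyword and argument are separated by whitespace (optionally '=').
        parts = re.split(r"[\s=]+", line, maxsplit=1)
        keyword = parts[0].lower()  # sshd keywords are case-insensitive
        if keyword == "match":
            # Match blocks are conditional overrides; this sketch ignores them.
            break
        if keyword == "allowtcpforwarding" and len(parts) == 2:
            # 'yes', 'all' and 'local' all permit local forwarding.
            return parts[1].strip().lower() in {"yes", "all", "local"}
    return True  # OpenSSH default when the keyword is absent


if __name__ == "__main__":
    # /etc/ssh/sshd_config is the usual location; adjust if your host differs.
    with open("/etc/ssh/sshd_config", encoding="utf-8") as handle:
        print(local_forwarding_allowed(handle.read()))
```

If this returns `False`, setting `AllowTcpForwarding yes` (or `local`) in the sshd configuration and reloading sshd is the change described above. On a host where you have root access, `sshd -T` prints the effective configuration and is more reliable than parsing the file.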