openshift / openshift-ansible

Upgrade to 3.7.0 Failed at Task 'Upgrade all storage' with Error 'x509: certificate signed by unknown authority' and 'service-catalog-signer' #6475

Closed eliu closed 4 years ago

eliu commented 6 years ago

Description

I am trying to upgrade my existing OpenShift Origin cluster (3 nodes) to v3.7.0 using the release-3.7 branch of openshift-ansible, and I get the following error:

TASK [Upgrade all storage] ***********************************************************************************************************************************
fatal: [master.example.com]: FAILED! => {"changed": true, "cmd": ["oc", "adm", "--config=/etc/origin/master/admin.kubeconfig", "migrate", "storage", "--include=*", "--confirm"], "delta": "0:00:01.647253", "end": "2017-12-14 11:56:38.787209", "failed": true, "failed_when_result": true, "msg": "non-zero return code", "rc": 1, "start": "2017-12-14 11:56:37.139956", "stderr": "error: could not calculate the list of available resources: an error on the server (\"Error: 'x509: certificate signed by unknown authority (possibly because of \\\"crypto/rsa: verification error\\\" while trying to verify candidate authority certificate \\\"service-catalog-signer\\\")'\\nTrying to reach: 'https://172.30.147.2:443/apis/servicecatalog.k8s.io/v1beta1'\") has prevented the request from succeeding", "stderr_lines": ["error: could not calculate the list of available resources: an error on the server (\"Error: 'x509: certificate signed by unknown authority (possibly because of \\\"crypto/rsa: verification error\\\" while trying to verify candidate authority certificate \\\"service-catalog-signer\\\")'\\nTrying to reach: 'https://172.30.147.2:443/apis/servicecatalog.k8s.io/v1beta1'\") has prevented the request from succeeding"], "stdout": "", "stdout_lines": []}
    to retry, use: --limit @/home/vagrant/openshift-ansible/openshift-ansible-3.7.15-1-8/playbooks/byo/openshift-cluster/upgrades/v3_7/upgrade.retry

PLAY RECAP ***************************************************************************************************************************************************
etcd.example.com           : ok=104  changed=8    unreachable=0    failed=0
lb.example.com             : ok=36   changed=2    unreachable=0    failed=0
localhost                  : ok=20   changed=0    unreachable=0    failed=0
master.example.com         : ok=118  changed=8    unreachable=0    failed=1
node01.example.com         : ok=97   changed=8    unreachable=0    failed=0
node02.example.com         : ok=97   changed=8    unreachable=0    failed=0

Failure summary:

  1. Hosts:    master.example.com
     Play:     Pre master upgrade - Upgrade all storage
     Task:     Upgrade all storage
     Message:  non-zero return code
[vagrant@master openshift-ansible]$

Version

ansible 2.4.1.0
  config file = /etc/ansible/ansible.cfg
  configured module search path = [u'/home/vagrant/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python2.7/site-packages/ansible
  executable location = /usr/bin/ansible
  python version = 2.7.5 (default, Aug  4 2017, 00:39:18) [GCC 4.8.5 20150623 (Red Hat 4.8.5-16)]

If you're operating from a git clone:

openshift-ansible-3.7.15-1-8-gd47adc4

Steps To Reproduce
  1. Update the following version settings to 3.7.0 in the Ansible inventory file (/etc/ansible/hosts):
     openshift_release=v3.7
     openshift_image_tag=v3.7.0
  2. Run ansible-playbook /path/to/openshift-ansible/playbooks/byo/openshift-cluster/upgrades/v3_7/upgrade.yml
Expected Results

The upgrade process completes without errors.

Observed Results

See the description above.

Additional Information

Host environments are built with Vagrant and VirtualBox.
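
For triage, the two useful facts buried in the stderr above are the name of the CA that failed verification and the aggregated API endpoint the master was trying to reach. The following is a minimal sketch, assuming the message wording shown in the log; the function name `summarize_x509_error` and both regular expressions are illustrative and not part of openshift-ansible or `oc`.

```python
import re

# Both patterns assume the wording seen in the stderr above;
# other oc versions may phrase the error differently.
# \\* tolerates the backslash-escaped quotes found in the Ansible JSON output.
AUTHORITY_RE = re.compile(r'candidate authority certificate \\*"([^"\\]+)\\*"')
ENDPOINT_RE = re.compile(r"Trying to reach: '([^']+)'")


def summarize_x509_error(stderr: str) -> dict:
    """Pull the failing CA name and the target endpoint out of the error text."""
    authority = AUTHORITY_RE.search(stderr)
    endpoint = ENDPOINT_RE.search(stderr)
    return {
        "authority": authority.group(1) if authority else None,
        "endpoint": endpoint.group(1) if endpoint else None,
    }


# The stderr text from the failed task, kept as raw strings so the
# backslashes appear exactly as they do in the log.
sample = (
    r'''error: could not calculate the list of available resources: '''
    r'''an error on the server ("Error: 'x509: certificate signed by unknown '''
    r'''authority (possibly because of \"crypto/rsa: verification error\" '''
    r'''while trying to verify candidate authority '''
    r'''certificate \"service-catalog-signer\")'\n'''
    r'''Trying to reach: 'https://172.30.147.2:443/apis/servicecatalog.k8s.io/v1beta1'") '''
    r'''has prevented the request from succeeding'''
)

print(summarize_x509_error(sample))
```

Here the endpoint is the service catalog API (`servicecatalog.k8s.io/v1beta1`), which suggests the failure comes from the master verifying that aggregated API server's certificate rather than from the storage migration itself.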

navidaro commented 6 years ago

Is there a solution for this?

alon-z commented 6 years ago

Having this problem also.

ebayer commented 6 years ago

I am having exactly the same problem upgrading from 3.9 to 3.10:

TASK [Upgrade all storage] *********************************************************************************************************************************************************************
Tuesday 11 September 2018  16:32:57 +0300 (0:00:00.182)       0:04:05.415 ***** 
fatal: [zift.ozguryazilim.com.tr]: FAILED! => {"changed": true, "cmd": ["oc", "adm", "--config=/etc/origin/master/admin.kubeconfig", "migrate", "storage", "--include=*", "--confirm"], "delta": "0:00:00.148438", "end": "2018-09-11 16:32:57.823826", "failed_when_result": true, "msg": "non-zero return code", "rc": 1, "start": "2018-09-11 16:32:57.675388", "stderr": "error: could not calculate the list of available resources: an error on the server (\"Error: 'x509: certificate signed by unknown authority (possibly because of \\\"crypto/rsa: verification error\\\" while trying to verify candidate authority certificate \\\"service-catalog-signer\\\")'\\nTrying to reach: 'https://172.30.136.60:443/apis/servicecatalog.k8s.io/v1beta1'\") has prevented the request from succeeding", "stderr_lines": ["error: could not calculate the list of available resources: an error on the server (\"Error: 'x509: certificate signed by unknown authority (possibly because of \\\"crypto/rsa: verification error\\\" while trying to verify candidate authority certificate \\\"service-catalog-signer\\\")'\\nTrying to reach: 'https://172.30.136.60:443/apis/servicecatalog.k8s.io/v1beta1'\") has prevented the request from succeeding"], "stdout": "", "stdout_lines": []}

PLAY RECAP *************************************************************************************************************************************************************************************
localhost                  : ok=18   changed=0    unreachable=0    failed=0   
zift.ozguryazilim.com.tr   : ok=150  changed=16   unreachable=0    failed=1   

Tuesday 11 September 2018  16:32:57 +0300 (0:00:00.320)       0:04:05.736 ***** 
=============================================================================== 
openshift_excluder : Install docker excluder - yum ------------------------------------------------------------------------------------------------------------------------------------- 64.86s
openshift_excluder : Get available excluder version ------------------------------------------------------------------------------------------------------------------------------------ 63.10s
Run health checks (upgrade) ------------------------------------------------------------------------------------------------------------------------------------------------------------ 15.40s
openshift_excluder : Get available excluder version ------------------------------------------------------------------------------------------------------------------------------------- 8.79s
Check latest available OpenShift RPM version -------------------------------------------------------------------------------------------------------------------------------------------- 2.86s
openshift_excluder : Install openshift excluder - yum ----------------------------------------------------------------------------------------------------------------------------------- 2.41s
openshift_version : Get available RPM version ------------------------------------------------------------------------------------------------------------------------------------------- 2.20s
Ensure openshift-ansible installer package deps are installed --------------------------------------------------------------------------------------------------------------------------- 1.83s
Gathering Facts ------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 1.78s
Gather Cluster facts -------------------------------------------------------------------------------------------------------------------------------------------------------------------- 1.39s
openshift_facts ------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 1.07s
Initialize openshift.node.sdn_mtu ------------------------------------------------------------------------------------------------------------------------------------------------------- 1.07s
openshift_facts ------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 1.02s
Gathering Facts ------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 0.88s
openshift_manage_node : Label nodes ----------------------------------------------------------------------------------------------------------------------------------------------------- 0.82s
etcd : Record RPM based etcd version ---------------------------------------------------------------------------------------------------------------------------------------------------- 0.81s
openshift_manage_node : Set node schedulability ----------------------------------------------------------------------------------------------------------------------------------------- 0.76s
openshift_repos : Configure correct origin release repository --------------------------------------------------------------------------------------------------------------------------- 0.61s
openshift_etcd_facts : Check for CA indicator files ------------------------------------------------------------------------------------------------------------------------------------- 0.57s
openshift_repos : refresh cache --------------------------------------------------------------------------------------------------------------------------------------------------------- 0.55s

Failure summary:

  1. Hosts:    zift.ozguryazilim.com.tr
     Play:     Pre master upgrade - Upgrade all storage
     Task:     Upgrade all storage
     Message:  non-zero return code

When I run the migrate command manually, I get the same certificate error:

sudo oc adm migrate storage --include=* --loglevel=2 --confirm --config /etc/origin/master/admin.kubeconfig
F0911 16:36:10.584760   13923 helpers.go:119] error: could not calculate the list of available resources: an error on the server ("Error: 'x509: certificate signed by unknown authority (possibly because of \"crypto/rsa: verification error\" while trying to verify candidate authority certificate \"service-catalog-signer\")'\nTrying to reach: 'https://172.30.136.60:443/apis/servicecatalog.k8s.io/v1beta1'") has prevented the request from succeeding
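
A side note on reproducing the task by hand: the playbook passes the command as an argument list (the "cmd" field in the log), so the `*` in `--include=*` appears to reach `oc` literally, whereas an interactive shell could in principle glob-expand it when left unquoted. This is unlikely to be related to the certificate error, but a quoted equivalent of the task's command can be derived as in this small sketch, which assumes Python 3.8+ for `shlex.join`:

```python
import shlex

# Argument list copied from the "cmd" field of the failed Ansible task above.
cmd = [
    "oc", "adm", "--config=/etc/origin/master/admin.kubeconfig",
    "migrate", "storage", "--include=*", "--confirm",
]

# shlex.join quotes any argument containing shell-special characters,
# so the wildcard is passed through to oc unchanged.
shell_line = shlex.join(cmd)
print(shell_line)
```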

ebayer commented 6 years ago

@eliu did you find a solution to this? How did you manage to upgrade?

openshift-bot commented 4 years ago

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close. Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

openshift-bot commented 4 years ago

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity. Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten /remove-lifecycle stale

openshift-bot commented 4 years ago

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen. Mark the issue as fresh by commenting /remove-lifecycle rotten. Exclude this issue from closing again by commenting /lifecycle frozen.

/close

openshift-ci-robot commented 4 years ago

@openshift-bot: Closing this issue.

In response to [this](https://github.com/openshift/openshift-ansible/issues/6475#issuecomment-663989196):

>Rotten issues close after 30d of inactivity.
>
>Reopen the issue by commenting `/reopen`.
>Mark the issue as fresh by commenting `/remove-lifecycle rotten`.
>Exclude this issue from closing again by commenting `/lifecycle frozen`.
>
>/close

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.