openshift / openshift-ansible

Install and config an OpenShift 3.x cluster
https://try.openshift.com
Apache License 2.0

Fail to upgrade Origin after updating openshift-ansible to the latest release #2167

Closed: macedogm closed 8 years ago

macedogm commented 8 years ago

Hi. I updated openshift-ansible to the latest release today and tried to upgrade Origin to v1.2.1 with the following command:

# ansible-playbook -i /etc/ansible/hosts openshift-ansible/playbooks/byo/openshift-cluster/upgrades/v3_2/upgrade.yml

But it failed with this error:

TASK: [openshift_version | fail ] ********************************************* 
failed: [master-1.paas.srv.srvr.rbs.net] => {"failed": true}
msg: Detected openshift version 1.2.0 does not match requested openshift_release 3.2. 
You may need to adjust your yum repositories or specify an exact openshift_pkg_version. 
FATAL: all hosts have already failed -- aborting

I have run this procedure many times before and never got this type of error. Does anyone know what could be causing it?

Thanks in advance.

dgoodwin commented 8 years ago

@macedogm this is a bug in my recent work: the upgrade playbook is overriding openshift_release to 3.2 (the OpenShift Enterprise behavior), and I forgot about the Origin upgrade here.

I'll get you a fix as soon as possible.

If you'd like to work around the issue immediately, edit /playbooks/common/openshift-cluster/upgrades/v3_1_to_v3_2/pre.yml and look for:

- include: ../../../../common/openshift-cluster/initialize_openshift_version.yml
  vars:
    # Request openshift_release 3.2 and let the openshift_version role handle converting this
    # to a more specific version, respecting openshift_image_tag and openshift_pkg_version if
    # defined, and overriding the normal behavior of protecting the installed version
    openshift_release: "3.2"

Change 3.2 to 1.2 there.
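
After the edit, that block would read like this (the same include, with only the release value changed):

- include: ../../../../common/openshift-cluster/initialize_openshift_version.yml
  vars:
    # Request openshift_release 1.2 for Origin and let the openshift_version role
    # convert this to a more specific version, as above
    openshift_release: "1.2"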

You likely will also need openshift_image_tag=v1.2.1 in your inventory if you don't already have it.
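
In the inventory, the relevant [OSEv3:vars] lines would look something like this (a sketch; openshift_pkg_version is optional and shown only because the original error message mentions it):

[OSEv3:vars]
deployment_type=origin
openshift_image_tag=v1.2.1
# optional: pin an exact RPM version instead of relying on the release default
#openshift_pkg_version=-1.2.1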

macedogm commented 8 years ago

@dgoodwin no problem with that! I applied the workaround as you described, but I got a new error.

TASK: [Verify OpenShift 3.2 RPMs are available for upgrade] ******************* 
failed: [node-1.paas.srv.srvr.rbs.net] => {"failed": true}
msg: OpenShift 1.2.0 is available, but 3.2 or greater is required
failed: [master-1.paas.srv.srvr.rbs.net] => {"failed": true}
msg: OpenShift 1.2.0 is available, but 3.2 or greater is required

FATAL: all hosts have already failed -- aborting

So in /playbooks/common/openshift-cluster/upgrades/v3_1_to_v3_2/pre.yml I changed the check from this, with version_compare('3.2', '<'):

  - name: Verify OpenShift 3.2 RPMs are available for upgrade
    fail:
      msg: "OpenShift {{ avail_openshift_version.stdout }} is available, but 3.2 or greater is required"
    when: not openshift.common.is_containerized | bool and not avail_openshift_version | skipped and avail_openshift_version.stdout | default('0.0', True) | version_compare('3.2', '<')

to this, with version_compare('1.2', '<'):

  - name: Verify OpenShift 3.2 RPMs are available for upgrade
    fail:
      msg: "OpenShift {{ avail_openshift_version.stdout }} is available, but 3.2 or greater is required"
    when: not openshift.common.is_containerized | bool and not avail_openshift_version | skipped and avail_openshift_version.stdout | default('0.0', True) | version_compare('1.2', '<')
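
For reference, version_compare is the standard Ansible version test, so this condition only fails the play when the available version is older than the one given. A standalone sketch (not part of openshift-ansible) of how that check evaluates:

# hypothetical demo playbook, e.g. version-check-demo.yml
- hosts: localhost
  connection: local
  gather_facts: false
  vars:
    avail_version: "1.2.0"
  tasks:
    # with avail_version 1.2.0, version_compare('1.2', '<') is false,
    # so this task is skipped and the play continues
    - name: Fail if the available version is older than 1.2
      fail:
        msg: "OpenShift {{ avail_version }} is available, but 1.2 or greater is required"
      when: avail_version | version_compare('1.2', '<')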

The upgrade then executed without errors, but I believe it did not actually work as expected:

# openshift version 
openshift v1.2.0

Please, correct me if I am wrong.

dgoodwin commented 8 years ago

Could you provide all the variables you have set in inventory?

macedogm commented 8 years ago

Yes, sure.

# cat /etc/ansible/hosts 
[OSEv3:children]
masters
nodes

[OSEv3:vars]
deployment_type=origin
openshift_image_tag=v1.2.1
os_sdn_network_plugin_name=redhat/openshift-ovs-multitenant
osm_cluster_network_cidr=172.16.0.0/16

[masters]
master-1.paas.srv.srvr.rbs.net openshift_hostname=master-1.paas.srv.srvr.rbs.net

[nodes]
master-1.paas.srv.srvr.rbs.net openshift_node_labels="{'region': 'infra', 'zone': 'default'}"  openshift_schedulable=true
node-1.paas.srv.srvr.rbs.net openshift_hostname=node-1.paas.srv.srvr.rbs.net openshift_node_labels="{'region': 'infra', 'zone': 'default'}"

dgoodwin commented 8 years ago

That looks right to me. I am about to start testing and will get this addressed ASAP; apologies for the bugs!

macedogm commented 8 years ago

OK, no problem. For the moment our Origin is running fine; I tested the upgrade in our lab (not the production environment). Please let me know if you need more info. Thanks for your help.

dgoodwin commented 8 years ago

OK, I reproduced both of the 3.2 vs 1.2 errors you worked around, @macedogm. You did everything correctly; the reason you're still seeing 1.2.0 is that the 1.2.1 RPMs are not live yet:

http://mirror.centos.org/centos/7/paas/x86_64/openshift-origin/

@tdawson if you have a moment, let us know if that's unexpected in any way.

A fix is incoming for the 1.2 RPM upgrade version issues.
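
In the meantime, a quick way to check which Origin versions the repos actually offer on a non-containerized host is to ask yum directly:

# yum clean metadata
# yum --showduplicates list origin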

macedogm commented 8 years ago

@dgoodwin thanks for your help! I will wait for the packages. Next time I will check the repo first before updating. My bad here.

tdawson commented 8 years ago

The 1.2.1 RPMs should have been in the repo this morning. I checked and found out that there have been filesystem issues, so they haven't been pushed out yet. There is no estimate of when they will be in the repos; hopefully today.

tdawson commented 8 years ago

The Origin 1.2.1 RPMs are now in the released repo and available on the mirrors. Give it another try.
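
That is, after updating openshift-ansible to pick up dgoodwin's fix, re-run the same upgrade playbook as before:

# ansible-playbook -i /etc/ansible/hosts openshift-ansible/playbooks/byo/openshift-cluster/upgrades/v3_2/upgrade.yml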

macedogm commented 8 years ago

I updated the playbooks, ran the upgrade again, and it worked smoothly! Thanks @dgoodwin @tdawson

# openshift version
openshift v1.2.1
kubernetes v1.2.0-36-g4a3f9c5
etcd 2.2.5
# rpm -qa | grep origin
origin-clients-1.2.1-1.el7.x86_64
origin-master-1.2.1-1.el7.x86_64
centos-release-openshift-origin-1-1.el7.centos.noarch
tuned-profiles-origin-node-1.2.1-1.el7.x86_64
origin-1.2.1-1.el7.x86_64
origin-sdn-ovs-1.2.1-1.el7.x86_64
origin-node-1.2.1-1.el7.x86_64

dgoodwin commented 8 years ago

Good to hear, and thanks for the bug reports. Closing this out now.