openshift / openshift-ansible

Install and configure an OpenShift 3.x cluster
https://try.openshift.com
Apache License 2.0

Can't upgrade Origin 3.7 to 3.9 using playbook #8363

Closed: a-zcomp closed this issue 4 years ago

a-zcomp commented 6 years ago

Description

When trying to upgrade from Origin 3.7 to 3.9 using playbooks/byo/openshift-cluster/upgrades/v3_9/upgrade.yml, the playbook fails with an error saying the upgrade must be run against 3.8 or later.

Version

ansible 2.5.2
openshift-ansible-3.9.29-1-16-g8071e9d55
Steps To Reproduce
  1. Run the 3.9 upgrade playbook against a 3.7 install
Expected Results

OpenShift Origin is upgraded to 3.9.

Observed Results

Playbook error

     Hosts:    master-2, master-3 (full host names removed)
     Play:     Verify upgrade targets
     Task:     Fail when openshift version does not meet minimum requirement for Origin upgrade
     Message:  This upgrade playbook must be run against OpenShift 3.8 or later

vrutkovs commented 6 years ago

Upgrade from 3.7 to 3.9 requires both 3.8 and 3.9 repos to be enabled. Make sure you have https://cbs.centos.org/repos/paas7-openshift-origin38-release/x86_64/os/Packages/ repo enabled
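
A minimal sketch of what enabling that repo could look like with Ansible's yum_repository module; the repo id, description, inventory group names, and gpgcheck setting are assumptions, and the baseurl is the CBS path linked above without the Packages/ suffix:

```yaml
# Sketch only: enable the intermediate 3.8 repo on all cluster hosts.
# 'masters' and 'nodes' are assumed inventory group names.
- hosts: masters:nodes
  become: yes
  tasks:
    - name: Enable the CentOS PaaS SIG Origin 3.8 repo
      yum_repository:
        name: centos-openshift-origin38          # assumed repo id
        description: CentOS OpenShift Origin 3.8
        baseurl: https://cbs.centos.org/repos/paas7-openshift-origin38-release/x86_64/os/
        enabled: yes
        gpgcheck: no    # CBS repos are typically unsigned; verify for your setup
```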

DanyC97 commented 6 years ago

@a-zcomp can we close this issue now that @vrutkovs has provided the correct answer?

ubwa commented 6 years ago

Tried the above: I added that repo to the servers, but the upgrade still fails for the same reason.

BTW, is the above covered in the documentation? The documentation for Origin also seems to have a line that says:

"Ensure the openshift_deployment_type parameter in your inventory file is set to openshift-enterprise."

Is this accurate?

DanyC97 commented 6 years ago

@ubwa regarding the documentation for Origin, the best place to raise that is the docs repo itself. In this particular case deployment_type takes two values, as mentioned here; for Origin it should look like the sketch below.
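
A hedged sketch of the relevant variable (the group_vars path is an assumption; many inventories set this in the [OSEv3:vars] section instead):

```yaml
# group_vars/OSEv3.yml (hypothetical location)
# The two accepted values are 'origin' and 'openshift-enterprise';
# Origin clusters want the former, despite the docs line quoted above.
openshift_deployment_type: origin
```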

If you got the same error as @a-zcomp, can you please share your inventory file?

ianmiell commented 6 years ago

I get the same problem. I definitely have both repos enabled, and the 3.7 repo disabled.

I also believe the docs are incorrect regarding openshift-enterprise.

ianmiell commented 6 years ago

[root@master2 ~]# yum repolist
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
 * base: anorien.csc.warwick.ac.uk
 * epel: ftp.nluug.nl
 * extras: mirror.as29550.net
 * updates: centos.mirroring.pulsant.co.uk
repo id                                                                                 repo name                                                                                                    status
base/7/x86_64                                                                           CentOS-7 - Base                                                                                               9,911
centos-openshift-origin                                                                 CentOS OpenShift Origin                                                                                         185
centos-openshift-origin38                                                               CentOS OpenShift Origin                                                                                          37
centos-openshift-origin39                                                               CentOS OpenShift Origin                                                                                          36
epel/x86_64                                                                             Extra Packages for Enterprise Linux 7 - x86_64                                                               12,561
extras/7/x86_64                                                                         CentOS-7 - Extras                                                                                               291
updates/7/x86_64                                                                        CentOS-7 - Updates                                                                                              626
repolist: 23,647

error

Failure summary:

  1. Hosts:    master2.vagrant.test, master3.vagrant.test
     Play:     Verify upgrade targets
     Task:     Fail when openshift version does not meet minimum requirement for Origin upgrade
     Message:  This upgrade playbook must be run against OpenShift 3.8 or later

  2. Hosts:    localhost
     Play:     Gate on etcd backup
     Task:     fail
     Message:  Upgrade cannot continue. The following hosts did not complete etcd backup: master2.vagrant.test,master3.vagrant.test

ianmiell commented 6 years ago

After some debugging it seems that the run is failing because master1 is on 3.8 due to a previously failed upgrade, so the scripts think that all the servers are on that version.

It seems, then, that the playbooks can't handle a partially failed upgrade. I'm not sure what I'm supposed to do to resolve this: hack the Ansible code directly, or downgrade by hand?

DanyC97 commented 6 years ago

@ianmiell I think your problem is the same as https://github.com/openshift/openshift-ansible/issues/8467

ianmiell commented 6 years ago

FWIW I worked around it by hacking the Ansible code directly, forcing the 3.8 upgrade everywhere:

/usr/share/ansible/openshift-ansible/playbooks/common/openshift-cluster/upgrades/v3_9/upgrade_control_plane.yml

Specifically, I changed `< 3.8` in the version test to `< 3.9`, as sketched below.
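
For anyone reading along, a hedged reconstruction of the kind of change this is; the fact name and exact test below are assumptions, not the verbatim release-3.9 playbook:

```yaml
# Hypothetical sketch of the gate that decides whether a host still
# needs the intermediate 3.8 hop. Original (paraphrased): hosts already
# reporting 3.8 skip the 3.8 control-plane upgrade.
- set_fact:
    l_needs_38_hop: "{{ openshift.common.version is version_compare('3.8', '<') }}"

# The hack: widen the comparison so a master stranded at 3.8 by a
# failed run is pulled through the 3.8 upgrade again.
- set_fact:
    l_needs_38_hop: "{{ openshift.common.version is version_compare('3.9', '<') }}"
```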

vrutkovs commented 6 years ago

> After some debugging it seems that the run is failing because master1 is on 3.8 due to a previously failed upgrade, so the scripts think that all the servers are on that version.

You might need to clear the facts locally and on the hosts to get the actual versions. The code from release-3.9 should be able to upgrade from 3.8 to 3.9, but there is not much we can help with, since no full `ansible-playbook -vvv` output or inventory was provided.

ianmiell commented 6 years ago

How do I clear the facts?

I have a fully reproducible Vagrant environment in code; I can provide it on request.

vrutkovs commented 6 years ago

Remove `$HOME/ansible/facts` on the control host and `/etc/ansible/facts.d` on the VMs.
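
As a sketch, that advice as a one-off playbook (the masters/nodes group names are assumptions from a typical openshift-ansible inventory):

```yaml
# Clear cached facts so the next run re-detects the installed versions.
- hosts: localhost
  connection: local
  tasks:
    - name: Remove cached facts on the control host
      file:
        path: "{{ lookup('env', 'HOME') }}/ansible/facts"
        state: absent

- hosts: masters:nodes
  become: yes
  tasks:
    - name: Remove local fact caches on the cluster hosts
      file:
        path: /etc/ansible/facts.d
        state: absent
```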

mshutt commented 6 years ago

@vrutkovs I believe the problem with "clearing facts" (at least with the containerized install) is that the facts are programmatically generated using roles/openshift_facts/library/openshift_facts.py

https://github.com/openshift/openshift-ansible/blob/f8f497e2bcb088553447c36974779a7c43483384/roles/openshift_facts/library/openshift_facts.py#L895

Basically, it detects the version of a containerized install by reading /etc/sysconfig/origin-master-controllers and parsing the current version out of the IMAGE_VERSION variable. In my case, I had to edit that file by hand on the master that failed the upgrade from 3.7.2 to 3.8 and manually set the tag to v3.8.0, and then I was able to re-run the 3.9 upgrade playbook. Of course, the relabeling of the master with the new labels required by 3.9 (node-role.kubernetes.io/master) hadn't been applied to that one failed master either, and I had to manually `oc label` that node. See https://github.com/openshift/openshift-ansible/issues/8467 for details.
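
A hedged sketch of that manual fix as ad-hoc tasks; IMAGE_VERSION, the sysconfig path, and the node label come from the comment above, while the host alias and task layout are assumptions:

```yaml
# Run only against the master that failed the 3.7 -> 3.8 hop.
- hosts: failed_master          # hypothetical host alias
  become: yes
  tasks:
    - name: Pin the version that the fact-detection code will parse out
      lineinfile:
        path: /etc/sysconfig/origin-master-controllers
        regexp: '^IMAGE_VERSION='
        line: 'IMAGE_VERSION=v3.8.0'

    - name: Re-apply the master label required by 3.9 (see issue 8467)
      command: oc label node {{ inventory_hostname }} node-role.kubernetes.io/master=true --overwrite
```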

This is, of course, on the release-3.9 tag... It seems that this has been changed in master? I've not had a chance to take a deeper look at what changed, but this "failsafe" parsing of the /etc/sysconfig/origin-master-controllers file to get the version had also bitten me in a partially failed 3.7.0 -> 3.7.2 upgrade.

vrutkovs commented 6 years ago

> it will detect the version of the containerized install by reading /etc/sysconfig/origin-master-controllers and parsing the current version out of the IMAGE_VERSION variable

Yes, version detection is tricky here. Does it get updated before the actual containers land?

> It seems that this has been changed in master?

Master now uses static pod deployment; it's entirely different now.

openshift-bot commented 4 years ago

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close. Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

openshift-bot commented 4 years ago

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity. Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten /remove-lifecycle stale

openshift-bot commented 4 years ago

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen. Mark the issue as fresh by commenting /remove-lifecycle rotten. Exclude this issue from closing again by commenting /lifecycle frozen.

/close

openshift-ci-robot commented 4 years ago

@openshift-bot: Closing this issue.

In response to [this](https://github.com/openshift/openshift-ansible/issues/8363#issuecomment-663006742):

> Rotten issues close after 30d of inactivity.
>
> Reopen the issue by commenting `/reopen`. Mark the issue as fresh by commenting `/remove-lifecycle rotten`. Exclude this issue from closing again by commenting `/lifecycle frozen`.
>
> /close

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.