openshift / openshift-ansible

Install and config an OpenShift 3.x cluster
https://try.openshift.com
Apache License 2.0

openshift-ansible 3.9 prerequisites playbook fails with "no such file" for /var/lib/containers #10139

Closed bamachrn closed 4 years ago

bamachrn commented 6 years ago

Description

We are using the openshift-ansible 3.9 RPM from the CentOS PaaS SIG to install an OpenShift Origin 3.9 cluster. The cluster has two nodes.

Both nodes have SELinux in permissive mode, and iptables is set to allow all communication between them.

[OSEv3:vars]
debug_level=4
openshift_master_api_port=8443
openshift_deployment_type=origin
openshift_release=v3.9
os_firewall_use_firewalld=false
openshift_disable_swap=false
openshift_clock_enabled=false
openshift_pkg_version=-3.9.0
openshift_enable_service_catalog=false
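
For reference, a minimal two-node inventory for this kind of setup looks roughly like the sketch below; the hostnames and node labels are placeholders, not our actual values, and the [OSEv3:vars] block is the one shown above:

    [OSEv3:children]
    masters
    nodes
    etcd

    [masters]
    master.example.com

    [etcd]
    master.example.com

    [nodes]
    master.example.com openshift_node_labels="{'region': 'infra'}"
    node1.example.com openshift_node_labels="{'region': 'primary'}"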

when we run:

ansible-playbook -i hosts /usr/share/ansible/openshift-ansible/playbooks/prerequisites.yml -vvvv --become --become-method=sudo

it fails with

  The full traceback is:
  File "/tmp/ansible_RDjE1j/ansible_modlib.zip/ansible/module_utils/basic.py", line 2859, in run_command
    cmd = subprocess.Popen(args, **kwargs)
  File "/usr/lib64/python2.7/subprocess.py", line 711, in __init__
    errread, errwrite)
  File "/usr/lib64/python2.7/subprocess.py", line 1327, in _execute_child
    raise child_exception

fatal: [osm.c2bm.rdu2.centos.org]: FAILED! => {
    "changed": false, 
    "cmd": "restorecon -R /var/lib/containers/", 
    "invocation": {
        "module_args": {
            "_raw_params": "restorecon -R /var/lib/containers/", 
            "_uses_shell": false, 
            "argv": null, 
            "chdir": null, 
            "creates": null, 
            "executable": null, 
            "removes": null, 
            "stdin": null, 
            "warn": true
        }
    }, 
    "msg": "[Errno 2] No such file or directory", 
    "rc": 2
}

even though the playbook shows the previous task, which ensures /var/lib/containers exists, completing without error. Running the command manually on those nodes works just fine.
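
One way to take the playbook itself out of the picture is to run the same command as an Ansible ad-hoc task against the same inventory (a sketch, assuming the same hosts file):

    ansible all -i hosts --become -m command -a "restorecon -R /var/lib/containers/"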

We have tried updating and downgrading the Ansible version; nothing changes.

Now, when I update the task with ignore_errors: true, prerequisites.yml works fine. But a similar error shows up while running deploy_cluster.yml with some other file.
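
For reference, the workaround amounts to adding ignore_errors to the failing task, roughly like this (a sketch of the edited task, not a verbatim copy of the upstream file):

    - name: Fix SELinux Permissions on /var/lib/containers
      command: "restorecon -R /var/lib/containers/"
      changed_when: false
      ignore_errors: true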

Version
$ ansible --version
ansible 2.6.4
  config file = /etc/ansible/ansible.cfg
  configured module search path = [u'/home/bamachrn/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python2.7/site-packages/ansible
  executable location = /usr/bin/ansible
  python version = 2.7.5 (default, Jul 13 2018, 13:06:57) [GCC 4.8.5 20150623 (Red Hat 4.8.5-28)]

$ rpm -q openshift-ansible
openshift-ansible-3.9.43-1.git.0.d0bc600.el7.noarch
Steps To Reproduce
  1. Setting up ansible controller node

    yum install -y git && \
    yum install -y rsync && \
    yum install -y gcc libffi-devel python-devel openssl-devel && \
    yum install -y epel-release && \
    yum install -y PyYAML python-networkx python-nose python-pep8 python-jinja2 rsync centos-release-openshift-origin39.noarch && \
    yum install -y http://cbs.centos.org/kojifiles/packages/ansible/2.5.5/1.el7/noarch/ansible-2.5.5-1.el7.noarch.rpm && \
    yum install -y openshift-ansible

  2. Setting all the nodes for cluster setup using this playbook

  3. Running

    ansible-playbook -i hosts /usr/share/ansible/openshift-ansible/playbooks/prerequisites.yml -vvvv --become --become-method=sudo

Expected Results
Playbook should run without error.
Observed Results
Playbook fails on "restorecon -R /var/lib/containers/" with "No such file or directory".

michaelgugino commented 6 years ago

Don't use the --become command-line switch, and don't use --become-method=sudo on the command line either. Set those variables in the OSEv3 group vars in your inventory.

Most likely, you are having a $PATH problem on your remote hosts. Some people don't have /sbin in $PATH when using sudo, and that breaks all kinds of things; don't do this.
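
If sudo is stripping the sbin directories from the path, that usually comes down to the secure_path setting in /etc/sudoers. On CentOS/RHEL 7 the stock value normally looks roughly like the line below; check your own sudoers (with visudo) before changing anything:

    # typical secure_path on CentOS/RHEL 7 (edit /etc/sudoers only via visudo)
    Defaults    secure_path = /sbin:/bin:/usr/sbin:/usr/bin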

dharmit commented 6 years ago

@michaelgugino thanks for the pointer. We modified our hosts file to look like below:

[OSEv3:vars]
# SSH user, this user should allow ssh based auth without requiring a password
ansible_ssh_user=dharmit
ansible_become=yes

However, the error persists.

We then checked the $PATH:

$ echo $PATH
/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/home/dharmit/.local/bin:/home/dharmit/bin
$ which restorecon
/usr/sbin/restorecon

As reported earlier, manually running the command worked just fine. So we changed restorecon to /usr/sbin/restorecon in the task, and that worked. But as you can see from the output above, /usr/sbin is already in $PATH. I checked sudo echo $PATH just to be sure and it's there as well.

- name: Fix SELinux Permissions on /var/lib/containers
  command: "/usr/sbin/restorecon -R /var/lib/containers/"
  changed_when: false
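
One caveat on the check above: sudo echo $PATH expands $PATH in the calling shell before sudo ever runs, so it doesn't show the PATH the privileged command actually sees. A more telling check is something like:

    # $PATH here is expanded by the shell that sudo starts, not by the calling shell
    sudo sh -c 'echo $PATH'
    # and confirm where restorecon resolves from under sudo
    sudo sh -c 'command -v restorecon'
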
openshift-bot commented 4 years ago

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close. Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

openshift-bot commented 4 years ago

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity. Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten /remove-lifecycle stale

openshift-bot commented 4 years ago

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen. Mark the issue as fresh by commenting /remove-lifecycle rotten. Exclude this issue from closing again by commenting /lifecycle frozen.

/close

openshift-ci-robot commented 4 years ago

@openshift-bot: Closing this issue.

In response to [this](https://github.com/openshift/openshift-ansible/issues/10139#issuecomment-664031279):

>Rotten issues close after 30d of inactivity.
>
>Reopen the issue by commenting `/reopen`.
>Mark the issue as fresh by commenting `/remove-lifecycle rotten`.
>Exclude this issue from closing again by commenting `/lifecycle frozen`.
>
>/close

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.