redhat-cop / infra.leapp

Collection of Ansible roles for automating RHEL in-place upgrades using Leapp.
MIT License

Fatal error on /var/log/ripu directory #160

Open gcccheng opened 4 months ago

gcccheng commented 4 months ago

Hi,

We consistently receive the error message below while running the playbook to upgrade from RHEL 7 to RHEL 8:

RUNNING HANDLER [infra.leapp.common : Add end time to log file] **** ... FAILED! => {"changed": false, "msg": "The destination directory (/var/log/ripu) is not writable by the current user. Error was: [Errno 13] Permission denied: b'/var/log/ripu/.ansible_tmpq3ycp1exripu.log'"}

I checked the role content, and found two relevant tasks

The Ansible job does not complain about the first task, which creates the log file, but it fails on the second task, a handler that adds a line to that file.

We run the role with the become: true option, so there should not be any permission issue.

So what could be causing the error here? Thanks! @swapdisk @scott-vick

jeffmcutter commented 4 months ago

Hi @gcccheng,

I have tested upgrades from 7 to 8 using root SSH login, passwordless user SSH and sudo, and password-based SSH and sudo. The latter two were run with root SSH disabled to ensure the connection was not using root. I have been unable to replicate the issue.

Are you able to provide some additional details? Perhaps the following?

df -h /var/log/ripu
ls -laZ /var/log/ripu

If using ansible-navigator or Ansible Controller, the json details for the task. If not, run ansible-playbook with -vvvv and provide the output for that task.

Thanks, -Jeff

gcccheng commented 4 months ago

Hi @jeffmcutter, thanks for replying. Below is a fresh error captured with -vvv, followed by the command results:

RUNNING HANDLER [infra.leapp.common : Add end time to log file] **** (1, b'/etc/profile.d/lang.sh: line 19: warning: setlocale: LC_CTYPE: cannot change locale (C.UTF-8)\r\n\r\n{"msg": "The destination directory (/var/log/ripu) is not writable by the current user. Error was: [Errno 13] Permission denied: \'/var/log/ripu/.ansible_tmprQGhV2ripu.log\'", "failed": true, "exception": "Traceback (most recent call last):\n File \"/tmp/ansible_ansible.builtin.lineinfile_payload_ygBaEc/ansible_ansible.builtin.lineinfile_payload.zip/ansible/module_utils/basic.py\", line 1705, in atomic_move\n tmp_dest_fd, tmp_dest_name = tempfile.mkstemp(prefix=b\'.ansible_tmp\', dir=b_dest_dir, suffix=b_suffix)\n File \"/usr/lib64/python2.7/tempfile.py\", line 304, in mkstemp\n return _mkstemp_inner(dir, prefix, suffix, flags)\n File \"/usr/lib64/python2.7/tempfile.py\", line 239, in _mkstemp_inner\n fd = _os.open(file, flags, 0600)\nOSError: [Errno 13] Permission denied: \'/var/log/ripu/.ansible_tmprQGhV2ripu.log\'\n", "invocation": {"module_args": {"group": "root", "insertbefore": null, "unsafe_writes": false, "selevel": null, "create": false, "seuser": null, "regexp": null, "state": "present", "owner": "root", "backrefs": false, "search_string": null, "serole": null, "mode": "0644", "firstmatch": false, "insertafter": null, "path": "/var/log/ripu/ripu.log", "line": "Job ended at 2024-03-15T12:43:44Z", "attributes": null, "backup": false, "validate": null, "setype": null}}}\r\n', b'Shared connection to MYHOST closed.\r\n') MYHOST> Failed to connect to the host via ssh: Shared connection to MYHOST closed. 
Traceback (most recent call last):
  File "/tmp/ansible_ansible.builtin.lineinfile_payload_ygBaEc/ansible_ansible.builtin.lineinfile_payload.zip/ansible/module_utils/basic.py", line 1705, in atomic_move
    tmp_dest_fd, tmp_dest_name = tempfile.mkstemp(prefix=b'.ansible_tmp', dir=b_dest_dir, suffix=b_suffix)
  File "/usr/lib64/python2.7/tempfile.py", line 304, in mkstemp
    return _mkstemp_inner(dir, prefix, suffix, flags)
  File "/usr/lib64/python2.7/tempfile.py", line 239, in _mkstemp_inner
    fd = _os.open(file, flags, 0600)
OSError: [Errno 13] Permission denied: '/var/log/ripu/.ansible_tmprQGhV2ripu.log'
fatal: [MYHOST]: FAILED! => {
    "changed": false,
    "invocation": {
        "module_args": {
            "attributes": null,
            "backrefs": false,
            "backup": false,
            "create": false,
            "firstmatch": false,
            "group": "root",
            "insertafter": null,
            "insertbefore": null,
            "line": "Job ended at 2024-03-15T12:43:44Z",
            "mode": "0644",
            "owner": "root",
            "path": "/var/log/ripu/ripu.log",
            "regexp": null,
            "search_string": null,
            "selevel": null,
            "serole": null,
            "setype": null,
            "seuser": null,
            "state": "present",
            "unsafe_writes": false,
            "validate": null
        }
    },
    "msg": "The destination directory (/var/log/ripu) is not writable by the current user. Error was: [Errno 13] Permission denied: '/var/log/ripu/.ansible_tmprQGhV2ripu.log'"
}

$ ls -laZ /var/log/ripu
drwxr-xr-x. root root unconfined_u:object_r:var_log_t:s0 .
drwxr-xr-x. root root system_u:object_r:var_log_t:s0 ..
-rw-r--r--. root root system_u:object_r:var_log_t:s0 ripu.log
$ df -h /var/log/ripu
Filesystem                  Size  Used Avail Use% Mounted on
/dev/mapper/rootvg-var_log  8.0G   68M  8.0G   1% /var/log
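One way to isolate the failure from the rest of the collection might be a small standalone playbook that repeats the two steps. This is only a sketch: the lineinfile arguments are copied from the module_args in the verbose output above, while the first task is an assumed stand-in for however the role actually creates the log file. According to the traceback, lineinfile goes through atomic_move, which calls tempfile.mkstemp in the destination directory, so it needs write access to /var/log/ripu itself and not just to ripu.log.

```yaml
---
# Sketch only: the "touch" task is an assumption about how the role creates
# the log file; the lineinfile arguments come from the module_args shown above.
- name: Reproduce the log file handler outside the collection
  hosts: all
  become: true
  gather_facts: false
  tasks:
    - name: Ensure the log file exists (assumed equivalent of the role's first task)
      ansible.builtin.file:
        path: /var/log/ripu/ripu.log
        state: touch
        owner: root
        group: root
        mode: "0644"

    - name: Append a line the same way the failing handler does
      ansible.builtin.lineinfile:
        path: /var/log/ripu/ripu.log
        line: "Job ended at 2024-03-15T12:43:44Z"
        owner: root
        group: root
        mode: "0644"
```

If the second task fails here as well, the problem is likely in the host or connection setup and not in the roles' handler logic.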

jeffmcutter commented 4 months ago

Hi @gcccheng,

It's odd to me that the python is 2.7. It should get switched here:

https://github.com/redhat-cop/infra.leapp/blob/main/roles/upgrade/tasks/leapp-upgrade.yml#L47

Can you provide --version for whatever command you are using to run the playbook?

Thanks, -Jeff

gcccheng commented 4 months ago

Do you mean the Ansible version? @jeffmcutter

ansible --version
ansible [core 2.16.4]
  config file = /etc/ansible/ansible.cfg
  configured module search path = ['/home/gcccheng/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /home/gcccheng/.local/lib/python3.11/site-packages/ansible
  ansible collection location = /home/gcccheng/.ansible/collections:/usr/share/ansible/collections
  executable location = /usr/bin/ansible
  python version = 3.11.5 (main, Sep 22 2023, 15:34:29) [GCC 8.5.0 20210514 (Red Hat 8.5.0-20)] (/usr/bin/python3.11)
  jinja version = 3.1.3
  libyaml = True

djdanielsson commented 4 months ago

The 2.7 is very likely the issue... the question is what is causing it...

jeffmcutter commented 4 months ago

Interesting. That is a recent Ansible, which is good. As David points out, though, the question is what is causing Ansible to encounter Python 2.7. The upgrade sets ansible_python_interpreter to /usr/bin/python3:

https://github.com/redhat-cop/infra.leapp/blob/main/roles/upgrade/tasks/leapp-upgrade.yml#L46

Can you see what this returns post upgrade?

---
- name: Check python interpreter
  hosts: all
  gather_facts: true
  tasks:
    - name: Display discovered_interpreter_python
      ansible.builtin.debug:
        var: discovered_interpreter_python

In my environment it looks like this after upgrading from 7-8:

$ ap --limit rhel04* python.yml 

PLAY [Check python interpreter] ***************************************************************************************************************************************************************************************************************************

TASK [Gathering Facts] ************************************************************************************************************************************************************************************************************************************
ok: [rhel04.localdomain.local]

TASK [Display discovered_interpreter_python] **************************************************************************************************************************************************************************************************************
ok: [rhel04.localdomain.local] => {
    "discovered_interpreter_python": "/usr/libexec/platform-python"
}

PLAY RECAP ************************************************************************************************************************************************************************************************************************************************
rhel04.localdomain.local   : ok=2    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   

[jcutter@jcutter-fedora scratch{main}]$ ssh rhel04
Last login: Mon Mar 18 01:54:57 2024 from 192.168.122.1
[root@rhel04 ~]# /usr/libexec/platform-python --version
Python 3.6.8

gcccheng commented 4 months ago

Hi @jeffmcutter, see the output below. This is a machine that hit the error, but we upgraded it to RHEL 8 anyway, and the output looks like yours. There are several versions of Python installed on the host:

$ python
python2 python2.7 python3 python3.6 python3.6m

Output from the playbook:

...
"discovered_interpreter_python": "/usr/libexec/platform-python"
..

Output from the command:

/usr/libexec/platform-python --version
Python 3.6.8

jeffmcutter commented 4 months ago

Hi @gcccheng,

That's good background info.

Are you able to provide the Python versions and how they were installed (RPM from where, pip, etc.) on RHEL 7 so that I can try to replicate the issue and hopefully find a solution to it in the collection? Perhaps just the details of the version of the one in the error message may be enough.

Thanks, -Jeff

gcccheng commented 3 months ago

Hi @jeffmcutter, just to be clear: all the output in my previous comment came from a RHEL 8 machine that had already finished upgrading, so I do not know how or which Python was installed on it before.

However, I can show you another RHEL 7 machine that has not been upgraded. The output from this machine is:

"discovered_interpreter_python": "/usr/bin/python"
/usr/bin/python --version
Python 2.7.5
$ python
python python2 python2.7

So only Python 2.7 is installed. I ran the upgrade playbook against this host as well and got the same error: Permission denied: '/var/log/ripu/.ansible_tmpmSHo4zripu.log'

I am not sure how Python was installed. Is there a way to find out?

jeffmcutter commented 3 months ago

Perhaps something like this to show it's the standard RHEL one:

[root@satrhel7-2 ~]# yum provides /usr/bin/python
Loaded plugins: product-id, search-disabled-repos, subscription-manager,
              : tracer_upload

python-2.7.5-92.el7_9.x86_64 : An interpreted, interactive, object-oriented
                             : programming language
Repo        : @rhel-7-server-rpms
Matched from:
Filename    : /usr/bin/python

[root@satrhel7-2 ~]# yum list python
Loaded plugins: product-id, search-disabled-repos, subscription-manager, tracer_upload

Installed Packages
python.x86_64                              2.7.5-92.el7_9                              @rhel-7-server-rpms
[root@satrhel7-2 ~]# ls -l /usr/bin/python*
lrwxrwxrwx. 1 root root     7 Jun  6  2023 /usr/bin/python -> python2
lrwxrwxrwx. 1 root root     9 Jun  6  2023 /usr/bin/python2 -> python2.7
-rwxr-xr-x. 1 root root  7144 May 27  2022 /usr/bin/python2.7
-rwxr-xr-x. 1 root root   304 Mar  6  2019 /usr/bin/python2.7-futurize
-rwxr-xr-x. 1 root root   308 Mar  6  2019 /usr/bin/python2.7-pasteurize
lrwxrwxrwx. 1 root root     9 Jun  6  2023 /usr/bin/python3 -> python3.6
-rwxr-xr-x. 2 root root 11336 Aug 13  2020 /usr/bin/python3.6
-rwxr-xr-x. 2 root root 11336 Aug 13  2020 /usr/bin/python3.6m
-rwxr-xr-x. 1 root root  2554 Apr  4  2018 /usr/bin/python-argcomplete-check-easy-install-script
-rwxr-xr-x. 1 root root   234 Nov 30  2016 /usr/bin/python-argcomplete-tcsh
[root@satrhel7-2 ~]#

jeffmcutter commented 3 months ago

Can you share the ls -l /usr/bin/python* from both before and after?

gcccheng commented 3 months ago

@jeffmcutter before upgrading:

~]$ which python
/usr/bin/python
~]$ python --version
Python 2.7.5
~]$ yum provides /usr/bin/python
python-2.7.5-16.el7.x86_64 : An interpreted, interactive, object-oriented programming language
Repo        : rhel-7-server-rpms
Matched from:
Filename    : /usr/bin/python
.
.
.
python-2.7.5-92.el7_9.x86_64 : An interpreted, interactive, object-oriented programming language
Repo        : @rhel-7-server-rpms
Matched from:
Filename    : /usr/bin/python

~]$ yum list python
Loaded plugins: enabled_repos_upload, package_upload, product-id, search-disabled-repos, subscription-manager
Repodata is over 2 weeks old. Install yum-cron? Or run: yum makecache fast
Installed Packages
python.x86_64                2.7.5-92.el7_9                @rhel-7-server-rpms
ls -l /usr/bin/python*
lrwxrwxrwx. 1 root root    7 Apr 28  2023 /usr/bin/python -> python2
lrwxrwxrwx. 1 root root    9 Apr 28  2023 /usr/bin/python2 -> python2.7
-rwxr-xr-x. 1 root root 7144 May 27  2022 /usr/bin/python2.7

After upgrading:

python3 --version
Python 3.6.8
~]$ yum provides /usr/bin/python3.6
Not root, Subscription Management repositories not updated
Red Hat Enterprise Linux 8 for x86_64 - AppStream (RPMs)   72 MB/s |  60 MB     00:00
Red Hat Enterprise Linux 8 for x86_64 - BaseOS (RPMs)      70 MB/s |  66 MB     00:00
EPEL8                                                      55 MB/s |  17 MB     00:00
Last metadata expiration check: 0:00:06 ago on Wed 20 Mar 2024 01:42:41 PM CET.
python36-3.6.8-1.module+el8+2710+846623d6.x86_64 : Interpreter of the Python programming language
Repo        : rhel-8-for-x86_64-appstream-rpms
Matched from:
Filename    : /usr/bin/python3.6

yum list python3
Updating Subscription Management repositories.
EPEL8                                                      86 kB/s | 2.3 kB     00:00
Last metadata expiration check: 0:00:01 ago on Wed 20 Mar 2024 01:32:29 PM CET.
Error: No matching Packages to list

ls -l /usr/bin/python*
lrwxrwxrwx. 1 root root    9 Oct  3 14:47 /usr/bin/python2 -> python2.7
-rwxr-xr-x. 1 root root 8024 Oct  3 14:47 /usr/bin/python2.7
lrwxrwxrwx. 1 root root   25 Mar 20 13:03 /usr/bin/python3 -> /etc/alternatives/python3
lrwxrwxrwx. 1 root root   31 Dec  7 19:00 /usr/bin/python3.6 -> /usr/libexec/platform-python3.6
lrwxrwxrwx. 1 root root   32 Dec  7 19:00 /usr/bin/python3.6m -> /usr/libexec/platform-python3.6m

jeffmcutter commented 3 months ago

Can you provide output of rpm -q python python2 on the upgraded system?

https://github.com/redhat-cop/infra.leapp/blob/main/roles/upgrade/tasks/leapp-upgrade.yml#L47

Are you able to try setting ansible_python_interpreter to /usr/libexec/platform-python instead of /usr/bin/python3? Assuming you have the collection local you can just edit leapp-upgrade.yml directly.
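A minimal sketch of that edit, assuming the interpreter is set with a set_fact task. The actual task in leapp-upgrade.yml may look different, and the task name here is made up for illustration:

```yaml
# Hypothetical replacement task: the real task in leapp-upgrade.yml may be
# written differently; only the interpreter path is taken from this thread.
- name: Point Ansible at the RHEL 8 platform Python after the upgrade
  ansible.builtin.set_fact:
    ansible_python_interpreter: /usr/libexec/platform-python
```

The idea is that /usr/libexec/platform-python is the path interpreter discovery reported on the upgraded hosts earlier in this thread, so it should not depend on which /usr/bin/python* links exist.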

Denney-tech commented 3 months ago

@gcccheng Just on the off chance: are you running both the analysis and upgrade roles back-to-back in the same play? Both of these roles notify the same handlers in the common role, and if the roles are imported, the handlers run twice after both roles. The first run of the handlers succeeds; the second run doesn't. As a workaround, the roles can be included instead, with the handlers flushed manually between them.
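The workaround described above could be sketched roughly as follows. The role names infra.leapp.analysis and infra.leapp.upgrade, the host pattern, and the become setting are assumptions for illustration; adjust them to match your playbook:

```yaml
---
# Sketch only: role names, hosts, and become are assumptions, not taken
# from the reporter's actual playbook.
- name: Analyze and upgrade in one play, flushing handlers between roles
  hosts: all
  become: true
  tasks:
    - name: Run the analysis role dynamically
      ansible.builtin.include_role:
        name: infra.leapp.analysis

    - name: Run the handlers notified so far before the next role starts
      ansible.builtin.meta: flush_handlers

    - name: Run the upgrade role dynamically
      ansible.builtin.include_role:
        name: infra.leapp.upgrade
```

With the flush in between, the handlers notified by the first role run before the second role starts, instead of all of them running together at the end of the play.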

If you're not running them back to back, does this error occur on the analysis role? Before and/or after an upgrade?

gcccheng commented 3 months ago

@jeffmcutter, the machine below has been upgraded:

$ rpm -q python python2 python3 python3.6
package python is not installed
python2-2.7.18-15.module+el8.9.0+20125+68111a8f.x86_64
package python3 is not installed
package python3.6 is not installed

jeffmcutter commented 3 months ago

I'm wondering if changing https://github.com/redhat-cop/infra.leapp/blob/main/roles/upgrade/tasks/leapp-upgrade.yml#L104 to point to /usr/libexec/platform-python might avoid this issue with odd Python versions. @gcccheng can you confirm that your upgraded system has /usr/libexec/platform-python?

djdanielsson commented 1 week ago

Is this still an issue?