codylane commented 7 years ago

Hello,

I've been tracking down some failing tests and I've noticed that testinfra 1.6.5 is picking up the Upstart service provider for my test case for CentOS 6, which feels wrong and it should be choosing the SysvService instead.

Here's an example:

    @pytest.mark.docker_images('centos:6')
    def test_httpd_service_starts_up_through_etc_initd(host, wait_for_svc):
        svc = host.run('/etc/init.d/httpd start')
        wait_for_svc(host, 'start', timeout=5)

        assert svc.rc == 0
>       assert host.service('httpd').is_running
E       AssertionError: assert False
E        +  where False = <service httpd>.is_running
E        +    where <service httpd> = <class 'testinfra.modules.base.UpstartService'>('httpd')
E        +      where <class 'testinfra.modules.base.UpstartService'> = <testinfra.host.Host object at 0x7fd013d85310>.service

philpep commented 7 years ago

This is probably because you have an initctl command available in your $PATH. Is this command related to upstart ? Maybe we should enhance upstart detection, for instance the presence of /etc/init/ directory and or present of the status command.

codylane commented 7 years ago

Hey @philpep - Thanks for the reply. :) Opps, yes, I forgot to explain it is happening because 'initctl' for some reason is in my path and probably pulled in via the git package on my system under test. I monkey patched service for now in my tests/conftest.py which seems to work but it probably needs some more thought you like you suggested.

Here's how I patched it:

# tests/patches.py

import testinfra.modules
from testinfra.modules.service import *

class PatchedService(Service):

    @classmethod
    def get_module_class(cls, host):
        if host.system_info.type == "linux":
            if (
                host.exists("systemctl")
                and "systemd" in host.file("/sbin/init").linked_to
            ):
                return SystemdService

            if host.exists("initctl") and host.system_info.distribution.lower() != 'centos':
                return UpstartService

            return SysvService

        if host.system_info.type == "freebsd":
            return FreeBSDService

        if host.system_info.type == "openbsd":
            return OpenBSDService

        if host.system_info.type == "netbsd":
            return NetBSDService

        raise NotImplementedError

testinfra.modules.get_module_class('service').get_module_class = PatchedService.get_module_class

Then in my tests/conftest.py I have the following:

# coding: utf-8                                                                                                                                                                                                 

import hashlib                                                                                                                                                                                                  
import itertools                                                                                                                                                                                                
import os                                                                                                                                                                                                       
import pytest                                                                                                                                                                                                   
import sys                                                                                                                                                                                                      
import testinfra                                                                                                                                                                                                
import threading                                                                                                                                                                                                
import time                                                                                                                                                                                                     

# When patching testinfra just make sure this is after you import testsinfra                                                                                                                                    
from tests.patches import *                                                                                                                                                                                     
from six.moves import urllib

Then I have a test to make sure the patch works

#

cat tests/test_patches.py 
import pytest

from tests.conftest import SUT_CENTOS6
from tests.conftest import SUT_CENTOS7

@pytest.mark.docker_images(SUT_CENTOS6)
def test_we_patched_the_service_module_to_SysvService_on_centos6(host):
    assert host.service('foo').__class__.__name__ == 'SysvService'

philpep commented 7 years ago

Seems this issue still occur on centos 6 + git which pull upstart (so we have a mixed init system here...). I reopen the issue.

codylane commented 7 years ago

Thanks for re-opening. 👍 I totally plan on doing some more research today to see what we can do to fix this.

For now this seems to work pretty well (which is a snippet from code above). I plan on testing through 'pytest' to see if this change would break any of our exists tests and report back.

            if host.exists("initctl") and host.system_info.distribution.lower() != 'centos':
                return UpstartService

I attempted to see what specinfra does and (https://github.com/mizzy/specinfra) doesn't try to test for initctl for any RedHat derivative. I know Upstart was tried in RHEL 6.x but I personally never tried to use it because it never seemed to become official. I could be wrong however but I personally never had much luck with getting Upstart to actually work.

codylane commented 7 years ago

Confirmed the tests pass with this addition. Would you like me to open a pull request with the above modification or should we continue to investigate more?

elliotweiser commented 6 years ago

Would it be plausible to leverage either Ansible's Service module code or its "fact gathering" engine to abstract/simplify the Testinfra's Service detection?

https://github.com/ansible/ansible/blob/devel/lib/ansible/modules/system/service_facts.py https://github.com/ansible/ansible/blob/devel/lib/ansible/modules/system/service.py

Feel free to shoot down this idea if I'm missing something. It seems like that could help both fix the bug and prevent reinventing the wheel. Thoughts?

MqllR commented 6 years ago

Hello,

I fall in a similar issue on RedHat 6 which run Upstart init service but each services is running with SystemV scripts. So testinfra detect upstart and try to run a failed status command. Thanks to @codylane for his example... then I share my trick for Red Hat 6 / 7 detection based on release number:

from testinfra.modules.service import Service, SysvService, SystemdService

class CustomService(Service):
    @classmethod
    def get_module_class(cls, host):
        out = host.check_output("cat /etc/redhat-release")
        if "release 7" in out:
            return SystemdService
        elif "release 6" in out:
            return SysvService

testinfra.modules.get_module_class('service').get_module_class = CustomService.get_module_class

philpep commented 6 years ago

Hi,

For users experimenting this issue on a given service $name, please attach the output of:

status $name; echo $? and service $name status; echo $?

Actually when the detected service backend is Upstart, it should fallback to SysV (using service command) when the service is not managed through upstart.

codylane commented 6 years ago

I'm happy to hear that my example that my spike example was able to help someone else.

It is for this reason that I abandoned testinfra months ago because I began to see these things start to bite me and I will not stand for having flakey tests in production. I've even attempted to rectify these with pull requests only to find they get denied, there's no guidance or traction and I keep getting asked to provide the same information over and over. After release 1.6.5 was when I started to really notice these problems and I tracked it down to missing test coverage for supported operating systems listed on the main page. I am very familiar with pytest and how testinfra tests and I see no signs of tests for any other operating system and can only assume the true official supported operating systems are here. CentOS 6 has no tests which is why we are seeing this problem because I can only assume it was never actually unit or integration tested. There are also a lot of woes and untested code paths for the logic that controls what service selection to use see this. I also attempted to fix this about 6 months ago Here and here and it got denied or suspended because I introduced to many changes at once. Sigh.

I've always had a lot of respect for this library and it has a lot of potential, I'm just tired of spending more time than it's worth troubleshooting broken tests in production due to problems with my testing library. I've personally switched back to using serverspec and I maintain my own patched version of testinfra on my personal github page.

codylane commented 6 years ago

As requested here's the output of the commands you requested.

docker run --rm -it --name centos6 centos:6 /bin/bash

yum install -y httpd

[root@7c98e782209e /]# service httpd status; echo $?
httpd is stopped
3

[root@7c98e782209e /]# status httpd; echo $?
status: Unable to connect to Upstart: Failed to connect to socket /com/ubuntu/upstart: Connection refused
1

Again, I'll say from personal experience that I have never actually seen Upstart work on a RHEL derivative. Throughout my entire career it's always been SysV for all of RHEL based distros >= 4 and <= 6 until systemd was introduced in RHEL >= 7.

My patch above works and it also has tests. I didn't put in a pull request because I'm not going to waste any more of time on this.

philpep commented 6 years ago

Hi @codylane, sorry for your bad experience using and contributing to testinfra. While I totally understand your frustration I cannot merge pull requests that I (or someone else) cannot maintain later on. Also the reason why there is a bug with centos 6 services is not clear to me. I seems to me that rhel/centos 6 introduced upstart (while keeping sysv in parallel) just before switching to systemd in version 7.

Thank you for the output of the commands, I see no bugs here since UpstartService fallback to SysV as expected.

$ cat test.py
def test_running(host):
    assert host.service('httpd').is_running

def test_not_running(host):
    assert not host.service('httpd').is_running

$ docker run -it --rm centos:6
[root@9b331b506f19 /]# yum -y install httpd

$ python -c 'import testinfra; print(testinfra.get_host("docker://9b331b506f19").service)'
<class 'testinfra.modules.base.UpstartService'>

$ py.test --tb=no -v --host=docker://9b331b506f19 test.py
t.py::test_running[docker://9b331b506f19] FAILED
t.py::test_not_running[docker://9b331b506f19] PASSED

$ docker exec -it 9b331b506f19 /etc/init.d/httpd start
$ py.test --tb=no -v --host=docker://9b331b506f19 test.py
t.py::test_running[docker://9b331b506f19] PASSED
t.py::test_not_running[docker://9b331b506f19] FAILED

@MqllR with what service you're experimenting this issue ? Any hint to reproduce it within docker ?

codylane commented 6 years ago

hi @philpep - No worries man. I know you are busy and I'm sure it's hard to maintain such an awesome library. I can totally appreciate where you are coming from as well. I just want to help and when I find problems, I like to fix them rather than complain. I hope I don't seem to pushy. I really like testinfra and I've enjoyed using on the +4k machines I've touched with this library in the past few years.

I also see what you mean about having upstart failback to sysv. I see that now with that verbose output from the command you provided.

$ cat test.py 
# flake8: noqa
def test_running(host):
    assert host.service('httpd').is_running

def test_not_running(host):
    assert not host.service('httpd').is_running

$ pip freeze -l
attrs==17.4.0
flake8==2.5.5
funcsigs==1.0.2
hacking==1.0.0
mccabe==0.2.1
pbr==3.1.1
pep8==1.5.7
pluggy==0.6.0
py==1.5.2
pyflakes==0.8.1
pytest==3.4.1
six==1.11.0
testinfra==1.11.1

For what it's worth - I also tried this on testinfra 1.6.5 and shows the same as below. Perhaps I was just focused on the fact that host.service.__name__ == 'UpstartService' and should have focused on the behavior and increased verbosity as you demonstrated.

$ pytest -vs --tb=no -v --host=docker://centos6 test.py
======================================================= test session starts ========================================================
platform darwin -- Python 2.7.14, pytest-3.4.1, py-1.5.2, pluggy-0.6.0 -- /Users/cody.lane/.virtualenvs/testinfra/bin/python2.7
cachedir: .pytest_cache
rootdir: /private/tmp, inifile:
plugins: testinfra-1.11.1
collected 2 items                                                                                                                  

test.py::test_running[docker://centos6] INFO:testinfra:RUN CommandResult(command="docker exec centos6 /bin/sh -c 'uname -s'", exit_status=0, stdout='Linux\n', stderr=None)
INFO:testinfra:RUN CommandResult(command="docker exec centos6 /bin/sh -c 'lsb_release -a'", exit_status=127, stdout=None, stderr='/bin/sh: lsb_release: command not found\n')
INFO:testinfra:RUN CommandResult(command="docker exec centos6 /bin/sh -c 'cat /etc/os-release'", exit_status=1, stdout=None, stderr='cat: /etc/os-release: No such file or directory\n')
INFO:testinfra:RUN CommandResult(command="docker exec centos6 /bin/sh -c 'cat /etc/redhat-release'", exit_status=0, stdout='CentOS release 6.9 (Final)\n', stderr=None)
INFO:testinfra:RUN CommandResult(command="docker exec centos6 /bin/sh -c 'command -v systemctl'", exit_status=1, stdout=None, stderr=None)
INFO:testinfra:RUN CommandResult(command="docker exec centos6 /bin/sh -c 'command -v initctl'", exit_status=0, stdout='/sbin/initctl\n', stderr=None)
INFO:testinfra:RUN CommandResult(command="docker exec centos6 /bin/sh -c 'command -v status'", exit_status=0, stdout='/sbin/status\n', stderr=None)
INFO:testinfra:RUN CommandResult(command="docker exec centos6 /bin/sh -c 'test -d /etc/init'", exit_status=0, stdout=None, stderr=None)
INFO:testinfra:RUN CommandResult(command="docker exec centos6 /bin/sh -c 'status httpd'", exit_status=1, stdout=None, stderr='status: Unable to connect to Upstart: Failed to connect to socket /com/ubuntu/upstart: Connection refused\n')
INFO:testinfra:RUN CommandResult(command="docker exec centos6 /bin/sh -c 'command -v service'", exit_status=0, stdout='/sbin/service\n', stderr=None)
INFO:testinfra:RUN CommandResult(command="docker exec centos6 /bin/sh -c 'service httpd status'", exit_status=3, stdout='httpd is stopped\n', stderr=None)
FAILED

MqllR commented 6 years ago

Hi @codylane,

I work in a total isolated environment with internal repo and hardened image... so no way (just take lot of time!) to test with a docker image.

I'll test your explanation on upstart falling back to sysV and i'll back to you with more test, information...

MqllR commented 6 years ago

Hi @philpep,

I found the error and it was my mistake... I check my services in sudo context but run assert test outside. Unfortunately without the sudo context, status command is not available because of my PATH variable. When the status command is not available, upstart will not fallback to sysV script and exit with exception.

Thanks to you, I got a better comprehension of testinfra that I make the promotion :)

philpep commented 6 years ago

@MqllR ok thanks for the explanation.

So I think there's still a bug, I guess that for services running upstart, being root is not a requirement for running /sbin/status command. I think testinfra should look at status in /sbin like it does for service.

philpep commented 6 years ago

EDIT: not sure: AssertionError: Unexpected exit code 4 for CommandResult(command=b'service httpd status', exit_status=4, stdout=b'httpd status unknown due to insufficient privileges.\n', stderr=None)

MqllR commented 6 years ago

@philpep,

Indeed, being root is not a requirement but running sysV script may need sufficient privilege to read / write information. Like my use case, redis init script need to read the PID file owned by redis user/group. I supposed that httpd run into the same problem.

codylane commented 6 years ago

It's not a hard requirement and is left to the package maintainer but most package maintainers do adhere to security requirements. Not all service bootstrapping has to happen from init but those that do should 98% of the time always be started from as the root user to avoid problems. If the service shouldn't be run as root, the service should su the affective user and spawn the service with that user.

My point above only corresponds to this ticket which we are talking about a RHEL based distro that uses SysV startup scripts or Systemd. There are other init style based services with their own execution models but for the sake of this ticket and this potential bug, I feel that we should at the very least agree that anything service related through init or systemd should be invoked as root and make no other assumptions. Just my 2$. Feel free to disagree or provide different context.

I'm happy to share any other ideas or things to try.

A different context of thought:

If I was going to do a refactor of this of any kind, I would be looking to see what Ansible and Salt do as @elliotweiser suggests. I also wonder if it would be possible without to much work to just piggy back off the core library of Ansible or Salt and take advantage of fact gathering, service management.. etc. I have a feeling this would be a significant re-write/refactor however. I may even explore this on my own fork.

elliotweiser commented 6 years ago

@codylane Thanks for the nod. These were the underlying motivations for my earlier comment:

Don't reinvent the wheel over logic that Ansible has already painstakingly implemented
Less code to maintain (i.e. less tests to write and smaller code changes re: @codylane's arguments)
There is already precedent to use Ansible library code in this code-base.

My opinion may be in the minority. I'm curious to hear others' thoughts on the matter.

pytest-dev / pytest-testinfra

Testinfra is choosing Upstart for CentOS 6 as service provider #243

A different context of thought: