Add ability to inspect remote system's state

Ichimonji10 commented 8 years ago

It'd be nice if we could inspect the state of remote systems under test. For example, it may be useful to log in to a remote system and determine whether certain files are lying around in certain places in the file system.

@bmbouter and others may be able to provide more info about this issue.

bmbouter commented 8 years ago

For example with the puppet install distributor, the only way to verify that it worked correctly is make assertions on the files it writes. It has to do with the files contents, permissions, and attributes. For example the parent folder must have the right SELinux permissions, and pulp shouldn't be allow to write to it if now with SELinux enabled. IIRC, the folder also needs to exist. Once the install distributor writes the files you can assert on the contents with a hash, the POSIX permissions and ownership, and its SELinux attributes.

dkliban commented 8 years ago

Paramiko[0] is a python library that will allow us to SSH to the remote machines running Pulp.

pulp-smash will determine the hostname of the remote machine by parsing the "base_url" attribute of the pulp-smash config.
paramiko will read the ~/.ssh/ssh_config and determine the proper key to use for connecting to the hostname derived in step 1.
Utility methods that inspect the state of the remote filesystem will use the credentials from step 2 to make connection to the remote machine.

[0] http://paramiko-docs.readthedocs.org

Ichimonji10 commented 8 years ago

FYI, it's possible to fetch a username from the ssh configuration file too. Here's a snippet from my ~/.ssh/config:

Host 192.168.0.1 pine.ichimonji10.name
    User ichimonji10
    IdentityFile ~/.ssh/pine.ichimonji10.name

Question: can this design respect other SSH set-ups, such as ssh-agent?

Ichimonji10 commented 8 years ago

The solution suggested here would let us completely avoid expanding the Pulp Smash configuration file as discussed in #5, or at most add an optional "database": "pulp-db.example.com" section. I definitely like that.

bmbouter commented 8 years ago

The more I think about it, the more I like using ~/.ssh/config to store all of the ssh related configs and trying to connect by the server name used in the base_url attribute. You can make it connect to a machine by IP from that name, specify another username, or any other aspect of ssh config. So great! Let's do that and limit our modification of the pulp smash config file to at most the "database": "pulp-db.example.com" suggestion.

To recap, the goal with ssh-agent support is to allow the keys to be unlocked prior to running pulp smash and letting paramiko use the unlocked key. I briefly looked at the paramiko docs on ssh agent integration and it seems that we would need explicit configuration to use ssh-agent. Am I reading this right?

The feature part of ssh-agent integration is that a user's already unlocked key can be used. That's a nice to have, but optional as I see it. The important thing is that the key can be encrypted until paramiko goes to use it as specified from ~/.ssh/config. Effectively when pulp smash is run if the key is encrypted the user will be prompted to have it unlocked. I think this will allow for secure configurations, even though it wouldn't work w/ ssh-agent.

@Ichimonji10 @dkliban what do you think?

Ichimonji10 commented 8 years ago

A sizeable number of SSH configuration options are available. Here's one of the more interesting items in my ~/.ssh/config` file:

Host *.example.com,!collab.example.com
    User root
    StrictHostKeyChecking no
    UserKnownHostsFile /dev/null

This tells the OpenSSH client to log in as root, to automatically accept new host keys, and to send those new host keys to /dev/null, and to do this for all systems that have a hostname of *.example.com except collab.example.com. Trying to build that kind of SSH client configuration logic in to Pulp Smash would be a hellish mistake.

I spent some time toying around with paramiko, and it seems like a reasonably nice solution. There is one issue that'll be bothersome: it has terrible key handling logic. I kept getting errors like this:

paramiko.ssh_exception.SSHException: Server 'pine.ichimonji10.name' not found in known_hosts

That error was due to my ecdsa-sha2-nistp256 key not being properly handled. We can work with that, though, by using a subset of possible ECDSA key types and RSA keys.

See: paramiko/paramiko#387 and related issues.

Here's a rough and working example of how to use paramiko. I don't have time to add in support for ssh-agent right now (need to head out for an appointment), but it should get us started.

#!/usr/bin/env python
import paramiko

def main():
    paramiko.util.log_to_file('/home/ichimonji10/tmp/paramiko.log')

    # Read values from config file instead of hard-coding them into script.
    config = paramiko.SSHConfig()
    config.parse(open('/home/ichimonji10/.ssh/config'))

    # When one reads the configuration file, values are lowercased. They are not
    # renamed for use by connect().
    kwargs = config.lookup('github.com')
    if 'user' in kwargs:  # defaults to the current local username
        kwargs['username'] = kwargs.pop('user')
    if 'identityfile' in kwargs:
        kwargs['key_filename'] = kwargs.pop('identityfile')

    # Paramio refuses to connect to unknown hosts.
    client = paramiko.SSHClient()
    client.load_host_keys('/home/ichimonji10/.ssh/known_hosts')

    # Connect and disconnect.
    client.connect(**kwargs)
    client.close()

if __name__ == '__main__':
    main()

jeremycline commented 8 years ago

Not to stir the pot unnecessarily, but has Fabric[0] been considered? It is built on top of paramiko.

[0] http://www.fabfile.org/

dkliban commented 8 years ago

@jeremycline We did consider it. However, it is not python 3 compatible. We also determined that paramiko is just as easy to use.

jeremycline commented 8 years ago

@dkliban Alrighty then, carry on!

Ichimonji10 commented 8 years ago

Python 3 Wall of Superpowers

Ichimonji10 commented 8 years ago

My experience with Fabric is that it assumes too much knowledge, and there's therefore a real learning curve. For example, here's a bit of Fabric code:

    run('subscription-manager repos {0}'
        .format(' '.join(['--disable "{0}"'.format(repo) for repo in args])))

Notice that there's absolutely no mention of which system this command is being run on, or the parameters for that connection. I would expect the code to look more like this:

client = SSHClient('initialization parameters')
client.run('subscription-manager repos {}'...)

The Fabric code raises questions like:

How well does this code deal with threading? Does the run function use global state?
What configuration work needs to be done before this code is run? I can't simply walk up a call tree and look for interactions with the client object, because no such object is passed around.

It also has a built-in facility for building commands. But it struck me as half-baked: all arguments are passed in as strings with no way to even indicate outcomes like "this argument should be an integer", and no good way to build common user-facing command line components.

Paramiko has issues. But from what I've seen of it, it's straightforward. And if we want a good command-line front-end, we can make use of something like click. (Hey, that's an idea. I wonder if I can rewrite pulp_smash.__main__ with click some time?)

bmbouter commented 8 years ago

@Ichimonji10 You're snippet looks great. I've never needed to know the current user, but we could do it with getpass.getuser().

I ran into the same ecdsa-sha2-nistp256 key issue when I ran it. Is there an easy way we could have better key support with Paramiko?

bmbouter commented 8 years ago

One other thing is that a lot of users want to run this against localhost. It would be great if the base_url is localhost that it didn't even connect remotely. Does paramiko allow you to do things on a local system or a remote system in an abstracted way?

Ichimonji10 commented 8 years ago

I've never needed to know the current user, but we could do it with getpass.getuser().

Thanks for the reference to getpass.getuser. I've never read the documentation on that module before. New material. However, there's no need to even call getpass.getuser. Paramiko does that when SSHClient.connect is called.

I ran into the same ecdsa-sha2-nistp256 key issue when I ran it. Is there an easy way we could have better key support with Paramiko?

Assign someone to working on Paramiko? I think the fundamental issue here is that Paramiko has some design issues, and slicing and dicing the problem into small chunks won't get us very far. From paramiko/paramiko#387:

I am making a single ticket for this because most of the existing PRs poking at it are too limited in scope; this sort of change has a high chance for bugs and breaking backwards compatibility (intentionally or no) and I feel it needs a broadly considered update.

Ichimonji10 commented 8 years ago

One other thing is that a lot of users want to run this against localhost. It would be great if the base_url is localhost that it didn't even connect remotely. Does paramiko allow you to do things on a local system or a remote system in an abstracted way?

I don't know. That said, this sort of thing seems like a client-specific issue (i.e. our issue), not an issue for paramiko. Paramiko is an SSH handling library, and if I tell it to connect to localhost, I would expect it to connect to localhost. I would find any other behaviour surprising. Similarly, if I open up a shell and type ssh localhost, I expect to SSH in to localhost.

bmbouter commented 8 years ago

I was thinking the user needed to form the full path to read the config file, but we can just use ~/.ssh/config.

Yeah let's not pickup the fixing of paramiko/paramiko#387 Do we know which key styles do work?

Regardless of Paramiko supporting it, I'm suggesting that this would be a great feature to have. Many people will run this with localhost.

Ichimonji10 commented 8 years ago

Regardless of Paramiko supporting it, I'm suggesting that this would be a great feature to have. Many people will run this with localhost.

Aye. I agree 100%.

Yeah let's not pickup the fixing of paramiko/paramiko#387 Do we know which key styles do work?

I don't know which key types paramiko supports well. From the time I spent reading through issues on the paramiko repository, it seems RSA keys are well supported. Also, the sample script given above creates a paramiko.log file. Here's some sample output:

DEB [20151119-13:22:45.682] thr=1 paramiko.transport: kex algos:['curve25519-sha256@libssh.org', 'ecdh-sha2-nistp256', 'diffie-hellman-group14-sha1', 'diffie-hellman-group1-sha1'] server key:['ssh-dss', 'ssh-rsa'] client encrypt:['chacha20-poly1305@openssh.com', 'aes256-ctr', 'aes192-ctr', 'aes128-ctr', 'aes256-cbc', 'aes192-cbc', 'aes128-cbc', 'blowfish-cbc'] server encrypt:['chacha20-poly1305@openssh.com', 'aes256-ctr', 'aes192-ctr', 'aes128-ctr', 'aes256-cbc', 'aes192-cbc', 'aes128-cbc', 'blowfish-cbc'] client mac:['hmac-sha1', 'hmac-sha2-256', 'hmac-sha2-512'] server mac:['hmac-sha1', 'hmac-sha2-256', 'hmac-sha2-512'] client compress:['none', 'zlib', 'zlib@openssh.com'] server compress:['none', 'zlib', 'zlib@openssh.com'] client lang:[''] server lang:[''] kex follows?False

I think the exact set of keys allowed by the client are listed there.

I was thinking the user needed to form the full path to read the config file, but we can just use ~/.ssh/config.

Paramiko will reject the abbreviated form, but the os Python module will resolve relative paths for us.

Ichimonji10 commented 8 years ago

Quick update:

We chatted about this and realized that, although it's possible to use a transport mechanism directly (such as Paramiko's API or command-line SSH via the subprocess module), we really want to use something a little higher level. Ansible seems like a reasonable solution, given that it's a Python package with a proper API and it's purpose for existence is dead-simple system management.

I sat down today to play around with Ansible and see if I could make it work. Ansible needs to know which hosts to contact, and by default, it uses /etc/ansible/hosts. I decided to make a file at ~/.config/pulp_smash/hosts:

localhost              ansible_python_interpreter=/usr/bin/python2  ansible_connection=local
pine.ichimonji10.name  ansible_python_interpreter=/usr/bin/python2

I uninstalled ansible from my system as a whole, and created a suitable virtualenv like so:

virtualenv -p python2 env2
source env2/bin/activate
pip install ansible

With this, it's possible to contact remote hosts:

(env2)[ichimonji10@beech:tmp]$ ansible all --inventory-file ~/.config/pulp_smash/hosts --module-name shell --args 'echo foo'
localhost | success | rc=0 >>
foo

pine.ichimonji10.name | success | rc=0 >>
foo

The --inventory-file argument can also be specified as an environment variable, which is perfect for us. Cool! We have a clean, powerful and orthogonal separation of responsibilities.

You'll notice, however, that I explicitly have to create a Python 2 virtualenv on the local system and use a Python 2 interpreter on each target system. This is because Ansible is currently only compatible with Python 2, and it expects /usr/bin/python to be a Python 2 executable. There is significant work being done to put out Ansible v2. You can get a sense by seeing the number of issues under the v2 milestone. You can also read https://github.com/ansible/ansible/issues/10771 and https://github.com/ansible/ansible/issues/1409 for more information.

I've seen mention of the ability to use a v2 branch or otherwise get access to the v2 code. If so, that'd be awesome. But at this point, I'm not sure what's involved in using v2 code.

Ichimonji10 commented 8 years ago

Ansible looks like a great tool to help us address both this issue and #31. I've walked through the Python API page on their documentation, among others. While this the documentaiton provides just enough information to get started, it's extremely shallow.

In a bid to become more familiar with Ansible's API, I cloned the Ansible source repository and started walking through code. Unfortunately, the raw source code does not reflect how Ansible is used. What do I mean by that? Here's the beginning of a Python script that uses Ansible:

import ansible.runner

runner = ansible.runner.Runner(

This code is valid, despite the fact that there is no Runner class defined in Ansible, nor is there a runner module. The closest I can find is class TaskExecutor in module lib.ansible.executor.task_executor:

class TaskExecutor:

    '''
    This is the main worker class for the executor pipeline, which
    handles loading an action plugin to actually dispatch the task to
    a given host. This class roughly corresponds to the old Runner()
    class.
    '''

I've been unable to find a good explanation of how the API is changing in Ansible version 2, if at all. Given that I'm left with walking through sample scripts and the source code to figure this out, perhaps the best solution is to go ahead and use Ansible 2 from the get-go. At this point, no Ansible 2 releases are available in PyPi. However, we can pop this in setup.txt:

install_requires=['ansible>=2', …]

And this in requirements.txt:

git+https://github.com/ansible/ansible.git@v2.0.0-0.6.rc1#egg=ansible

And we can manually update to newer tags as time goes on. Not as elegant as I'd like, but doable.

bmbouter commented 8 years ago

@Ichimonji10 I'm ok with going with Ansible 2 given that it's at the rc stage already. You're proposal of including it using setup.py and requirements.txt changes sounds good to me.

What do others think?

Ichimonji10 commented 8 years ago

Here's two third-party resources on the changes in Ansible v2:

What's New in v2 (newer version →) V2 and beyond
Use the source, Luke - what's coming up in Ansible v2

Ichimonji10 commented 8 years ago

Aaaand rc2 just came out. With a V2 API example!

dkliban commented 8 years ago

Yeah, I think that targeting ansible 2 is completely appropriate.

Ichimonji10 commented 8 years ago

The RC2 documentation isn't posted to docs.ansible.com at this point, and they're a bit of a pain to generate (I should probably submit a PR, hunh), so I'm pasting the full v2 API example from the RC2 documentation here:

In 2.0 things get a bit more complicated to start, but you end up with much more discrete and readable classes:

#!/usr/bin/python2

from collections import namedtuple
from ansible.parsing.dataloader import DataLoader
from ansible.vars import VariableManager
from ansible.inventory import Inventory
from ansible.playbook.play import Play
from ansible.executor.task_queue_manager import TaskQueueManager

Options = namedtuple('Options', ['connection','module_path', 'forks', 'remote_user', 'private_key_file', 'ssh_common_args', 'ssh_extra_args', 'sftp_extra_args', 'scp_extra_args', 'become', 'become_method', 'become_user', 'verbosity', 'check'])
# initialize needed objects
variable_manager = VariableManager()
loader = DataLoader()
options = Options(connection='local', module_path='/path/to/mymodules', forks=100, remote_user=None, private_key_file=None, ssh_common_args=None, ssh_extra_args=None, sftp_extra_args=None, scp_extra_args=None, become=None, become_method=None, become_user=None, verbosity=None, check=False)
passwords = dict(vault_pass='secret')

# create inventory and pass to var manager
inventory = Inventory(loader=loader, variable_manager=variable_manager, host_list='localhost')
variable_manager.set_inventory(inventory)

# create play with tasks
play_source =  dict(
        name = "Ansible Play",
        hosts = 'localhost',
        gather_facts = 'no',
        tasks = [ dict(action=dict(module='debug', args=(msg='Hello Galaxy!'))) ]
    )
play = Play().load(play_source, variable_manager=variable_manager, loader=loader)

# actually run it
tqm = None
try:
    tqm = TaskQueueManager(
              inventory=inventory,
              variable_manager=variable_manager,
              loader=loader,
              options=options,
              passwords=passwords,
              stdout_callback='default',
          )
    result = tqm.run(play)
finally:
    if tqm is not None:
        tqm.cleanup()

Ichimonji10 commented 8 years ago

If you want to build the documentation for Ansible v2.0.0-0.7.rc7 under Python 3, see https://github.com/ansible/ansible/issues/13463.

Ichimonji10 commented 8 years ago

The sample code given above is helpful, but I found it to be slightly complicated. I had particular trouble understanding the Options namedtuple (Why should I need to provide this magic structure when it's not needed at the CLI? Is there a declaration in the code itself?) and why some options are present, such as the stdout_callback, passwords and duplicate 'localhost' refeferences. Here's my own sample script that I came up with today:

#!/usr/bin/env python
"""Execute the "ping" module on all hosts in an inventory file."""
from ansible.executor.task_queue_manager import TaskQueueManager
from ansible.inventory import Inventory
from ansible.parsing.dataloader import DataLoader
from ansible.playbook.play import Play
from ansible.vars import VariableManager
from collections import namedtuple

Options = namedtuple('Options', [
    'become',
    'become_method',
    'become_user',
    'check',
    'connection',  # 'smart'
    'forks',  # 5
    'module_path',
    'private_key_file',
    'remote_user',
    'scp_extra_args',  # ''
    'sftp_extra_args',  # ''
    'ssh_common_args',  # ''
    'ssh_extra_args',  # ''
    'verbosity',  # 0
])

def main():
    """Run the "ping" module on all hosts."""
    loader = DataLoader()
    variable_manager = VariableManager()
    inventory = Inventory(loader=loader, variable_manager=variable_manager)
    variable_manager.set_inventory(inventory)
    play = Play().load(
        {
            'name': 'my test play',
            'tasks': [{'action': {'module': 'ping'}}],
        },
        loader=loader,
        variable_manager=variable_manager,
    )

    task_qm = None
    try:
        task_qm = TaskQueueManager(
            inventory=inventory,
            variable_manager=variable_manager,
            loader=loader,
            options=Options(
                None,  # become
                None,  # become_method
                None,  # become_user
                None,  # check
                'ssh',  # connection
                None,  # forks
                None,  # module_path
                None,  # private_key_file
                None,  # remote_user
                None,  # scp_extra_args
                None,  # sftp_extra_args
                None,  # ssh_common_args
                None,  # ssh_extra_args
                None,  # verbosity
            ),
            passwords=None,
        )
        task_qm.run(play)
    finally:
        if task_qm is not None:
            task_qm.cleanup()

if __name__ == '__main__':
    main()

In order to run this script, I needed an inventory file. I placed mine at ~/.config/pulp_smash/hosts:

localhost              ansible_python_interpreter=/usr/bin/python2  ansible_connection=local
pine.ichimonji10.name  ansible_python_interpreter=/usr/bin/python2

That done, I installed Ansible into a virtualenv and called the script:

virtualenv --python python2 env2
source env2/bin/activate

git clone git@github.com:ansible/ansible.git
cd ansible
git checkout v2.0.0-0.7.rc2
pip install --editable .
git submodule update --init --recursive

ANSIBLE_INVENTORY=~/.config/pulp_smash/hosts /path/to/script.py

The result:

(env2)[ichimonji10@beech:tmp]$ ANSIBLE_INVENTORY=~/.config/pulp_smash/hosts ./test.py 

PLAY [my test play] ************************************************************

TASK [setup] *******************************************************************
ok: [localhost]
ok: [pine.ichimonji10.name]

TASK [ping] ********************************************************************
ok: [localhost]
ok: [pine.ichimonji10.name]

This does not work on Python 3. I encountered some syntax errors preventing that. I may submit some additional PRs to the Ansible respository to fix them if this experiment continues to go well.

bmbouter commented 8 years ago

@Ichimonji10 the example above looks good. To me, it demonstrates that ansible is a viable tool for interaction w/ local and remote systems. Are there any other blockers or information we need to have before building out a solution to this issue?

How does this decision impact the dependencies of pulp smash? Will a change be required in that area?

Ichimonji10 commented 8 years ago

How does this decision impact the dependencies of pulp smash? Will a change be required in that area?

Yes, Pulp Smash's dependencies will change. I think this comment covers it pretty well:

we can pop this in setup.txt:
install_requires=['ansible>=2', …]
And this in requirements.txt:
git+https://github.com/ansible/ansible.git@v2.0.0-0.6.rc1#egg=ansible

I think we can continue with this solution. I'm not excited about Ansible's Python API. It's a little bit hideous, really. But I think we can deal with it.

Ichimonji10 commented 8 years ago

To me, it demonstrates that ansible is a viable tool for interaction w/ local and remote systems.

Yes, exactly. It demonstrates that I can communicate with my local system without SSH, and that I can communicate with a remote system over SSH with an ECDSA key.

bmbouter commented 8 years ago

@Ichimonji10 great! I've got some reconnect tests that I want to write, so I'm anxious to use it. I'm blocked until its resolved. Thanks for working on this!

Ichimonji10 commented 8 years ago

Thanks for your patience.

Ichimonji10 commented 8 years ago

By default, Ansible contacts all hosts listed in the given inventory file. However, the script given can be modified so that Ansible only contacts explicitly named hosts. All you need to do is change how the Play object is instantiated. For example:

    play = Play().load(
        {
            'hosts': ['localhost', 'pine.ichimonji10.name'],  # add this line
            # 'hosts': ['pine.ichimonji10.name'],  # this form also legal
            # 'hosts': 'pine.ichimonji10.name',  # and this too
            'name': 'my test play',
            'tasks': [{'action': {'module': 'ping'}}],
        },
        loader=loader,
        variable_manager=variable_manager,
    )

This is great, because it means we can programmatically select which hosts to contact via hostname. Which hostnames do we want to select from an inventory file? As it happens, the Pulp Smash settings file (typically ~/.config/pulp_smash/settings.json) includes the hostname of the system being tested. The hostname is bured in the base_url line:

"base_url": "https://192.168.121.139",

So it's easy to ask the user to create an Ansible inventory file that includes all of the Pulp servers we might need to contact, and settings for contacting them, and advise the user to set the ANSIBLE_INVENTORY environment variable if the inventory file is in a non-standard location (not /etc/ansible/hosts). That done, Pulp Smash can be used like so:

# simple case, using /etc/ansible/hosts and ~/.config/pulp_smash/settings.json
python -m unittest2 discover pulp_smash.tests

# A more complicated case. The pulp smash config file states where the
# pulp server is, credentials for talking to it, etc. The inventory file
# defines whether we should log in to the system hosting pulp via ssh or
# via a local shell, where the python2 interpreter is on that system, etc.
PULP_SMASH_CONFIG_FILE=pulp-2.6.json \
ANSIBLE_INVENTORY=~/.config/pulp_smash/hosts \
python -m unittest2 discover pulp_smash.tests

bmbouter commented 8 years ago

This looks great! Does ansible handle abstracting the OS (ie: upstart vs systemd) or do we have to figure out a solution that? Also are we putting these operations in a playbook or writing them in code?

Ichimonji10 commented 8 years ago

The level of abstraction available depends on the module used. The service module abtracts away init systems. From ansible-doc service:

Controls services on remote hosts. Supported init systems include BSD init, OpenRC, SysV, Solaris SMF, systemd, upstart.

I don't know whether we'll put operations in playbooks or write them in code.

(As a reminder, when you execute ansible all --module-name ping, the module is "ping". Modules can be written in any language, but the ones shipped with Ansible itself are all written in Python 2.4+.)

Ichimonji10 commented 8 years ago

Unfortunately, the core service module does not provide the ability to inspect the state of a service. Instead, it lets you declare the state a service should be in. See: http://docs.ansible.com/ansible/service_module.html

Ichimonji10 commented 8 years ago

I have a branch in which I list Ansible 2.0.0-0.7.rc2 as a dependency and use Ansible's Python API to implement an execute_command function. See: https://github.com/PulpQE/pulp-smash/compare/master...Ichimonji10:ansible Everything works, as is noted by the commit message in that branch:

Add in a hacky bit of logic to execute commands on a remote system. Sample usage:

>>> from pulp_smash.config import ServerConfig
>>> cfg = ServerConfig(base_url='localhost')
>>> from pulp_smash.utils import execute_command
>>> execute_command('echo foo', cfg)
0
>>> execute_command('ls /etc', cfg)
0
>>> execute_command('ls /foo', cfg)
2

That said, I'm going to walk back from using Ansible for now, and see if simply using system shells and SSH via the subprocess module will suffice. To understand why, it's worth reviewing the benefits I had in mind when exploring Ansible, and the costs associated with using Ansible.

The benefits are as follows:

Ansible provides a mechanism for transparently managing systems via both SSH and a local shell. (And others too.)
Ansible is easy to use. Its basic usage via the CLI is dead simple, and it has a moderate learning curve. It is likely that its API is similarly simple, given that the CLI is implemented through the API.
Pulp already contains Ansible playbooks and roles. If Pulp Smash makes use of Ansible too, we might be able to re-use some of that code.
Ansible offers some abstractions, such as the ability to manage services regardless of whether the backing init system is System V, upstart or systemd.

Of these four benefits, only one has panned out. It's true that Ansible provides a mechanism for transparently managing systems via both SSH and a local shell. All you need to do is create an inventory file listing the hosts to manage and add some settings into the inventory file. The other three haven't panned out:

Ansible's CLI and API have drastically different levels of complexity. The API documentation is flimsy, there are no discussions of the v2 API on the mailing lists, and the code itself is basically undocumented (sometimes copied-and-pasted; other times just wrong). I've found myself walking through Ansible's CLI implementation in order to reverse-engineer the API.
Pulp does already contain Ansible playbooks and roles, but I don't think we'll need to use them. We need to do things like start, stop and inspect services, and inspect files on the filesystem. Ansible playbooks are used to chain together actions on hosts, and roles are used to group hosts and assign variables in interesting ways. We don't need that.
Ansible does offer some nice abstractions in the form of modules. But the modules that are shipped with Ansible are oriented toward declaring state, not inspecting systems. You can declare that a service should be stopped or that a file should exist; you cannot ask what the current state of a service is or what the permissions on a file are. This is in line with Ansible's philosophy of crafting idempotent deployment-oriented code. You might be able to get state information anyway with facts, but should things really be this hard?

As mentioned, there are also some costs associated with using Ansible. One of the costs is drastically lower compatibility. I'd like for Pulp Smash to be compatible with Python 2 and 3, so that it can be used by as many people as possible. Using Ansible ties us to Python 2. A second cost is that it brings a higher learning curve for both users and developers. Users have to learn about Ansible inventory files and environment variables; developers have to do that, plus learn additional concepts and cope with its awful Python API. Eugh.

Ichimonji10 commented 8 years ago

From docs.python.org:

The recommended approach to invoking subprocesses is to use the run() function for all use cases it can handle. For more advanced use cases, the underlying Popen interface can be used directly.

The run() function was added in Python 3.5; if you need to retain compatibility with older versions, see the Older high-level API section.

I've played with it a little bit, and subprocess.run is really cool. It makes spawning subprocesses a wonderful and easy thing. Interestingly, it looks like subprocess.run has also been around for a little while and is backported to Python 2.7 and 3.3, among others. See the subprocess.run website and PyPi page.

Ichimonji10 commented 8 years ago

Ahh, I was mistaken. The subprocess.run PyPi package is entirely unrelated to the standard library's subprocess.run function. I've walked through the source code of each and the implementations are way different.

Ichimonji10 commented 8 years ago

For my current set of experiements in tackling this issue, see https://github.com/PulpQE/pulp-smash/compare/master...Ichimonji10:systems

I've added some musings to the commit message on that branch.

peterlacko commented 8 years ago

Some my thoughts on this issue:

with ansible I generally agree, no python 3 support and even simple things (ie. getting command's stdout) are quite difficult to achieve through api + neccessity of requiring more config files (inventory, ansible.cfg)
local vs. remote commands: we should take into account also multinode testing (is there even need for it, @bmbouter ?). If there will be multiple nodes specified in config file, we might want to set it differently for each machine. So in settings.json file we could have named nodes with it's hostname/ip and process would be executed based on it's value (locally or over ssh). (Given that we still can ssh into localhost, we can also can 'force' user to explicitly state it).
subprocess looks good to me, elegant and simple solution and I would go with default ssh_config, as @Ichimonji10 writes in commit message

Ichimonji10 commented 8 years ago

we should take into account also multinode testing

Definitely, multi-node testing is something we'll want to support. I think we can support multi-node testing by doing two things:

Define section names that Pulp Smash will look for in the config file.
Define a new "connection" setting.

It would look like this:

{
    "default": {
        "auth": ["admin", "admin"],
        "base_url": "https://192.168.121.84",
        "verify": false
    },
    "database": {
        "base_url": "https://192.168.121.85",
        "connection": "local",
    },
    "webserver": {
        "base_url": "https://192.168.121.86",
        "connection": "ssh",
    }
}

How does this work? When Pulp Smash needs to execute commands on a system, it looks in the config file for a specially named section like "database" or "broker". If that section is found, it is used, and otherwise, the "default" section is used. Once a section has been read (as a ServerConfig object), the "connection" attribute is inspected. If the user has explicitly named a connection type like "local" or "ssh" (or "paramiko", etc), that connection type is used. Otherwise, Pulp Smash can guess which connection type to use by looking at the hostname. If the hostname is "localhost" or matches the current system's hostname, it can be inferred that a "local" connection is desired, and otherwise, it can be inferred that an "ssh" connection is desired. Finally, if an "ssh" connection is desired, then Pulp Smash can read connection settings from an SSH configuration file in one of the standard XDG config directories. This gives the user a great deal of flexibility. They can either re-use their existing SSH config file:

ln -s ~/.ssh/config ~/.config/pulp_smash/ssh_config`

Or they can create an isolated SSH config file:

vim ~/.config/pulp_smash/ssh_config

Ichimonji10 commented 8 years ago

See https://github.com/PulpQE/pulp-smash/compare/master...Ichimonji10:cli

Sample usage:

>>> from pulp_smash import cli, config
>>> server_config = config.ServerConfig('localhost')
>>> client = cli.Client(server_config)
>>> response = client.run(('echo', '-n', 'foo'))
>>> response.returncode == 0
True
>>> response.stdout == 'foo'
True
>>> response.stderr == ''
True

Ichimonji10 commented 8 years ago

I'll put together the unit tests tomorrow and, hopefully, merge it.

pulp / pulp-smash

Add ability to inspect remote system's state #32