lgirdk / boardfarm

Automated testing with python
BSD 3-Clause Clear License

make bft start faster #330

Open mattsm opened 5 years ago

mattsm commented 5 years ago

Connect to devices in parallel and/or improve the efficiency of ssh key reuse, and the other aspects of startup that are slow.
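
For the ssh part specifically, one thing worth trying is OpenSSH connection multiplexing, so that repeat connections to the same device reuse an already-authenticated master connection instead of redoing key exchange every time. A minimal sketch of how that could look from pexpect; the user/host and control-path values here are placeholders, not boardfarm config:

import pexpect

# ControlMaster/ControlPath/ControlPersist are standard OpenSSH client
# options: the first connection becomes the master and stays alive for 60s
# (ControlPersist); later connections to the same host reuse its socket
# (ControlPath) and skip key exchange/authentication entirely.
MUX_OPTS = ("-o ControlMaster=auto "
            "-o ControlPath=~/.ssh/bft-%r@%h-%p "
            "-o ControlPersist=60")

def ssh_spawn(user_at_host):
    # user_at_host is a placeholder like 'user@ip'
    return pexpect.spawn("ssh %s %s" % (MUX_OPTS, user_at_host))

Whether this helps depends on how often bft opens repeat connections to the same host during startup.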

mbanders commented 5 years ago

After spending a few hours on this and hitting errors, I'm now trying a minimal case based on a simple example:

import pexpect
import multiprocessing

# Set connection commands here:
conn_cmds = {'a': 'ssh user@ip',
             'b': 'ssh user@ip2'}

def connect(name, command, result):
    print("Running %s" % command)
    result[name] = pexpect.spawn(command)

if __name__ == '__main__':
    manager = multiprocessing.Manager()
    return_dict = manager.dict()
    jobs = []
    for name in conn_cmds:
        p = multiprocessing.Process(target=connect, args=(name, conn_cmds[name], return_dict))
        jobs.append(p)
        p.start()
    # Wait for connections to complete
    for proc in jobs:
        proc.join()
    # Display connections
    print("\nConnetions:")
    print(return_dict)

Running the above results in an error:

Running ssh user@X
Running ssh user@Y

Process Process-3:
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 267, in _bootstrap
    self.run()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "blah.py", line 9, in connect
    result[name] = pexpect.spawn(command)
  File "<string>", line 2, in __setitem__
  File "/usr/lib/python2.7/multiprocessing/managers.py", line 758, in _callmethod
    conn.send((self._id, methodname, args, kwds))
TypeError: expected string or Unicode object, NoneType found
Process Process-2:
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 267, in _bootstrap
    self.run()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "blah.py", line 9, in connect
    result[name] = pexpect.spawn(command)
  File "<string>", line 2, in __setitem__
  File "/usr/lib/python2.7/multiprocessing/managers.py", line 758, in _callmethod
    conn.send((self._id, methodname, args, kwds))
TypeError: expected string or Unicode object, NoneType found

Connections:
{}
mbanders commented 5 years ago

This explains why it will be difficult to parallelize connections. Long story short: to send an object to another process, it must be possible to pickle it, and unfortunately pexpect spawn objects (the parent class of most of our devices) cannot be pickled:

>>> import pickle
>>> import pexpect
>>> x = pexpect.spawn("ssh user@ip")
>>> pickle.dumps(x)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/pickle.py", line 1380, in dumps
    Pickler(file, protocol).dump(obj)
  File "/usr/lib/python2.7/pickle.py", line 224, in dump
    self.save(obj)
  File "/usr/lib/python2.7/pickle.py", line 331, in save
    self.save_reduce(obj=obj, *rv)
  File "/usr/lib/python2.7/pickle.py", line 425, in save_reduce
    save(state)
  File "/usr/lib/python2.7/pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/lib/python2.7/pickle.py", line 655, in save_dict
    self._batch_setitems(obj.iteritems())
  File "/usr/lib/python2.7/pickle.py", line 669, in _batch_setitems
    save(v)
  File "/usr/lib/python2.7/pickle.py", line 306, in save
    rv = reduce(self.proto)
  File "/home/mikea/python-virtual-environments/env/lib/python2.7/copy_reg.py", line 70, in _reduce_ex
    raise TypeError, "can't pickle %s objects" % base.__name__
TypeError: can't pickle file objects
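
Since the spawn objects can't cross a process boundary, one workaround is to parallelize with threads instead: threads share the interpreter's memory, so nothing has to be pickled. A sketch of the same example using concurrent.futures (stdlib in Python 3, available on 2.7 via the futures backport); the commands are placeholders as before:

import pexpect
from concurrent.futures import ThreadPoolExecutor

conn_cmds = {'a': 'ssh user@ip',
             'b': 'ssh user@ip2'}

def connect(command):
    # Threads share memory, so the spawn object can be returned directly;
    # any subsequent expect() waits (login prompts, etc.) block on OS-level
    # I/O and therefore overlap across threads.
    return pexpect.spawn(command)

with ThreadPoolExecutor(max_workers=len(conn_cmds)) as pool:
    futures = {name: pool.submit(connect, cmd)
               for name, cmd in conn_cmds.items()}
    connections = {name: f.result() for name, f in futures.items()}

print(connections)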
mattsm commented 5 years ago

I was using threads:

import threading
import queue
import pexpect
import time

def spawn_addl_shell(ret):
    # Retry: when many shells spawn at once, one can occasionally die
    # with EOF before it is usable.
    for i in range(100):
        try:
            d = pexpect.spawn('bash')
            d.sendline('echo FOO')
            d.expect_exact('echo FOO')
            d.expect_exact('FOO')

            ret.put(d)
            return
        except pexpect.EOF:
            print("Retrying...")
    print("Failed to spawn shell after 100 tries!")

print("spawning additional workers...")
start_time = time.time()

devs = []
threads = 100

q = queue.Queue()
saved = []

for i in range(threads):
    t = threading.Thread(target=spawn_addl_shell, args=(q,))
    t.start()
    saved.append(t)

# Join first, then drain: each worker puts at most one item, so polling
# q.qsize() would spin forever if any worker gave up.
for t in saved:
    t.join()

while True:
    try:
        devs.append(q.get_nowait())
    except queue.Empty:
        break

print("spawning additional workers took %s seconds" % (time.time() - start_time))
print(len(devs))

print("spawning additional workers serially")
devs = []
for i in range(threads):
    d = pexpect.spawn('bash')
    d.sendline('echo FOO')
    d.expect_exact('echo FOO')
    d.expect_exact('FOO')
    devs.append(d)
print("spawning additional workers took %s seconds" % (time.time() - start_time))
print(len(devs))

This program hangs at the end, though:

$ python thread.py
spawning additional workers...
spawning additional workers took 2.42505192757 seconds
100
spawning additional workers serially
spawning additional workers took 31.1070640087 seconds
100
^C
$
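
For the actual bft startup, the same pattern could drive the per-device connection commands, with each worker reporting back through the queue, including failures, so the main thread never waits on an item that will not arrive. A sketch with placeholder commands:

import threading
import pexpect

try:
    import queue           # Python 3
except ImportError:
    import Queue as queue  # Python 2

conn_cmds = {'a': 'ssh user@ip',
             'b': 'ssh user@ip2'}  # placeholders

def connect(name, command, results):
    try:
        results.put((name, pexpect.spawn(command)))
    except pexpect.ExceptionPexpect as err:
        # Report the failure rather than silently putting nothing.
        results.put((name, err))

results = queue.Queue()
workers = [threading.Thread(target=connect, args=(name, cmd, results))
           for name, cmd in conn_cmds.items()]
for w in workers:
    w.start()
for w in workers:
    w.join()  # every worker puts exactly one item, so draining after join is safe

devices = dict(results.get() for _ in workers)
print(devices)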