Fix performance problem when writing output to file descriptors

mwilliamson / spur.py

Run commands and manipulate files locally or over SSH using the same interface

BSD 2-Clause "Simplified" License

267 stars 37 forks source link

Fix performance problem when writing output to file descriptors #35

Closed chrivers closed 7 years ago

chrivers commented 8 years ago

The _ContinuousReader class uses an incredible small output buffer (a single byte). This leads to terrible performance. Consider the following simple test program:

#!/usr/bin/python
import spur

s = spur.SshShell("localhost", port=22)
res = s.run(["echo", "connected"])
print res.output
outputfile = file("output", "w")
res = s.run(["cat", "spur/input"], stdout=outputfile)
print res.return_code, len(res.output)

If "spur/input" is a 100MB input file on the server, then this simple copy operation (localhost-to-localhost) takes several minutes. After these fixes, it takes a few seconds.

mwilliamson commented 8 years ago

You're entirely correct that the buffer is rather small, so thanks for the pull request to improve performance. However, does the code you're written pass the test suite i.e. does running tox pass? It also changes the behaviour to always buffer, when that isn't always desirable.

chrivers commented 8 years ago

Oh, I didn't realize there were tests ;-)

It turns out, I'm having some difficulty running the tests, even without modifications. It seems like I have to set TEST_SSH_USERNAME and TEST_SSH_PASSWORD, right?

It does seem like some tests are failing. I would like to run them by themselves, since tox is a bit slow to run (especially when settings those env vars).

Is there some way to run a single test, and report the results?

As for the buffering, you are right. There should probably be some kind of switch for this. Perhaps a "batch_mode" flag, or similar? Or just a buffer size parameter, which could default to 1.

mwilliamson commented 8 years ago

You can run the tests by starting a virtualenv, and installing both the package itself using setup.py and the requirements in test-requirements.txt. You can then use nosetests to run the tests, including selecting specific files or functions (I think that's covered in nose's docs, but if they aren't clear I can provide a couple of examples). Looking at the tox.ini should show those same steps.

Running make bootstrap and then make test will do those steps using whatever your default python is.

The environment variables are described in CONTRIBUTING.md

chrivers commented 8 years ago

I haven't forgotten this one, just been caught up in real-life obligations. I'm experimenting with line buffering currently. I think it will provide several advantages over the current batch-mode branch I've proposed here.

rpocase commented 8 years ago

Is there any update on this? I like to see an official fix to avoid the need to do a private fork.

mwilliamson commented 8 years ago

If anybody wants to have a go at this, they're more than welcome. I'll happily merge a pull request that implements buffering so long as it doesn't break existing cases (and has tests, etc.).

mwilliamson commented 7 years ago

Closing since there's not been any updates in a while.