ohmu / poni

poni - a systems configuration software
http://melor.github.com/poni/
Apache License 2.0

deploy is much slower than it needs to be #7

Closed: fingon closed this issue 13 years ago

fingon commented 13 years ago

deploy checks existing files by reading them back from the remote server; for larger files, however, this can take forever.

A trivial solution would be to compare e.g. MD5 hashes of the files instead, since md5sum is available pretty much everywhere, and to fall back to reading the files if it is not.
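A minimal sketch of the hash-then-fallback idea, assuming the transport is abstracted as two callables: `run` executes a shell command remotely and returns `(exit status, stdout)`, and `read_remote` returns a remote file's bytes. These hooks and all function names are hypothetical, not poni's API. With paramiko, `run` could wrap `SSHClient.exec_command()` plus `stdout.channel.recv_exit_status()`, and `read_remote` could read from `SFTPClient.open()`.

```python
import hashlib
import shlex
from typing import Callable, Optional, Tuple

# Hypothetical transport hooks (not poni's API).
RunCommand = Callable[[str], Tuple[int, str]]  # command -> (exit status, stdout)
ReadRemote = Callable[[str], bytes]            # remote path -> file contents


def local_md5(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a local file in chunks so large files are never fully loaded."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def remote_md5(remote_path: str, run: RunCommand) -> Optional[str]:
    """Ask the remote host for the digest; None means md5sum was unusable."""
    status, out = run("md5sum " + shlex.quote(remote_path))
    fields = out.split()
    if status != 0 or not fields:
        return None
    # md5sum prints "<digest>  <name>"; GNU md5sum prefixes a backslash
    # when the name contains special characters.
    digest = fields[0].lstrip("\\").lower()
    return digest if len(digest) == 32 else None


def is_up_to_date(local_path: str, remote_path: str,
                  run: RunCommand, read_remote: ReadRemote) -> bool:
    """True if the remote file already has the same contents as the local one."""
    remote = remote_md5(remote_path, run)
    if remote is not None:
        # Only the 32-character digest crossed the network.
        return remote == local_md5(local_path)
    # Fallback: fetch the whole remote file and compare bytes (the slow path).
    try:
        remote_bytes = read_remote(remote_path)
    except OSError:
        return False  # e.g. the remote file does not exist yet
    with open(local_path, "rb") as f:
        return f.read() == remote_bytes
```

The design choice is that a missing or broken md5sum never causes a wrong answer, only a slower one.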

I have already written a proof-of-concept patch that does this, but its fallback handling is not as clean as it could be.

(We're transferring a .tar.gz or two so that software can be installed using nothing but poni resources, and for some reason paramiko seems to manage only ~10 MB/minute during the check phase.)

melor commented 13 years ago

For bigger files it might be enough to fall back (optionally, configured per file) to checking only the file size and last-modification timestamp. However, the latter is currently not stored anywhere.
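A minimal sketch of such a size-and-mtime check, assuming the remote side is described by an SFTP-style stat result (paramiko's `SFTPClient.stat()` returns an object with `st_size` and `st_mtime`). `RemoteStat` and `needs_copy_by_stat` are hypothetical names. Since the source mtime is not stored anywhere, this only works if the copy step also stamps the remote file with the local mtime, for example via paramiko's `SFTPClient.utime()`.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass
class RemoteStat:
    """Hypothetical stand-in for the attributes an SFTP stat() returns."""
    st_size: int
    st_mtime: int


def needs_copy_by_stat(local_path: str, remote: Optional[RemoteStat]) -> bool:
    """Cheap check: copy if the remote file is missing or size/mtime differ."""
    if remote is None:
        return True  # nothing on the remote side yet
    st = os.stat(local_path)
    # SFTP (protocol v3) carries whole-second mtimes, so compare at 1 s resolution.
    return st.st_size != remote.st_size or int(st.st_mtime) != int(remote.st_mtime)
```

The trade-off is that a file changed in place without altering its size or mtime would be missed, which is why it makes sense as an opt-in mode for big files rather than the default.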

Implementing remote checksumming in a portable fashion can be a bit of a challenge.

melor commented 13 years ago

Also, the add_dir(source, dest) method can already be used instead of add_file if all the files to be copied are in one directory. In that case, each file under the directory is copied only if its size or mtime differs from that of the source file.
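The add_dir behaviour described above can be sketched as a directory walk that skips files whose size and mtime already match. This is a local-to-local illustration in which `dest` stands in for the remote directory; it is not poni's implementation, and `sync_dir` is a hypothetical name.

```python
import os
import shutil
from typing import List


def sync_dir(source: str, dest: str) -> List[str]:
    """Copy files from source into dest, skipping ones whose size and mtime match.

    Returns the relative paths that were actually copied, sorted.
    """
    copied = []
    for root, _dirs, files in os.walk(source):
        rel_root = os.path.relpath(root, source)
        target_root = os.path.normpath(os.path.join(dest, rel_root))
        os.makedirs(target_root, exist_ok=True)
        for name in files:
            src = os.path.join(root, name)
            dst = os.path.join(target_root, name)
            s = os.stat(src)
            if os.path.exists(dst):
                d = os.stat(dst)
                if d.st_size == s.st_size and int(d.st_mtime) == int(s.st_mtime):
                    continue  # unchanged: skip the transfer entirely
            # copy2 also copies the mtime, so the next run can skip this file
            shutil.copy2(src, dst)
            copied.append(os.path.normpath(os.path.join(rel_root, name)))
    return sorted(copied)
```

Run twice in a row, the first call copies everything and the second copies nothing, because no file contents are read to decide.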

fingon commented 13 years ago

Ah, thanks, didn't know that add_dir was more efficient. I guess we'll be using that one then ;-)

The underlying problem, though, is the use of paramiko for file transfer; is there a reason the standard ssh commands are not used for that? Pretty much any UNIX box has them, and they are two or three orders of magnitude more efficient.

melor commented 13 years ago

Paramiko is used because it integrates into a Python program much more easily than an external ssh command does. There actually is another implementation in the poni sources (rcontrol_openssh.py, which does not work at the moment!) that uses external ssh, but I never quite finished working around some issues I had getting it to work properly.
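For comparison, a minimal sketch of what shelling out to the OpenSSH scp client from Python looks like; it is not the code in rcontrol_openssh.py, and both helper names are hypothetical. `-p` preserves modification times (handy for a later size/mtime check), `-q` suppresses the progress meter, and `-o BatchMode=yes` makes scp fail instead of prompting for a password or host-key confirmation.

```python
import subprocess
from typing import List


def build_scp_args(local_path: str, host: str, remote_path: str) -> List[str]:
    """Build an argv list for copying one file with the external scp client.

    Assumes paths do not start with '-' and contain no ':' or shell metacharacters.
    """
    return [
        "scp",
        "-p",                   # preserve modification times and modes
        "-q",                   # no progress meter
        "-o", "BatchMode=yes",  # never prompt interactively; fail instead
        local_path,
        f"{host}:{remote_path}",
    ]


def scp_copy(local_path: str, host: str, remote_path: str) -> None:
    """Run scp; raises subprocess.CalledProcessError if the copy fails."""
    subprocess.run(build_scp_args(local_path, host, remote_path), check=True)
```

Everything paramiko handles in-process (authentication, host keys, error reporting) here becomes an exit status and stderr text to interpret, which is the integration cost mentioned above.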

Paramiko is indeed a lot slower than, e.g., OpenSSH, but not even by an order of magnitude in a quick test I just ran:

66 MB of Debian packages deployed (copied over) with "poni deploy" took about 7-8 seconds. Note that this includes all of poni's internal overhead, such as checking whether each file needs to be copied.

The same set of files copied with plain "scp" took 2.0-2.3 seconds. Both tests ran between two Linux VMs with 10GE interconnects, SSD to SSD.
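Converting these timings into throughput, using only the numbers quoted above, puts the gap at roughly 3-4x:

```latex
\begin{aligned}
\text{poni deploy: } & \frac{66\ \text{MB}}{7\text{--}8\ \text{s}} \approx 8.3\text{--}9.4\ \text{MB/s} \\
\text{scp: } & \frac{66\ \text{MB}}{2.0\text{--}2.3\ \text{s}} \approx 28.7\text{--}33\ \text{MB/s} \\
\text{slowdown: } & \frac{7\ \text{s}}{2.3\ \text{s}} \approx 3.0 \quad\text{to}\quad \frac{8\ \text{s}}{2.0\ \text{s}} = 4.0
\end{aligned}
```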

fingon commented 13 years ago

Hmmh.

The original reason for this issue was a case where it takes ~3-4 minutes to transfer a couple of tens of megabytes of files (~40 MB if I remember right), which take ~5 seconds with raw scp. I wrote an ugly md5 hack that works Linux->Linux and falls back to the non-copy case if md5 does not work for some reason, which makes the check ~instant in the typical case where the files have not changed.
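Taking the approximate figures above at face value (~40 MB, 3-4 minutes with poni, ~5 seconds with scp), the gap on this setup is well over an order of magnitude, compared with the 3-4x ratio implied by the 7-8 s vs 2.0-2.3 s test:

```latex
\begin{aligned}
\text{poni deploy: } & \frac{40\ \text{MB}}{180\text{--}240\ \text{s}} \approx 0.17\text{--}0.22\ \text{MB/s} \;(\approx 10\text{--}13\ \text{MB/min}) \\
\text{scp: } & \frac{40\ \text{MB}}{5\ \text{s}} = 8\ \text{MB/s} \\
\text{slowdown: } & \frac{180\ \text{s}}{5\ \text{s}} = 36 \quad\text{to}\quad \frac{240\ \text{s}}{5\ \text{s}} = 48
\end{aligned}
```

The lower end matches the ~10 MB/minute observed during the check phase.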

I can look up the software versions if it helps; both ends are dual-core machines running Linux VMs, with a ~1GE office network in between.