When running distributed benches above a certain number of concurrent users (approximately 200 per worker with 4 workers, in my experience), I get the socket.timeout error below and no data is returned from the workers:
Traceback (most recent call last):
  File "/home/funkload/funkload/venv/bin/fl-run-bench", line 9, in <module>
    load_entry_point('funkload==1.17.0b-20120313', 'console_scripts', 'fl-run-bench')()
  File "/home/funkload/funkload/venv/lib/python2.6/site-packages/funkload/BenchRunner.py", line 732, in main
    ret = distmgr.run()
  File "/home/funkload/funkload/venv/lib/python2.6/site-packages/funkload/Distributed.py", line 509, in run
    self._worker_results[worker] = thread.output.read()
  File "/home/funkload/funkload/venv/lib/python2.6/site-packages/paramiko/file.py", line 134, in read
    new_data = self._read(self._DEFAULT_BUFSIZE)
  File "/home/funkload/funkload/venv/lib/python2.6/site-packages/paramiko/channel.py", line 1215, in _read
    return self.channel.recv(size)
  File "/home/funkload/funkload/venv/lib/python2.6/site-packages/paramiko/channel.py", line 586, in recv
    raise socket.timeout()
socket.timeout
Looking at Distributed.py, I see that the channel timeout is hard-coded to 250s in the ThreadedExec class (line 195). If I increase this value or set it to None (i.e. no timeout), the workers successfully return data.
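To illustrate the timeout behaviour described above, here is a minimal sketch using only the standard library. It is not FunkLoad or paramiko code: it assumes that a paramiko channel's settimeout() follows the same semantics as a plain socket's settimeout() (a number of seconds, or None for a blocking read), and the helper name recv_with_timeout is made up for this example.

```python
import socket


def recv_with_timeout(timeout, send_first):
    """Read from one end of a local socket pair with the given timeout.

    Hypothetical helper for illustration only. `timeout` is a number of
    seconds, or None to block until data arrives. Returns the received
    bytes, or None if the read timed out.
    """
    writer, reader = socket.socketpair()
    try:
        reader.settimeout(timeout)
        if send_first:
            # Simulates a worker that reports back before the deadline.
            writer.sendall(b"results")
        try:
            return reader.recv(1024)
        except socket.timeout:
            # Same exception type as at the bottom of the traceback above.
            return None
    finally:
        writer.close()
        reader.close()


if __name__ == "__main__":
    print(recv_with_timeout(0.1, send_first=False))   # read times out
    print(recv_with_timeout(0.1, send_first=True))    # data arrives in time
    print(recv_with_timeout(None, send_first=True))   # no timeout set
```

With a finite timeout, a reader that has to wait longer than the limit gets socket.timeout and no data. With None, the read simply waits, which matches what I see when the hard-coded value is removed.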
I've made "channel_timeout" configurable in the [distribute] section of the config. If it is not present in the config, it defaults to None. I've also updated the FAQ to mention this setting.
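As a rough sketch of the lookup described above (not the actual patch), the following reads an optional channel_timeout from a [distribute] section and falls back to None. The function name read_channel_timeout, the sample value 600, and the other_option placeholder key are assumptions for illustration. It uses Python 3's configparser, whereas the traceback above is from Python 2.6.

```python
import configparser


def read_channel_timeout(conf_text, section="distribute",
                         option="channel_timeout"):
    """Return the timeout in seconds as a float, or None when unset.

    Hypothetical helper: the real patch may read the config differently.
    """
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    # has_option() is False when the section or the option is missing.
    if not parser.has_option(section, option):
        return None
    raw = parser.get(section, option).strip()
    # Treat an empty value or a literal "None" as "no timeout".
    if raw == "" or raw.lower() == "none":
        return None
    return float(raw)


# Example config with the setting present (600 is an arbitrary value).
WITH_TIMEOUT = "[distribute]\nchannel_timeout = 600\n"
# Example config without it; other_option is a placeholder key.
WITHOUT_TIMEOUT = "[distribute]\nother_option = value\n"

if __name__ == "__main__":
    print(read_channel_timeout(WITH_TIMEOUT))
    print(read_channel_timeout(WITHOUT_TIMEOUT))
```

The returned value could then be passed straight to the channel's settimeout(), since None there means no timeout.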