tohojo / flent

The FLExible Network Tester.
https://flent.org

Runner scalability #257

Closed: tohojo closed this 2 years ago

tohojo commented 2 years ago

This pull request contains a number of fixes that significantly improve the scaling of Flent when spawning a lot of runners:

The individual commits contain the details; along with the main changes listed above are various smaller fixes that turned out to be useful along the way.

With these changes it is quite feasible to run a tcp_nup test with 1000 flows on my laptop, at least as far as starting the netperf instances is concerned (whether the network can actually handle it is a different matter :) ).

dtaht commented 2 years ago

At one level I applaud. At another I kind of wish we were extracting more, directly from TCP_INFO. Do we really need ss?

tohojo commented 2 years ago


Alternatives welcome, especially if they come with patches :)

dtaht commented 2 years ago

Tell ya what. I'll go back to coding, if you get back into politics.

tohojo commented 2 years ago

@dtaht care to take this for a spin?

As for your question about TCP_INFO, it looks like it should be feasible to integrate this as an alternative to 'ss': https://github.com/m-lab/tcp-info

dtaht commented 2 years ago

tomorrow. pst

dtaht commented 2 years ago

Oh, my aching fingers, and my pre-existing test scripts that used --te:

flent: error: ambiguous option: --te=upload_streams=1 could match --test-payload, --test-parameter
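The error message has the format of Python's argparse, which accepts any unambiguous prefix of a long option. Adding a second option starting with --te (here --test-payload next to --test-parameter) therefore turns a previously valid abbreviation into an error. A small reproduction with a hypothetical two-option parser (not Flent's real one):

```python
import argparse


def build_parser(allow_abbrev=True):
    # Hypothetical parser containing only the two options named in the error.
    parser = argparse.ArgumentParser(prog="flent", allow_abbrev=allow_abbrev)
    parser.add_argument("--test-payload", action="append")
    parser.add_argument("--test-parameter", action="append")
    return parser


def try_parse(argv, allow_abbrev=True):
    """Return the parsed namespace, or None if argparse rejects argv.

    On a parse error argparse prints a message to stderr and raises SystemExit(2).
    """
    try:
        return build_parser(allow_abbrev).parse_args(argv)
    except SystemExit:
        return None


# "--te" is a prefix of both options, so argparse reports it as ambiguous.
print(try_parse(["--te=upload_streams=1"]))
# "--test-par" is a unique prefix and is still accepted.
print(try_parse(["--test-par=upload_streams=1"]).test_parameter)
```

The robust fix on the script side is to spell out --test-parameter in full. allow_abbrev=False only illustrates the opposite extreme: it rejects every abbreviation, so it would not have kept the old scripts working either.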

dtaht commented 2 years ago

flent -H fremont.starlink.taht.net --socket-stats --step-size=.02 --test-parameter=download_streams=1000 -t cell-tether-1000 tcp_ndown

ERROR: Resource limit of 1024 files is too low - need at least 4012 for this test

A hint steering a naive user toward ulimit -n 5096, or having flent raise the limit itself, would be good; ulimit is not easily discoverable.

dtaht commented 2 years ago

flent -H fremont.starlink.taht.net --socket-stats --step-size=.02 --test-parameter=download_streams=1000 -t cell-tether-1000 rrul
Starting Flent 2.0.1-git-c78dac1 using Python 3.8.10.
Starting rrul test. Expected run time: 70 seconds.
Exception in thread Thread-13:
Traceback (most recent call last):
  File "/usr/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.8/dist-packages/flent-2.0.1_git_c78dac1-py3.8.egg/flent/runners.py", line 523, in run
    pid, sts = os.waitpid(self.pid, os.WNOHANG)
TypeError: an integer is required (got type NoneType)
Exception in thread Thread-14:
Traceback (most recent call last):
  File "/usr/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.8/dist-packages/flent-2.0.1_git_c78dac1-py3.8.egg/flent/runners.py", line 523, in run
    pid, sts = os.waitpid(self.pid, os.WNOHANG)
TypeError: an integer is required (got type NoneType)
Exception in thread Thread-15:
Traceback (most recent call last):
  File "/usr/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.8/dist-packages/flent-2.0.1_git_c78dac1-py3.8.egg/flent/runners.py", line 523, in run
    pid, sts = os.waitpid(self.pid, os.WNOHANG)
TypeError: an integer is required (got type NoneType)
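The traceback shows os.waitpid() being called while self.pid is still None, i.e. the runner thread started polling for a child process that had not been recorded. A minimal sketch of a non-blocking (WNOHANG) polling loop that guards against that state; wait_for_child() is a hypothetical helper, not Flent's runner code and not the fix that was actually committed:

```python
import os
import time


def wait_for_child(pid, poll_interval=0.05, timeout=None):
    """Poll child `pid` without blocking; return its exit code, or None on timeout.

    A negative return value means the child was killed by that signal number.
    """
    if pid is None:
        # The state from the traceback: fail clearly instead of hitting a TypeError.
        raise ValueError("no child process to wait for (pid is None)")
    deadline = None if timeout is None else time.monotonic() + timeout
    while True:
        # With WNOHANG, waitpid() returns (0, 0) while the child is still running.
        wpid, status = os.waitpid(pid, os.WNOHANG)
        if wpid == pid:
            # Without WUNTRACED/WCONTINUED the child has either exited or been killed.
            if os.WIFEXITED(status):
                return os.WEXITSTATUS(status)
            return -os.WTERMSIG(status)
        if deadline is not None and time.monotonic() >= deadline:
            return None
        time.sleep(poll_interval)
```

Polling with WNOHANG rather than blocking in waitpid() lets the runner thread also check a deadline or a stop flag between polls.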

dtaht commented 2 years ago

The rtt_fair test works. Don't know why the rrul test doesn't.

tohojo commented 2 years ago

Well because there was a bug, obviously ;)

Should be fixed now, and also improved the rlimit handling so it tries to raise it automatically and hints at ulimit if that fails...
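The rlimit handling described here can be sketched with Python's standard resource module. This is a minimal illustration, not Flent's actual code; the function name ensure_nofile() and the error wording are hypothetical:

```python
import resource


def ensure_nofile(needed):
    """Make sure the soft RLIMIT_NOFILE is at least `needed`; return the resulting soft limit.

    Raises RuntimeError with a ulimit hint when the limit cannot be raised,
    e.g. because the hard limit is lower than `needed`.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY or soft >= needed:
        return soft  # already high enough, nothing to do
    if hard != resource.RLIM_INFINITY and hard < needed:
        # An unprivileged process may lower its hard limit but not raise it.
        raise RuntimeError(
            f"open-file limit is {soft} (hard limit {hard}) but {needed} are needed; "
            f"try running 'ulimit -n {needed}' in the shell first (may require root)")
    try:
        # Raising the soft limit up to the hard limit needs no privileges.
        resource.setrlimit(resource.RLIMIT_NOFILE, (needed, hard))
    except (ValueError, OSError) as exc:
        raise RuntimeError(
            f"could not raise open-file limit to {needed} ({exc}); "
            f"try running 'ulimit -n {needed}' in the shell first") from exc
    return needed
```

Because the soft limit can usually be raised to the hard limit automatically, the user only has to step in when the hard limit itself is too low.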

dtaht commented 2 years ago

WFM. But doesn't your test suite exercise all the known tests? I know it would take a long time to complete, but...

tohojo commented 2 years ago

Nope, never did get around to having the test suite actually run the tests; it only exercises the plotters and some of the parsers (which did unearth another bug, so not completely useless).

Thanks for testing! :)

dtaht commented 2 years ago

A pleasure to fiddle with this stuff again with you.