vi / turnhammer

Stress-testing tool for TURN (RFC 5766) servers.

thread 'main' panicked at 'Can't bind UDP anymore #5

Open trinhxhai2000 opened 2 years ago

trinhxhai2000 commented 2 years ago

Hi. I'm testing my server, and when I increase -j 800 to -j 1600 the program stops and shows this. What can I do to be able to run the test? Thank you very much.

$ sudo ./turnhammer {my server's IP}:3478 test test123 --pkt-size 275 --pps 24 -j 1600 -d 960 --force
The test would do approx 193.536 Mbit/s and consume 20643.840 megabytes of traffic
My external address: {My IP}:42627

thread 'main' panicked at 'Can't bind UDP anymore: Os { code: 24, kind: Uncategorized, message: "Too many open files" }', src/main.rs:426:66
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
vi commented 2 years ago

You need to raise the maximum number of open files from 1024 to a higher limit: `ulimit -n 5000; turnhammer ...`. Raising the limit may indeed need root (or editing config files).
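For completeness, the limit can also be raised from inside a program at startup, so each simulated client can bind its own UDP socket. This is a hypothetical sketch (not turnhammer's actual code), Linux-specific, declaring the libc calls directly:

```rust
// Hypothetical sketch (not turnhammer's code): raise the process's soft
// open-files limit up to the hard limit. Linux-specific; calls libc directly.
#[repr(C)]
struct RLimit {
    rlim_cur: u64, // soft limit (what "Too many open files" enforces)
    rlim_max: u64, // hard limit (ceiling for unprivileged processes)
}

extern "C" {
    fn getrlimit(resource: i32, rlim: *mut RLimit) -> i32;
    fn setrlimit(resource: i32, rlim: *const RLimit) -> i32;
}

const RLIMIT_NOFILE: i32 = 7; // Linux resource id for "number of open files"

fn main() {
    let mut lim = RLimit { rlim_cur: 0, rlim_max: 0 };
    unsafe {
        assert_eq!(getrlimit(RLIMIT_NOFILE, &mut lim), 0);
        // An unprivileged process may raise its soft limit up to the hard
        // limit; going beyond that is what needs root (or limits.conf edits).
        lim.rlim_cur = lim.rlim_max;
        assert_eq!(setrlimit(RLIMIT_NOFILE, &lim), 0);
        assert_eq!(getrlimit(RLIMIT_NOFILE, &mut lim), 0);
    }
    assert_eq!(lim.rlim_cur, lim.rlim_max);
    println!("open-files limit: soft = hard = {}", lim.rlim_cur);
}
```

`ulimit -n` from the shell does the same thing for child processes, which is why vi's one-liner works without touching the code.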

trinhxhai2000 commented 2 years ago

> You need to raise maximum number of open files from 1024 to higher limit. `ulimit -n 5000; turnhammer ...`. Raising the limit may indeed need root (or editing config files).

I tried `ulimit -n 5000` and the old message is gone, but a new one shows up after a few minutes of running. What does it mean?

thread '<unnamed>' panicked at 'attempt to add with overflow', src/main.rs:198:14
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
vi commented 2 years ago

Unfortunately, the second error is a bug in turnhammer, probably the same as #4. I tried to reproduce it locally before, but did not manage to do so properly using my own coturn deployment.

Overflow checks probably mean that you are running turnhammer in debug mode. That is not recommended for real tests, as the performance of turnhammer itself may limit the test results instead of the network and TURN server performance. Without overflow checks you would probably just get garbage values in the "bad loss" and "overall score" fields instead.
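For reference, here is a minimal sketch (not turnhammer's actual code) of why the same `+` panics in a debug build but silently produces wrapped-around garbage in a release build, and how Rust's explicit stdlib methods behave the same way in either profile:

```rust
fn main() {
    let counter: u32 = u32::MAX;
    // In a debug build (overflow-checks on), `counter + 1` aborts with
    // "attempt to add with overflow" -- the panic seen above.
    // In a release build it silently wraps to 0, which is where the
    // garbage "bad loss" / "overall score" values would come from.
    assert_eq!(counter.wrapping_add(1), 0);          // explicit wraparound
    assert_eq!(counter.checked_add(1), None);        // overflow detected
    assert_eq!(counter.saturating_add(1), u32::MAX); // clamped at maximum
}
```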

Do you have a public TURN server instance for me to start turnhammer on my side and reproduce the bug?

trinhxhai2000 commented 2 years ago

Yeah, how can I send you the info?

vi commented 2 years ago

To the e-mail: vi0oss@gmail.com

trinhxhai2000 commented 2 years ago

> Unfortunately, the second error is a bug in turnhammer, probably the same as #4. I tried to reproduce it locally before, but did not manage to do so properly using my own coturn deployment.
>
> Overflow checks probably mean that you are running turnhammer in debug mode. That is not recommended for real tests, as the performance of turnhammer itself may limit the test results instead of the network and TURN server performance. Without overflow checks you would probably just get garbage values in the "bad loss" and "overall score" fields instead.
>
> Do you have a public TURN server instance for me to start turnhammer on my side and reproduce the bug?

About debug mode: I have been running the program turnhammer in turnhammer/target/debug/. Is it wrong? Sorry if it's obvious, but I don't normally work with Rust.

vi commented 2 years ago

> I have been running the program turnhammer in turnhammer/target/debug/

It is not wrong per se, but it does not use optimized code. turnhammer is a performance-sensitive application, and debug mode is typically drastically (e.g. 10x) slower than normal release mode.

You may want to build with the --release flag (`cargo build --release`) and use the executable in target/release.

trinhxhai2000 commented 2 years ago

Thank you. I have sent my server info to vi0oss@gmail.com. Can you check it out?

vi commented 2 years ago

> --pkt-size 1040 --pps 200 -j 800 -d 960 --force
> The test would do approx 2764.800 Mbit/s and consume 294912.000 megabytes of traffic

Are you sure your client machine is ready for a test of that scale? Personally, I have never tried to make a single turnhammer instance do more than 100 Mbit/s of traffic.
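For what it's worth, the "approx" figure seems to match (pkt-size + 40 bytes of per-packet overhead) x 2 directions x pps x jobs. Treat the 40-byte overhead and the x2 as inferences from the two numbers printed in this thread, not as turnhammer's documented formula:

```rust
// Inferred reconstruction of the advertised bitrate, checked against the two
// figures printed in this thread. The 40-byte per-packet overhead and the
// factor of 2 (both directions) are assumptions, not turnhammer's source.
fn approx_mbit_per_s(pkt_size: u64, pps: u64, jobs: u64) -> f64 {
    ((pkt_size + 40) * 2 * pps * jobs * 8) as f64 / 1e6
}

fn main() {
    // --pkt-size 275 --pps 24 -j 1600  ->  "approx 193.536 Mbit/s"
    assert!((approx_mbit_per_s(275, 24, 1600) - 193.536).abs() < 1e-9);
    // --pkt-size 1040 --pps 200 -j 800 ->  "approx 2764.800 Mbit/s"
    assert!((approx_mbit_per_s(1040, 200, 800) - 2764.8).abs() < 1e-9);
}
```

If that inference is right, the estimate counts both the send and echo directions plus per-packet overhead, which is why it is noticeably higher than pkt-size x pps x jobs alone.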

I would expect to need, say, about 20 AWS instances running turnhammer simultaneously against one server, each with only -j 40, to get more reliable test results.

turnhammer may try to run tests of any scale, but if the test exceeds the bandwidth the OS can handle, you would just get incorrect results (e.g. it would drop packets before sending them, or it would silently send packets slower than needed).

A proper large-scale testing technique may look like this:

  1. Start one turnhammer instance, with a lightweight test. Ensure it shows high score.
  2. By gradually increasing turnhammer's parameters, find the bandwidth at which the score gets lower. Assume this may be due to client-side issues, not the TURN server.
  3. Dial back turnhammer's settings so that it operates at about 70% of the bandwidth at which the score is still high, then start a second instance of turnhammer on a separate machine (or VM), in parallel with the first one.
  4. If the score gets lower when using two turnhammers in parallel, then we have indeed reached the TURN server's capacity. But if each turnhammer still shows a high score despite the competing turnhammer (which also shows a high score), then the TURN server's capacity is not reached yet.
  5. Continue adding more VM instances (or physical machines) running turnhammer with the same settings until you see a sag in the measured quality.
  6. Multiply the final number of instances (where scores are still OK-ish) by the bandwidth reported by each turnhammer. That is the actual maximum measured bandwidth of the TURN server.

You may also want to adjust network settings on the client machine (e.g. socket buffer sizes, network interface queue length, whether the firewall is involved). These affect the highest client-side bandwidth available for testing, and thus may allow finding the TURN server's capacity with fewer instances.
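To illustrate the socket-buffer part of that, here is a minimal Linux-only sketch (my own illustration, not from turnhammer) that asks the kernel for a larger UDP receive buffer and reads back what was actually granted; the grant is capped by the `net.core.rmem_max` sysctl, which is what usually needs tuning:

```rust
// Hypothetical sketch: enlarge a UDP socket's receive buffer on Linux.
use std::net::UdpSocket;
use std::os::unix::io::AsRawFd;

extern "C" {
    fn setsockopt(fd: i32, level: i32, name: i32, val: *const u8, len: u32) -> i32;
    fn getsockopt(fd: i32, level: i32, name: i32, val: *mut u8, len: *mut u32) -> i32;
}
const SOL_SOCKET: i32 = 1; // Linux value
const SO_RCVBUF: i32 = 8;  // Linux value

fn main() {
    let sock = UdpSocket::bind("127.0.0.1:0").expect("bind");
    let want: i32 = 4 * 1024 * 1024; // ask for 4 MiB
    let rc = unsafe {
        setsockopt(sock.as_raw_fd(), SOL_SOCKET, SO_RCVBUF,
                   &want as *const i32 as *const u8, 4)
    };
    assert_eq!(rc, 0);
    // The kernel silently caps the request at net.core.rmem_max (and doubles
    // it for bookkeeping), so read back what was actually granted:
    let mut got: i32 = 0;
    let mut len: u32 = 4;
    let rc = unsafe {
        getsockopt(sock.as_raw_fd(), SOL_SOCKET, SO_RCVBUF,
                   &mut got as *mut i32 as *mut u8, &mut len)
    };
    assert_eq!(rc, 0);
    assert!(got > 0);
    println!("granted receive buffer: {} bytes", got);
}
```

If the granted size is far below what was requested, raising `net.core.rmem_max` (root required) is the usual next step.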

It may also make sense to run a few turnhammer instances on the same VM, as turnhammer is only partially multi-threaded. This may also help economize on instances if they have more than 2 cores each.

Obviously, do not try turnhammer from the same VM/machine that is running the TURN server itself.