robgjansen / onionperf

A utility to track the performance of Tor and Tor onion services
Other
19 stars 24 forks source link

TGen server dies after roughly 24 hours for no obvious reason #17

Closed kloesing closed 8 years ago

kloesing commented 8 years ago

My OnionPerf instance ran for about 24 hours before outputting the following warnings and terminating:

2016-03-10 12:19:10 1457612350.177991 [onionperf] [INFO] press CTRL-C for graceful shutdown...
2016-03-10 13:14:18 1457615658.720360 [onionperf] [WARNING] command '/home/ubuntu/shadow/src/plugin/shadow-plugin-tgen/build/tgen /home/ubuntu/run/onionperf-data/tgen-server/tgen.graphml.xml' finished before expected
2016-03-10 13:14:19 1457615659.730137 [onionperf] [WARNING] command '/home/ubuntu/shadow/src/plugin/shadow-plugin-tgen/build/tgen /home/ubuntu/run/onionperf-data/tgen-server/tgen.graphml.xml' finished before expected
2016-03-10 13:14:19 1457615659.737811 [onionperf] [WARNING] command '/home/ubuntu/shadow/src/plugin/shadow-plugin-tgen/build/tgen /home/ubuntu/run/onionperf-data/tgen-server/tgen.graphml.xml' finished before expected
2016-03-10 13:14:19 1457615659.743993 [onionperf] [WARNING] command '/home/ubuntu/shadow/src/plugin/shadow-plugin-tgen/build/tgen /home/ubuntu/run/onionperf-data/tgen-server/tgen.graphml.xml' finished before expected
2016-03-10 13:14:19 1457615659.750354 [onionperf] [WARNING] command '/home/ubuntu/shadow/src/plugin/shadow-plugin-tgen/build/tgen /home/ubuntu/run/onionperf-data/tgen-server/tgen.graphml.xml' finished before expected
2016-03-10 13:14:19 1457615659.756135 [onionperf] [WARNING] command '/home/ubuntu/shadow/src/plugin/shadow-plugin-tgen/build/tgen /home/ubuntu/run/onionperf-data/tgen-server/tgen.graphml.xml' finished before expected
2016-03-10 13:14:19 1457615659.761863 [onionperf] [WARNING] command '/home/ubuntu/shadow/src/plugin/shadow-plugin-tgen/build/tgen /home/ubuntu/run/onionperf-data/tgen-server/tgen.graphml.xml' finished before expected
2016-03-10 13:14:20 1457615660.768629 [onionperf] [WARNING] command '/home/ubuntu/shadow/src/plugin/shadow-plugin-tgen/build/tgen /home/ubuntu/run/onionperf-data/tgen-server/tgen.graphml.xml' finished before expected
2016-03-10 13:14:21 1457615661.776609 [onionperf] [WARNING] command '/home/ubuntu/shadow/src/plugin/shadow-plugin-tgen/build/tgen /home/ubuntu/run/onionperf-data/tgen-server/tgen.graphml.xml' finished before expected
2016-03-10 13:14:21 1457615661.783704 [onionperf] [WARNING] command '/home/ubuntu/shadow/src/plugin/shadow-plugin-tgen/build/tgen /home/ubuntu/run/onionperf-data/tgen-server/tgen.graphml.xml' finished before expected
2016-03-10 13:14:21 1457615661.789633 [onionperf] [WARNING] command '/home/ubuntu/shadow/src/plugin/shadow-plugin-tgen/build/tgen /home/ubuntu/run/onionperf-data/tgen-server/tgen.graphml.xml' finished before expected
2016-03-10 13:19:10 1457615950.259332 [onionperf] [WARNING] tgen_server_watchdog is dead!
2016-03-10 13:19:10 1457615950.259518 [onionperf] [INFO] tor_client_watchdog is alive
2016-03-10 13:19:10 1457615950.259616 [onionperf] [INFO] torctl_client_helper is alive
2016-03-10 13:19:10 1457615950.259697 [onionperf] [INFO] tgen_client_watchdog is alive
2016-03-10 13:19:10 1457615950.259775 [onionperf] [INFO] twistd_watchdog is alive
2016-03-10 13:19:10 1457615950.259852 [onionperf] [INFO] logrotate is alive
2016-03-10 13:19:10 1457615950.259937 [onionperf] [WARNING] Some parallel components failed too many times or have died :(
2016-03-10 13:19:10 1457615950.260010 [onionperf] [INFO] We are in a broken state, giving up and exiting now
2016-03-10 13:19:10 1457615950.260098 [onionperf] [INFO] Cleaning up child processes now...
2016-03-10 13:19:10 1457615950.260201 [onionperf] [INFO] Joining tgen_server_watchdog thread...
2016-03-10 13:19:10 1457615950.260302 [onionperf] [INFO] Joining tor_client_watchdog thread...
2016-03-10 13:19:10 1457615950.455452 [onionperf] [INFO] Joining torctl_client_helper thread...
2016-03-10 13:19:11 1457615951.095762 [onionperf] [INFO] Joining tgen_client_watchdog thread...
2016-03-10 13:19:11 1457615951.095964 [onionperf] [INFO] Joining twistd_watchdog thread...
2016-03-10 13:19:11 1457615951.096088 [onionperf] [INFO] Joining logrotate thread...
2016-03-10 13:19:12 1457615952.097378 [onionperf] [INFO] Child processes terminated
2016-03-10 13:19:12 1457615952.097585 [onionperf] [INFO] Child process cleanup complete!
2016-03-10 13:19:12 1457615952.097669 [onionperf] [INFO] Exiting

I don't really know how to debug this, that's why I'm pasting more details about my setup here in the hope that you can figure out.

Command to start OnionPerf:

onionperf measure --tor ~/tor/src/or/tor --tgen ~/shadow/src/plugin/shadow-plugin-tgen/build/tgen --tgen-connect-port 80 --twistd-port 50080 --no-onion --tgen-connect-ip 54.165.62.142

Tor version 0.2.7.6-dev (git-b0474572de5b9556).

Shadow commit 05fff25690ed1ea7241e2214e689ffec525d2230.

OnionPerf commit 44fb7a49faf4ab3b0c336fac57d41b032f43a795 with a manual patch of the --tgen-connect-ip code:

diff --git a/onionperf/onionperf b/onionperf/onionperf
index f876214..f062e8c 100755
--- a/onionperf/onionperf
+++ b/onionperf/onionperf
@@ -438,7 +438,7 @@ def type_str_ip_in(value):
     ip = re.match(r'[\d]{1,3}\.[\d]{1,3}\.[\d]{1,3}\.[\d]{1,3}', s)
     if ip is None:
         raise argparse.ArgumentTypeError("IP address '%s' is not a valid address" % s)
-    return ip
+    return ip.group(0)

 # a custom action for passing in experimental data directories when plotting
 class PathStringArgsAction(argparse.Action):

Tarball with OnionPerf data directory available at https://people.torproject.org/~karsten/volatile/onionperf-data.tar.xz

robgjansen commented 8 years ago

The same thing happened on my OnionPerf instance.

The cause seems to be that the TGen server gets port-scanned and sent garbage, which causes the server process to crash. Upon trying to restart the TGen server, OnionPerf runs into the "Address already in use" error, because the old server closed non-gracefully and the old socket is still stuck in TIME_WAIT. After trying to start the TGen server 10 times in a row, OnionPerf gives up and terminates itself.

There are two updates we should make to TGen:

and maybe we should make a change in OnionPerf:

robgjansen commented 8 years ago

Note I found a log message referencing a NULL buffer in the case that a client request was not properly formed, which is why TGen was segfaulting in response to the port scanners.

That and the other changes are all in, so the server crashes should be fixed now and hopefully the server is also more stable thanks to the basic authentication I added.