Shadow is a discrete-event network simulator that directly executes real application code, enabling you to simulate distributed systems with thousands of network-connected processes in realistic and scalable private network experiments using your laptop, desktop, or server running Linux.
https://shadow.github.io

Having some problem with running scallion on top of Shadow #97

Closed anupam-das closed 11 years ago

anupam-das commented 11 years ago

I installed Shadow as instructed in https://github.com/shadow/shadow/wiki/Preparing-your-machine and then tried the following sample experiment:

    cd resource/scallion-hosts
    tar xaf tiny-m1.large.tar.xz
    cd tiny-m1.large
    scallion -i hosts.xml

But none of the clients seem to download anything. I get log entries such as:

0:0:12:134645 [thread-0] 0:7:41:485467970 [scallion-warning] [exit5-85.1.0.0] [intercept_logv] [tor-warn] 16 connections have failed:
0:0:12:134657 [thread-0] 0:7:41:485467970 [scallion-warning] [exit5-85.1.0.0] [intercept_logv] [tor-warn] 16 connections died in state handshaking (TLS) with SSL state SSLv3 read server session ticket A in HANDSHAKE
0:0:12:139624 [thread-0] 0:7:41:569741107 [scallion-warning] [exit4-83.1.0.0] [intercept_logv] [tor-warn] control_event_bootstrap_problem() Problem bootstrapping. Stuck at 85%: Finishing handshake with first hop. (DONE; DONE; count 16; recommendation warn)

.....

0:0:12:439455 [thread-0] 0:7:46:927268593 [scallion-warning] [nonexit1-93.1.0.0] [intercept_logv] [tor-warn] 18 connections have failed:
0:0:12:439467 [thread-0] 0:7:46:927268593 [scallion-warning] [nonexit1-93.1.0.0] [intercept_logv] [tor-warn] 18 connections died in state handshaking (TLS) with SSL state SSLv3 read server session ticket A in HANDSHAKE
0:0:12:440068 [thread-0] 0:7:47:000000000 [scallion-warning] [nonexit11-113.1.0.0] [intercept_logv] [tor-warn] router_choose_random_node() No available nodes when trying to choose node. Failing.
0:0:12:440092 [thread-0] 0:7:47:000000000 [scallion-warning] [nonexit11-113.1.0.0] [intercept_logv] [tor-warn] router_choose_random_node() No available nodes when trying to choose node. Failing.
0:0:12:440113 [thread-0] 0:7:47:000000000 [scallion-warning] [nonexit11-113.1.0.0] [intercept_logv] [tor-warn] router_choose_random_node() No available nodes when trying to choose node. Failing.
0:0:12:440127 [thread-0] 0:7:47:000000000 [scallion-warning] [nonexit11-113.1.0.0] [intercept_logv] [tor-warn] router_choose_random_node() No available nodes when trying to choose node. Failing.
0:0:12:440397 [thread-0] 0:7:47:000000000 [scallion-warning] [nonexit1-93.1.0.0] [intercept_logv] [tor-warn] router_choose_random_node() No available nodes when trying to choose node. Failing.
0:0:12:440433 [thread-0] 0:7:47:000000000 [scallion-warning] [nonexit1-93.1.0.0] [intercept_logv] [tor-warn] router_choose_random_node() No available nodes when trying to choose node. Failing.
0:0:12:440453 [thread-0] 0:7:47:000000000 [scallion-warning] [nonexit1-93.1.0.0] [intercept_logv] [tor-warn] router_choose_random_node() No available nodes when trying to choose node. Failing.
0:0:12:440466 [thread-0] 0:7:47:000000000 [scallion-warning] [nonexit1-93.1.0.0] [intercept_logv] [tor-warn] router_choose_random_node() No available nodes when trying to choose node. Failing.
0:0:12:440484 [thread-0] 0:7:47:000000000 [scallion-warning] [nonexit1-93.1.0.0] [intercept_logv] [tor-warn] router_choose_random_node() No available nodes when trying to choose node. Failing.
0:0:12:440496 [thread-0] 0:7:47:000000000 [scallion-warning] [nonexit1-93.1.0.0] [intercept_logv] [tor-warn] router_choose_random_node() No available nodes when trying to choose node. Failing.
0:0:12:440509 [thread-0] 0:7:47:000000000 [scallion-warning] [nonexit1-93.1.0.0] [intercept_logv] [tor-warn] onion_extend_cpath() Failed to find node for hop 0 of our path. Discarding this circuit.

.....

0:0:9:022102 [thread-0] 0:3:2:413445313 [scallion-warning] [exit3-81.1.0.0] [intercept_logv] [tor-warn] Received http status code 404 ("Not found") from server '75.1.0.0:9112' while fetching consensus directory.
0:0:9:036443 [thread-0] 0:3:5:000000000 [scallion-message] [4uthority-75.1.0.0] [intercept_logv] [tor-notice] directory_get_from_dirserver() While fetching directory info, no running dirservers known. Will try again later. (purpose 14)
0:0:9:036495 [thread-0] 0:3:5:000000000 [scallion-message] [4uthority-75.1.0.0] [intercept_logv] [tor-notice] directory_get_from_dirserver() While fetching directory info, no running dirservers known. Will try again later. (purpose
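With this many repeated warnings, a per-host tally can make it easier to see which nodes are affected. The following is a minimal sketch that assumes the whitespace-separated field layout shown in the log lines above (host in field 5, Tor log level in field 7). The function name and the log file name `scallion.log` are hypothetical.

```shell
#!/bin/sh
# Count [tor-warn] lines per simulated host, most affected host first.
# Assumes the field layout seen in the excerpts above:
#   $5 = [hostname-ip], $7 = [tor-level]
# count_tor_warnings_by_host is an illustrative helper, not part of Shadow.
count_tor_warnings_by_host() {
    awk '$7 == "[tor-warn]" {
             host = $5
             gsub(/^\[|\]$/, "", host)   # strip the surrounding brackets
             count[host]++
         }
         END { for (h in count) print count[h], h }' | sort -rn
}

# Example usage (scallion.log is an assumed file name):
#   count_tor_warnings_by_host < scallion.log
```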

Any suggestions as to what is going wrong? I checked cache_descriptor after the experiment ended and found the descriptors of all 20 nodes present.

robgjansen commented 11 years ago

Works fine on Fedora boxes.

robgjansen commented 11 years ago

I found this warning:

[tor-warn]  180 connections died in state handshaking (TLS) with SSL state SSLv3 read server session ticket A in HANDSHAKE

This appears to have been fixed in tor-0.2.3.23-rc, according to this part of its changelog:

  o Major bugfixes (security/privacy):
       - Disable TLS session tickets. OpenSSL's implementation was giving
         our TLS session keys the lifetime of our TLS context objects, when
         perfect forward secrecy would want us to discard anything that
         could decrypt a link connection as soon as the link connection
         was closed. Fixes bug 7139; bugfix on all versions of Tor linked
         against OpenSSL 1.0.0 or later. Found by Florent Daignière.

We're checking whether it's fixed in Tor versions >= 0.2.3.23-rc.

robgjansen commented 11 years ago

The problem persists, even when using Tor 0.2.3.23-rc.

robgjansen commented 11 years ago

Shadow intercepts OpenSSL's EVP_Cipher and returns after a memmove (avoiding the expensive crypto operations). This usually works fine, but there is a corner case on certain systems where EVP_Cipher uses the function aesni_cbc_hmac_sha1_cipher from e_aes_cbc_hmac_sha1.c. That function adds padding as part of the operation, which our memmove call does not do.

The missing padding prompts a Tor TLS info-level log error in tor_tls_handshake similar to the following:

[tor-info] TLS error while handshaking with "49.2.0.0": decryption failed or bad record mac (in SSL routines:SSL3_GET_RECORD:SSLv3 read certificate verify A)

The short-term fix is to stop intercepting EVP_Cipher by default and take the performance hit. A longer-term solution is to intercept the lower-level cryptographic operations. This will be extremely tricky given the complexity of OpenSSL and its many supported ciphers.
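To illustrate why a plain copy cannot stand in for a cipher that pads, here is a rough length calculation. It assumes a TLS CBC record with a 20-byte HMAC-SHA1 tag, a 16-byte AES block, and minimal padding. The function name is made up for illustration, and none of this is Shadow's or OpenSSL's actual code.

```shell
#!/bin/sh
# Illustrative arithmetic only; not taken from Shadow or OpenSSL.
# Assumptions: 20-byte HMAC-SHA1 tag, 16-byte AES block, minimal TLS CBC
# padding (1 to 16 bytes, so the record always grows to the next full block).
tls_cbc_sha1_record_len() {
    payload_len=$1
    mac_len=20
    block_len=16
    echo $(( ((payload_len + mac_len) / block_len + 1) * block_len ))
}

payload=100
# A memmove-style passthrough hands back exactly as many bytes as it was given.
echo "passthrough copy: $payload bytes"
# A cipher that appends the MAC and padding produces a longer record.
echo "MAC + padded record: $(tls_cbc_sha1_record_len "$payload") bytes"
```

When the peer expects the longer record and receives an unpadded copy instead, a record-level check failure of the kind quoted above would be a plausible outcome.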

jdgeddes commented 11 years ago

AES-NI is a new instruction set found on newer CPUs. It seems that if OpenSSL detects that it is available, it uses aesni_cbc_hmac_sha1_cipher instead of aes_cbc_cipher.
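A quick way to check whether a given Linux machine is likely to hit this code path is to look for the `aes` CPU flag. This is a small sketch; the helper name `has_aesni` is hypothetical, and the optional file argument exists only so the check can be pointed at a saved copy of `/proc/cpuinfo`.

```shell
#!/bin/sh
# Report whether the CPU advertises AES-NI, based on the "aes" flag in
# /proc/cpuinfo (Linux only). has_aesni is an illustrative helper name.
has_aesni() {
    cpuinfo=${1:-/proc/cpuinfo}
    # -w matches "aes" as a whole word, so flags such as "vaes" do not count.
    grep '^flags' "$cpuinfo" 2>/dev/null | grep -qw aes
}

if has_aesni; then
    echo "AES-NI flag present: OpenSSL may select the AES-NI code paths"
else
    echo "AES-NI flag not found"
fi
```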

One way to disable this, mentioned here, is to set the runtime environment variable OPENSSL_ia32cap=~0x200000200000000, which worked when I tested it.
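A sketch of applying that variable to a single run follows. The wrapper name `run_without_aesni` is hypothetical. The value is quoted because an unquoted leading `~` in an assignment is subject to tilde expansion in the shell. As far as I understand OpenSSL's capability vector, this mask clears the AES-NI and PCLMULQDQ bits, but treat that reading as an assumption and check the OpenSSL documentation for your version.

```shell
#!/bin/sh
# Run a command with OpenSSL's AES-NI code paths masked off via
# OPENSSL_ia32cap. run_without_aesni is an illustrative wrapper name.
run_without_aesni() {
    # Quote the value so the shell does not attempt tilde expansion on "~0x...".
    env OPENSSL_ia32cap='~0x200000200000000' "$@"
}

# Example usage, with the command from the original report:
#   run_without_aesni scallion -i hosts.xml
```

Using `env` keeps the setting scoped to the one command, so other programs started from the same shell are unaffected.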

Another, more hackish workaround was suggested in this ticket for Tor: use this patch as a temporary way to remove the AES-NI functions.

robgjansen commented 11 years ago

I just created #136 to track the long-term solution (outlined above by @jdgeddes) to the AES-NI issue that causes the TLS encryption problem. Let's move further comments there.