storj-archived / core

Deprecated. Implementation of the Storj v2 protocol for Node.js.
https://storj.io

Tunnel connection lost, reason: socket hang up #172

Closed littleskunk closed 8 years ago

littleskunk commented 8 years ago

Package Versions

Replace the values below using the output from npm list storj.

C:\Users\Skunk\AppData\Roaming\npm
└─┬ storjshare-cli@4.0.2
  └─┬ storj@1.2.2
    └── kad@1.5.11

Replace the values below using the output from node --version.

v4.4.2

Expected Behavior

Please describe the program's expected behavior. Include an example of your usage code in the back ticks below if applicable.

I don't know.

Actual Behavior

Please describe the program's actual behavior. Please include any stack traces or log output in the back ticks below.

At the beginning I open a tunnel connection and everything is running fine

{"level":"warn","message":"you are not bound to a public address, trying traversal strategies...","timestamp":"2016-06-14T12:06:43.102Z"}
{"level":"info","message":"you are configured to tunnel up to 3 connections","timestamp":"2016-06-14T12:06:43.106Z"}
{"level":"warn","message":"could not connect to NAT device via UPnP","timestamp":"2016-06-14T12:06:43.514Z"}
{"level":"error","message":"failed to map port for tunserver: Request failed: 500","timestamp":"2016-06-14T12:06:43.515Z"}
{"level":"warn","message":"could not connect to NAT device via UPnP","timestamp":"2016-06-14T12:06:44.102Z"}
{"level":"info","message":"node created with nodeID a18f837b8bd051fe940b87050c906090bd368dea","timestamp":"2016-06-14T12:06:44.106Z"}
{"level":"info","message":"requesting probe from nearest neighbor","timestamp":"2016-06-14T12:06:44.886Z"}
{"level":"info","message":"sending PROBE message to {\"protocol\":\"0.7.0\",\"address\":\"104.196.135.20\",\"port\":8443,\"nodeID\":\"92e572280b2ebe4b91741b0ef939bf97931f59d9\",\"lastSeen\":1465906004887}","timestamp":"2016-06-14T12:06:44.887Z"}
{"level":"debug","message":"queuing callback for reponse to 20a081374927bdddef8fedab0e30f54d8c919ddd","timestamp":"2016-06-14T12:06:44.922Z"}
{"level":"debug","message":"checking pending rpc callbacks for expirations","timestamp":"2016-06-14T12:06:53.111Z"}
{"level":"debug","message":"checking pending rpc callbacks for expirations","timestamp":"2016-06-14T12:07:03.117Z"}
{"level":"warn","message":"rpc call 20a081374927bdddef8fedab0e30f54d8c919ddd timed out","timestamp":"2016-06-14T12:07:03.117Z"}
{"level":"debug","message":"removing contact {\"protocol\":\"0.7.0\",\"address\":\"104.196.135.20\",\"port\":8443,\"nodeID\":\"92e572280b2ebe4b91741b0ef939bf97931f59d9\",\"lastSeen\":1465906004887}","timestamp":"2016-06-14T12:07:03.118Z"}
{"level":"info","message":"requesting tunnelers from nearest neighbor","timestamp":"2016-06-14T12:07:03.118Z"}
{"level":"info","message":"sending FIND_TUNNEL message to {\"protocol\":\"0.7.0\",\"address\":\"104.196.135.20\",\"port\":8443,\"nodeID\":\"92e572280b2ebe4b91741b0ef939bf97931f59d9\",\"lastSeen\":1465906023118}","timestamp":"2016-06-14T12:07:03.118Z"}
{"level":"debug","message":"queuing callback for reponse to 53b53c5612b4ef4a2563b2c7f0979192fbd11674","timestamp":"2016-06-14T12:07:03.133Z"}
{"level":"info","message":"received valid message from {\"protocol\":\"0.7.0\",\"address\":\"104.196.135.20\",\"port\":8443,\"nodeID\":\"593844dc7f0076a1aeda9a6b9788af17e67c1052\",\"lastSeen\":1465906023828}","timestamp":"2016-06-14T12:07:03.828Z"}
{"level":"debug","message":"updating contact {\"protocol\":\"0.7.0\",\"address\":\"104.196.135.20\",\"port\":8443,\"nodeID\":\"593844dc7f0076a1aeda9a6b9788af17e67c1052\",\"lastSeen\":1465906023828}","timestamp":"2016-06-14T12:07:03.828Z"}
{"level":"debug","message":"creating new bucket for contact at index 159","timestamp":"2016-06-14T12:07:03.828Z"}
{"level":"debug","message":"contact not in bucket, moving to head","timestamp":"2016-06-14T12:07:03.828Z"}
{"level":"debug","message":"checking pending rpc callback stack for 53b53c5612b4ef4a2563b2c7f0979192fbd11674","timestamp":"2016-06-14T12:07:03.860Z"}
{"level":"info","message":"sending OPEN_TUNNEL message to {\"protocol\":\"0.7.0\",\"address\":\"104.196.135.20\",\"port\":8443,\"nodeID\":\"593844dc7f0076a1aeda9a6b9788af17e67c1052\",\"lastSeen\":1465906023860}","timestamp":"2016-06-14T12:07:03.860Z"}
{"level":"debug","message":"queuing callback for reponse to 7dd71760f224466de2e5ca9f82d962c8911846d5","timestamp":"2016-06-14T12:07:03.882Z"}
{"level":"info","message":"received valid message from {\"protocol\":\"0.7.0\",\"address\":\"104.196.135.20\",\"port\":8443,\"nodeID\":\"593844dc7f0076a1aeda9a6b9788af17e67c1052\",\"lastSeen\":1465906023913}","timestamp":"2016-06-14T12:07:03.913Z"}
{"level":"debug","message":"updating contact {\"protocol\":\"0.7.0\",\"address\":\"104.196.135.20\",\"port\":8443,\"nodeID\":\"593844dc7f0076a1aeda9a6b9788af17e67c1052\",\"lastSeen\":1465906023913}","timestamp":"2016-06-14T12:07:03.913Z"}
{"level":"debug","message":"contact already in bucket, moving to tail","timestamp":"2016-06-14T12:07:03.913Z"}
{"level":"debug","message":"checking pending rpc callback stack for 20a081374927bdddef8fedab0e30f54d8c919ddd","timestamp":"2016-06-14T12:07:03.944Z"}
{"level":"warn","message":"dropping received late response to 20a081374927bdddef8fedab0e30f54d8c919ddd","timestamp":"2016-06-14T12:07:03.944Z"}
{"level":"info","message":"received valid message from {\"protocol\":\"0.7.0\",\"address\":\"104.196.135.20\",\"port\":8443,\"nodeID\":\"593844dc7f0076a1aeda9a6b9788af17e67c1052\",\"lastSeen\":1465906024302}","timestamp":"2016-06-14T12:07:04.302Z"}
{"level":"debug","message":"updating contact {\"protocol\":\"0.7.0\",\"address\":\"104.196.135.20\",\"port\":8443,\"nodeID\":\"593844dc7f0076a1aeda9a6b9788af17e67c1052\",\"lastSeen\":1465906024302}","timestamp":"2016-06-14T12:07:04.302Z"}
{"level":"debug","message":"contact already in bucket, moving to tail","timestamp":"2016-06-14T12:07:04.302Z"}
{"level":"debug","message":"checking pending rpc callback stack for 7dd71760f224466de2e5ca9f82d962c8911846d5","timestamp":"2016-06-14T12:07:04.331Z"}
{"level":"info","message":"tunnel successfully established: {\"address\":\"104.196.135.20\",\"port\":8509}","timestamp":"2016-06-14T12:07:04.693Z"}
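The queue/expire/late-drop cycle visible in the log above (queuing a callback for a response, the periodic "checking pending rpc callbacks for expirations" sweep, the "rpc call ... timed out" warning, and finally "dropping received late response") can be sketched as follows. The class and method names are illustrative only, not kad's actual API:

```javascript
// Illustrative sketch of pending-RPC bookkeeping (not kad's real implementation).
const TIMEOUT_MS = 1000;

class PendingCalls {
  constructor() {
    this.pending = new Map(); // rpcId -> { callback, queuedAt }
  }

  // "queuing callback for reponse to <id>"
  queue(rpcId, callback) {
    this.pending.set(rpcId, { callback, queuedAt: Date.now() });
  }

  // Periodic sweep: "checking pending rpc callbacks for expirations"
  expire(now = Date.now()) {
    for (const [rpcId, entry] of this.pending) {
      if (now - entry.queuedAt > TIMEOUT_MS) {
        this.pending.delete(rpcId);
        entry.callback(new Error(`rpc call ${rpcId} timed out`));
      }
    }
  }

  // A reply arriving after expiry has no pending entry and is dropped:
  // "dropping received late response to <id>"
  handleResponse(rpcId, result) {
    const entry = this.pending.get(rpcId);
    if (!entry) return false; // late response, dropped
    this.pending.delete(rpcId);
    entry.callback(null, result);
    return true;
  }
}
```

This matches the PROBE sequence above: the callback for 20a081... was queued, timed out on a sweep, and the real reply that arrived later was dropped.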

Two minutes later I lose the tunnel connection. I can reproduce it: if I restart my node, the tunnel connection is lost again after a few minutes.

Anyway, the next steps are strange: an OPEN_TUNNEL with an old protocol version? A new connection via the seed list while not connected?

{"level":"warn","message":"tunnel connection lost, reason: socket hang up","timestamp":"2016-06-14T12:09:40.623Z"}
{"level":"info","message":"sending OPEN_TUNNEL message to {\"protocol\":\"0.6.10\",\"address\":\"104.196.0.239\",\"port\":8509,\"nodeID\":\"853591e7b6fa66f7c552cf1d2f8886de70be5b33\",\"lastSeen\":1465906180623}","timestamp":"2016-06-14T12:09:40.623Z"}
{"level":"debug","message":"queuing callback for reponse to 31c4347496ee41ffd60775fb6c0f73a88a7a0af6","timestamp":"2016-06-14T12:09:40.630Z"}
{"level":"warn","message":"missing or empty reply from contact","timestamp":"2016-06-14T12:09:42.004Z"}
{"level":"debug","message":"checking pending rpc callbacks for expirations","timestamp":"2016-06-14T12:09:43.251Z"}
{"level":"warn","message":"rpc call e6b4edd5f83e43fd3711c08526b4b8ef8f713ca2 timed out","timestamp":"2016-06-14T12:09:43.251Z"}
{"level":"debug","message":"removing contact {\"protocol\":\"0.7.0\",\"address\":\"puychip.no-ip.biz\",\"port\":4000,\"nodeID\":\"322838418b875f4a1fc4cf3756173bfc8b55a1d7\",\"lastSeen\":1465906167169}","timestamp":"2016-06-14T12:09:43.252Z"}
{"level":"warn","message":"rpc call 9090848b9008637106ea607b3e4f59eff5e7100f timed out","timestamp":"2016-06-14T12:09:43.252Z"}
{"level":"debug","message":"removing contact {\"protocol\":\"0.7.0\",\"address\":\"puychip.no-ip.biz\",\"port\":4000,\"nodeID\":\"322838418b875f4a1fc4cf3756173bfc8b55a1d7\",\"lastSeen\":1465906170610}","timestamp":"2016-06-14T12:09:43.252Z"}
{"level":"warn","message":"query failed, removing contact for shortlist, reason RPC with ID `9090848b9008637106ea607b3e4f59eff5e7100f` timed out","timestamp":"2016-06-14T12:09:43.252Z"}
{"level":"debug","message":"removing contact {\"protocol\":\"0.7.0\",\"address\":\"puychip.no-ip.biz\",\"port\":4000,\"nodeID\":\"322838418b875f4a1fc4cf3756173bfc8b55a1d7\",\"lastSeen\":1465906170610}","timestamp":"2016-06-14T12:09:43.252Z"}
{"level":"debug","message":"finished iteration, handling results","timestamp":"2016-06-14T12:09:43.252Z"}
{"level":"debug","message":"shortlist is full or there are no known nodes closer to key a18f837b8bd051fe940b87050c906090bd368dea","timestamp":"2016-06-14T12:09:43.252Z"}
{"level":"debug","message":"found 34 nodes close to key a18f837b8bd051fe940b87050c906090bd368dea","timestamp":"2016-06-14T12:09:43.253Z"}
{"level":"warn","message":"rpc call 786422e9c6a41570a1de12ac8dd15a91e5ed2a75 timed out","timestamp":"2016-06-14T12:09:43.253Z"}
{"level":"debug","message":"removing contact {\"protocol\":\"0.7.0\",\"address\":\"24.192.68.128\",\"port\":7000,\"nodeID\":\"2f46dc476a33296667b414e6928f7ea5d605dfbd\",\"lastSeen\":1465906171030}","timestamp":"2016-06-14T12:09:43.253Z"}
{"level":"debug","message":"refreshing buckets farthest than closest known","timestamp":"2016-06-14T12:09:43.254Z"}
{"level":"info","message":"connected to the storj network via storj://server.ass-chiros.ch:4100/351e6ccd2dc2577bda0e301ba945b0ebe9e334bd","timestamp":"2016-06-14T12:09:43.254Z"}
{"level":"info","message":"attemting to join network via storj://138.201.13.159:4323/364d0f5a80baa1d9668befa66f1b712761657ce7","timestamp":"2016-06-14T12:09:43.254Z"}
{"level":"debug","message":"entering overlay network via {\"protocol\":\"0.7.0\",\"address\":\"138.201.13.159\",\"port\":4323,\"nodeID\":\"364d0f5a80baa1d9668befa66f1b712761657ce7\",\"lastSeen\":1465906183254}","timestamp":"2016-06-14T12:09:43.254Z"}
{"level":"debug","message":"searching for nodes close to key c3217e24e6ecfad2fdf5190a4e7e9d175f33cfd5","timestamp":"2016-06-14T12:09:43.255Z"}
{"level":"debug","message":"performing network walk for NODE c3217e24e6ecfad2fdf5190a4e7e9d175f33cfd5","timestamp":"2016-06-14T12:09:43.255Z"}
{"level":"debug","message":"starting contact iteration for key c3217e24e6ecfad2fdf5190a4e7e9d175f33cfd5","timestamp":"2016-06-14T12:09:43.255Z"}
{"level":"debug","message":"querying 593844dc7f0076a1aeda9a6b9788af17e67c1052 for key c3217e24e6ecfad2fdf5190a4e7e9d175f33cfd5","timestamp":"2016-06-14T12:09:43.256Z"}
{"level":"info","message":"sending FIND_NODE message to {\"protocol\":\"0.7.0\",\"address\":\"104.196.135.20\",\"port\":8443,\"nodeID\":\"593844dc7f0076a1aeda9a6b9788af17e67c1052\",\"lastSeen\":1465906183256}","timestamp":"2016-06-14T12:09:43.256Z"}
{"level":"debug","message":"queuing callback for reponse to 8d67d8dc9dc147e7038c96c3056e95e9baeb588b","timestamp":"2016-06-14T12:09:43.315Z"}
{"level":"debug","message":"querying 351e6ccd2dc2577bda0e301ba945b0ebe9e334bd for key c3217e24e6ecfad2fdf5190a4e7e9d175f33cfd5","timestamp":"2016-06-14T12:09:43.316Z"}
{"level":"info","message":"sending FIND_NODE message to {\"protocol\":\"0.7.0\",\"address\":\"server.ass-chiros.ch\",\"port\":4100,\"nodeID\":\"351e6ccd2dc2577bda0e301ba945b0ebe9e334bd\",\"lastSeen\":1465906183316}","timestamp":"2016-06-14T12:09:43.316Z"}
{"level":"debug","message":"queuing callback for reponse to 9fa3f651c635bdeb161dbfa3a0b3ba66428a324a","timestamp":"2016-06-14T12:09:43.329Z"}
{"level":"debug","message":"querying 34c3c42955e989a9cc43ced91250372f99437678 for key c3217e24e6ecfad2fdf5190a4e7e9d175f33cfd5","timestamp":"2016-06-14T12:09:43.329Z"}
{"level":"info","message":"sending FIND_NODE message to {\"protocol\":\"0.7.0\",\"address\":\"storj01.prybil.com\",\"port\":4000,\"nodeID\":\"34c3c42955e989a9cc43ced91250372f99437678\",\"lastSeen\":1465906183329}","timestamp":"2016-06-14T12:09:43.329Z"}
{"level":"debug","message":"queuing callback for reponse to b19d0b57060c0b5e693ad321c4470096703f42a4","timestamp":"2016-06-14T12:09:43.349Z"}
{"level":"debug","message":"updating contact {\"protocol\":\"0.7.0\",\"address\":\"138.201.13.159\",\"port\":4323,\"nodeID\":\"364d0f5a80baa1d9668befa66f1b712761657ce7\",\"lastSeen\":1465906183254}","timestamp":"2016-06-14T12:09:43.350Z"}

Tunnel connection lost more than once? I didn't see a new tunnel being successfully established. Important: the GUI receives this message as well, and every time it shows an error popup.

{"level":"warn","message":"tunnel connection lost, reason: socket hang up","timestamp":"2016-06-14T12:10:00.057Z"}

In the end, storjshare crashed with an error message:

[error]  Failed to establish tunnel, reason: No tunnelers were returned

Full logfile: storj.txt

Steps to Reproduce

Please include the steps to reproduce the issue, numbered below. Include as much detail as possible.

CLI

  1. Start a node that needs a tunnel connection.
  2. Wait a few minutes until it crashes.
  3. Watch for the tunnel connection being lost.

GUI

  1. Disable UPNP and start GUI node.
  2. Wait a few minutes until the first tunnel connection is lost.
  3. Watch the popup messages. You will see several of them, always with the same message.
littleskunk commented 8 years ago

Workaround: if you see the same messages, you are sitting behind a firewall. As a GUI user you can enable UPnP to open the ports, or switch to the CLI for more workaround options. As a CLI user you can enable UPnP, or use fixed ports in your config file and forward them yourself.
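For the CLI route, the idea is to pin the listening port to a fixed value in the storjshare config file and then forward that port on your router to the machine running the node. A minimal sketch, with the caveat that the exact key names depend on your storjshare-cli version and should be checked against your own generated config:

```json
{
  "address": "your.public.hostname.example",
  "port": 4000
}
```

With a fixed port forwarded, the node no longer needs UPnP or a tunnel at all.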

littleskunk commented 8 years ago

I installed the new core version, but I still have the same problem.

C:\Users\Skunk\AppData\Roaming\npm
└─┬ storjshare-cli@4.0.2
  └─┬ storj@1.2.3
    └── kad@1.5.11

Issue reopened, but no need to panic. I will wait one day and try again; the other nodes need time to update to the new core version.

darkbluelight commented 8 years ago

Workaround: if you see the same messages, you are sitting behind a firewall. As a GUI user you can enable UPnP to open the ports, or switch to the CLI for more workaround options.

I would prefer to keep UPnP deactivated in general. What ports would I need to open manually?

littleskunk commented 8 years ago

https://storj.readme.io/docs/storjshare-troubleshooting-guide#port-forwarding

Remember, as a GUI user you can only use UPnP to open the random port. CLI users can set a fixed port and forward it themselves.

44203 commented 8 years ago

@littleskunk the fix is on the remote side, so you won't see it resolved in the network until people upgrade.

The issue is that prior to this patch release, tunneled nodes were still advertising tunnels. You are able to "connect" to their advertised tunnel, because the gateway accepts websocket connections for data channels. You won't know the tunnel is not actually established until you try to send data through it and the gateway realizes you aren't using it as a data channel, which is when the remote node closes the connection.

We don't have any way of determining that the opened connection is not a tunnel connection other than trying to send data through it. The problem is that when the tunnel connection closes, we immediately try to reopen it, causing this to happen in a loop.

We could, instead, use another known tunnel if the tunnel closes which should help with finding a real tunnel in the meantime.
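The fallback strategy described above, trying the next known tunneler instead of immediately reopening the same dead tunnel, could look roughly like this. This is an illustrative sketch, not the actual core implementation; the function names and the shape of `openTunnel` are assumptions:

```javascript
// Illustrative sketch: walk the list of known tunnelers and return the
// first tunnel that opens successfully, instead of looping on one peer.
async function establishTunnel(tunnelers, openTunnel) {
  for (const tunneler of tunnelers) {
    try {
      // openTunnel is assumed to resolve with a tunnel handle or reject
      // (e.g. on "socket hang up") if the peer is not a real tunneler.
      return await openTunnel(tunneler);
    } catch (err) {
      // Connection failed or was closed: fall through to the next candidate.
      continue;
    }
  }
  throw new Error('Failed to establish tunnel: all known tunnelers exhausted');
}
```

The point of the design is that a dead or fake tunnel costs one failed attempt rather than an indefinite reconnect loop against the same peer.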

44203 commented 8 years ago

Actually I take part of that back after re-reviewing the code. We already do try the next known tunnel if the one we are connected to encounters an error or otherwise closes.

littleskunk commented 8 years ago

My first tunnel worked very well for 2 minutes. That is the main problem in this issue.

44203 commented 8 years ago

Your first tunnel was from one of the bridge's minions, which has been dying and restarting (@niahmiah and I are working on improvements to the renter minion pool, and that work is being tracked in the bridge repo) - which would explain your lost connection.

If that is the primary issue here, let's close this ticket, as the issue that followed has been fixed in core.

littleskunk commented 8 years ago

I agree. Let's continue with https://github.com/Storj/bridge/issues/98