storj-archived / storjshare-daemon

Deprecated. Docker Daemon + CLI for farming data on the Storj v2 network.
https://storj.io/share.html
GNU Affero General Public License v3.0

Daemon is crashing when it gets an ECONNRESET #326

Closed: mcronce closed this issue 6 years ago

mcronce commented 6 years ago

Package Versions

Output from storjshare --version:

daemon: 5.3.0, core: 8.6.0, protocol: 1.2.0

Output from node --version:

v6.7.0

Expected Behavior

The daemon should stay running through connection issues.

Actual Behavior

I can't say what's causing the connection issues, but the daemon is crashing with the following error:

stream.js:74
      throw er; // Unhandled stream error in pipe.
      ^
Error: read ECONNRESET
    at exports._errnoException (util.js:1036:11)
    at TCP.onread (net.js:564:26)
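
For context on the crash mode: in Node.js, an 'error' event that nothing listens for is thrown, taking down the whole process; the stream.js:74 frame above is the legacy pipe() variant of that behavior ("Unhandled stream error in pipe"), which fires when a piped stream emits an error nobody handles. A minimal sketch of the failure and the guard (not storjshare code; the echo server is just an illustration):

// Minimal illustration of the failure mode, not storjshare code.
// A TCP peer that resets the connection makes the socket emit an
// 'error' event (ECONNRESET); if nothing listens for it, Node
// throws and the process dies, as in the trace above.
const net = require('net');

const server = net.createServer(function (socket) {
  // This handler is the guard: without it, an ECONNRESET from the
  // peer would crash the whole process.
  socket.on('error', function (err) {
    console.error('socket error, ignoring:', err.code);
  });
  socket.pipe(socket); // echo; piping does not make errors safe to ignore
});

server.listen(4000);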

The logs surrounding the time of the restart (when this started happening a few days ago, I set up my container image's entrypoint to print the date after the daemon stops) don't reveal much beyond the fact that the daemon did indeed have to start back up:

{"level":"info","message":"routing HTTP request to proxy","timestamp":"2018-02-24T15:22:00.632Z"}
{"level":"warn","message":"no proxy with id bb1822aca1606a751686d0a1d6cc4f3df854a19a exists","timestamp":"2018-02-24T15:22:00.632Z"}
{"level":"info","message":"routing HTTP request to proxy","timestamp":"2018-02-24T15:22:03.098Z"}
{"level":"warn","message":"no proxy with id bb1822aca1606a751686d0a1d6cc4f3df854a19a exists","timestamp":"2018-02-24T15:22:03.098Z"}
{"level":"info","message":"received valid message from {\"userAgent\":\"8.6.0\",\"protocol\":\"1.2.0\",\"address\":\"173.249.13.57\",\"port\":4959,\"nodeID\":\"919e8399c853d9a8c77ad7bfbb084240be82a915\",\"lastSeen\":1518791908272}","ti
mestamp":"2018-02-24T15:22:05.564Z"}
{"level":"info","message":"received FIND_NODE from {\"userAgent\":\"8.6.0\",\"protocol\":\"1.2.0\",\"address\":\"173.249.13.57\",\"port\":4959,\"nodeID\":\"919e8399c853d9a8c77ad7bfbb084240be82a915\",\"lastSeen\":1518791908272}","timest
amp":"2018-02-24T15:22:05.566Z"}
{"level":"info","message":"replying to message to 2642cf849b3f349e2e29edd5047e531882d30115","timestamp":"2018-02-24T15:22:05.567Z"}
{"level":"info","message":"received valid message from {\"userAgent\":\"8.6.0\",\"protocol\":\"1.2.0\",\"address\":\"86.100.106.16\",\"port\":4420,\"nodeID\":\"91e8d00a1a9942ba9f3d1760814f6766bc6da189\",\"lastSeen\":1518936644054}","ti
mestamp":"2018-02-24T15:22:06.316Z"}
{"level":"info","message":"replying to message to 24b93db6028fa0761eb7af87df92230e057a1623","timestamp":"2018-02-24T15:22:06.318Z"}
{"level":"info","message":"received valid message from {\"userAgent\":\"8.5.0\",\"protocol\":\"1.2.0\",\"address\":\"37.59.47.180\",\"port\":4191,\"nodeID\":\"911bcbfb435eac2ba67018791a5d277561071fe9\",\"lastSeen\":1519301099621}","tim
estamp":"2018-02-24T15:22:06.868Z"}
{"level":"info","message":"replying to message to 9df98147eab18683554fb3693b22b66f4d3563e2","timestamp":"2018-02-24T15:22:06.870Z"}
{"level":"info","message":"received valid message from {\"userAgent\":\"8.1.0\",\"protocol\":\"1.2.0\",\"address\":\"djsz.f.dedikuoti.lt\",\"port\":4000,\"nodeID\":\"91861033810aa2935b288d24f4a1a4078029d002\",\"lastSeen\":1518677024868
}","timestamp":"2018-02-24T15:22:07.316Z"}
{"level":"info","message":"replying to message to f81cbdc6364494365ef6ec8994cc5bf30dc76c62","timestamp":"2018-02-24T15:22:07.317Z"}
{"level":"warn","message":"your address is public and traversal strategies are disabled","timestamp":"2018-02-24T15:22:13.274Z"}
{"level":"info","message":"node created with nodeID 919c11ae93f706a844330f8f19afc3b92a8c9b2c","timestamp":"2018-02-24T15:22:13.314Z"}
{"level":"info","message":"clock is synchronized with ntp, delta: 8 ms","timestamp":"2018-02-24T15:22:13.343Z"}
{"level":"info","message":"resolving contacts from https://api.storj.io","timestamp":"2018-02-24T15:22:13.345Z"}
{"level":"info","message":"Connected to bridge: https://api.storj.io","timestamp":"2018-02-24T15:22:13.510Z"}
{"level":"info","message":"Connected to bridge: https://api.storj.io","timestamp":"2018-02-24T15:22:13.510Z"}
{"level":"info","message":"attempting to join network via storj://31.129.207.14:49009/4725b8a7c867cfb26f46ad5ae40d9f8128b4b415","timestamp":"2018-02-24T15:22:13.562Z"}

It's possible that the node with ID f81cbdc6364494365ef6ec8994cc5bf30dc76c62 is the one with connection issues, since it's the last one logged before the daemon went down, but I can't say that with confidence.

I monitor the data from the status endpoint as well as my node's contact on api.storj.io. Since the issue started, my reputation and response time have remained perfectly constant and no timeouts have been recorded, but I don't seem to be allocating anything: my space used has not increased.
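
For reference, the check against the bridge amounts to polling my node's contact record. A sketch of that (the /contacts/<nodeID> route is the v2 bridge API; the exact response field names here are assumptions from memory and may differ):

// Sketch: poll the node's contact record on the bridge. The
// /contacts/<nodeID> route is the v2 bridge API; the response
// field names below are assumptions and may differ.
const https = require('https');

const nodeID = '919c11ae93f706a844330f8f19afc3b92a8c9b2c'; // from the logs above

https.get('https://api.storj.io/contacts/' + nodeID, function (res) {
  let body = '';
  res.on('data', function (chunk) { body += chunk; });
  res.on('end', function () {
    const contact = JSON.parse(body);
    // Assumed fields: reputation, responseTime, timeoutRate, lastSeen
    console.log(contact.reputation, contact.responseTime,
                contact.timeoutRate, contact.lastSeen);
  });
}).on('error', function (err) {
  console.error('bridge request failed:', err.message);
});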

Steps to Reproduce

I'm not sure what environmental conditions are necessary to trigger this; the steps to reproduce are simply to let the daemon run for a while. Sometimes it crashes after a few minutes, sometimes after several hours; the longest run since the issue began has been about 8 hours. As a stopgap I keep the node up with a restart loop, sketched below.
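
The entrypoint I mentioned above amounts to that restart loop. A sketch of the same idea as a Node watchdog (the storjshare daemon --foreground invocation is an assumption about how the daemon is launched in the container; substitute the actual command):

// Watchdog sketch: respawn the daemon whenever it exits and log the
// time of each crash. The command and flags below are assumptions;
// use whatever actually starts the daemon in your container.
const spawn = require('child_process').spawn;

function start() {
  const child = spawn('storjshare', ['daemon', '--foreground'], { stdio: 'inherit' });
  child.on('exit', function (code, signal) {
    console.error('[' + new Date().toISOString() + '] daemon exited ' +
                  '(code=' + code + ', signal=' + signal + '), restarting in 5s');
    setTimeout(start, 5000); // brief backoff before respawning
  });
}

start();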

Let me know where else to look for information; I'm betting this is not everything you need, but I'm happy to add anything else.

kutyla-philipp commented 6 years ago

Same issue for me, although it doesn't come up as frequently. Usually the node runs for about 2-3 days before the crash occurs. I'm running the docker container oreandawe/storjshare-cli.

RichardLitt commented 6 years ago

👋 Hey! Thanks for this contribution. Apologies for the delay in responding!

We've decided to rearchitect Storj, so that we can scale better. You can read more about this decision here. This means that we are entirely focused on v3 at the moment, in the storj/storj repository. Our white paper for v3 is coming very, very soon - follow along on the blog and in our Rocketchat.

As this repository is part of the v2 network, we're no longer maintaining this repository. I am going to close this for now. If you have any questions, I encourage you to jump on Rocketchat and ask them there. Thanks!