storj-archived / core

Deprecated. Implementation of the Storj v2 protocol for Node.js.
https://storj.io
Other
395 stars 95 forks source link

Bridge and Farmer are sending to many SUBSCRIBE messages #257

Closed littleskunk closed 7 years ago

littleskunk commented 8 years ago

Package Versions

Replace the values below using the output from npm list storj.

"title":"Storj Bridge",
"version":"0.9.3",
"description":"Access the Storj network using a simple REST API.",
"x-protocol-version":"0.7.1",
"x-core-version":"1.4.0"

Replace the values below using the output from node --version.

I don't know

Expected Behavior

Please describe the program's expected behavior. Include an example of your usage code in the back ticks below if applicable.

10 bridge minnions. All have the same nodeID. Bridge is uploading shards. All shards with the same opcode.

skunk@Server010:~$ grep '"0f01' onehourbridgelog.log | wc -l
2371
skunk@Server010:~$ grep '"0e01' onehourbridgelog.log | wc -l
40

I would expect a total of 60 SUBSCRIBE messages in the first 15 minutes and later 0 SUBSCRIBE messages. 10 minnions sending 3 SUBSCRIBE each. 30 for the tunnel PUBLISH messages and 30 for one opcode.

Lets take the worst case and calculate with all 81 opcodes. In that case I would expect 2460 SUBSCRIBE messages in the first 15 minutes.

Actual Behavior

Please describe the program's actual behavior. Please include any stack traces or log output in the back ticks below.

The bridge is sending more SUBSCRIBE messages than PUBLISH even after hours.

skunk@Server010:~$ grep 'sending PUBLISH' onehourbridgelog.log | wc -l
76462
skunk@Server010:~$ grep 'sending SUBSCRIBE' onehourbridgelog.log | wc -l
96000

I checked the last 3 hours to see if it is always the same number.

grep '2016-07-12T14' storj-bridge.Server012.log | grep 'sending SUBSCRIBE' | wc -l
102566
grep '2016-07-12T13' storj-bridge.Server012.log | grep 'sending SUBSCRIBE' | wc -l
111153
grep '2016-07-12T12' storj-bridge.Server012.log | grep 'sending SUBSCRIBE' | wc -l
100415

Steps to Reproduce

Please include the steps the reproduce the issue, numbered below. Include as much detail as possible.

  1. Start bridge server and keep it running for a few hours
  2. Check the logfile. Ignore the first hour. We don't want to track the startup messages.
littleskunk commented 8 years ago

I checked the complete logfile for bloom filter messages. They are looking fine.

Last message was one day old

{"level":"warn","message":"failed to get neighbor's bloom filter, reason: RPC with ID `d4c42ffb8b75ace03efd7b31f3628e3154ef742d` timed out","timestamp":"2016-07-11T16:56:38.434Z"}
grep 'bloom filter with our own' storj-bridge.Server012.log | wc -l
56
grep 'failed to get' storj-bridge.Server012.log | wc -l
38
Th3Van commented 8 years ago

Th3van@Server012.storj.dk:~# node --version v6.2.2

44203 commented 8 years ago

Haven't looked into this yet, but for starters, let's use the supported node version (^4.4.4).

Th3Van commented 8 years ago

I have downgraded server012 -> node v4.4.4 , and restarted it. Just let it run for a couple of hours, and @littleskunk should be able to re-check the log file.

littleskunk commented 8 years ago

Same problem:

grep '2016-07-16T13' Server012.log > ~/onehourlog.log
grep 'sending SUBSCRIBE' onehourlog.log | wc -l
64618
grep 'sending PUBLISH' onehourlog.log | wc -l
47844
littleskunk commented 7 years ago

Looking into my farmer logfile and every node is sending to many SUBSCRIBE messages.

Expected SUBSCRIBE messages for tunnel PUBLISH and 3 opcodes at the beginning of the logfile

{"level":"info","message":"subscribing to topic \"0e01\"","timestamp":"2016-07-31T12:18:39.332Z"}
{"level":"info","message":"sending SUBSCRIBE message to {\"protocol\":\"0.7.1\",\"address\":\"207.242.114.130\",\"port\":4007,\"nodeID\":\"2ad14589a28758914c013580c7cee713fa16a353\",\"lastSeen\":1469967519338}","timestamp":"2016-07-31T12:18:39.338Z"}
{"level":"info","message":"subscribing to topic \"0e00\"","timestamp":"2016-07-31T12:18:39.458Z"}
{"level":"info","message":"sending SUBSCRIBE message to {\"protocol\":\"0.7.1\",\"address\":\"207.242.114.130\",\"port\":4007,\"nodeID\":\"2ad14589a28758914c013580c7cee713fa16a353\",\"lastSeen\":1469967519459}","timestamp":"2016-07-31T12:18:39.460Z"}
{"level":"info","message":"subscribing to topic \"0f01020202\"","timestamp":"2016-07-31T12:18:39.502Z"}
{"level":"info","message":"sending SUBSCRIBE message to {\"protocol\":\"0.7.1\",\"address\":\"207.242.114.130\",\"port\":4007,\"nodeID\":\"2ad14589a28758914c013580c7cee713fa16a353\",\"lastSeen\":1469967519503}","timestamp":"2016-07-31T12:18:39.503Z"}
...
{"level":"info","message":"subscribing to topic \"0f02020202\"","timestamp":"2016-07-31T12:18:42.647Z"}
{"level":"info","message":"sending SUBSCRIBE message to {\"protocol\":\"0.7.1\",\"address\":\"207.242.114.130\",\"port\":4007,\"nodeID\":\"2ad14589a28758914c013580c7cee713fa16a353\",\"lastSeen\":1469967522648}","timestamp":"2016-07-31T12:18:42.648Z"}
...
{"level":"info","message":"subscribing to topic \"0f03020202\"","timestamp":"2016-07-31T12:18:45.775Z"}
{"level":"info","message":"sending SUBSCRIBE message to {\"protocol\":\"0.7.1\",\"address\":\"207.242.114.130\",\"port\":4007,\"nodeID\":\"2ad14589a28758914c013580c7cee713fa16a353\",\"lastSeen\":1469967525776}","timestamp":"2016-07-31T12:18:45.776Z"}

Later my node is sending more SUBSCRIBE messages but this time without any topic.

{"level":"info","message":"sending SUBSCRIBE message to {\"protocol\":\"0.7.1\",\"address\":\"82.220.99.82\",\"port\":4000,\"nodeID\":\"c06712294d4aea97173ec8ca473dbf298ad47478\",\"lastSeen\":1470003440792}","timestamp":"2016-07-31T22:17:20.793Z"}
{"level":"info","message":"sending SUBSCRIBE message to {\"protocol\":\"0.7.1\",\"address\":\"w.storj.eu\",\"port\":8168,\"nodeID\":\"c15858c5901557ede0cbf89dda343cbf77788176\",\"lastSeen\":1470003440849}","timestamp":"2016-07-31T22:17:20.850Z"}
{"level":"info","message":"sending SUBSCRIBE message to {\"protocol\":\"0.7.1\",\"address\":\"server021.storj.dk\",\"port\":5421,\"nodeID\":\"cf5a9c4aa7a72ecb57cf3cb15244d1b18d5fad87\",\"lastSeen\":1470003440943}","timestamp":"2016-07-31T22:17:20.943Z"}
...
{"level":"info","message":"sending SUBSCRIBE message to {\"protocol\":\"0.7.1\",\"address\":\"server019.storj.dk\",\"port\":5719,\"nodeID\":\"bf55923a1b662436f3cb20c5ab1a9a2c48cb21e1\",\"lastSeen\":1470050899690}","timestamp":"2016-08-01T11:28:19.690Z"}
{"level":"info","message":"sending SUBSCRIBE message to {\"protocol\":\"0.7.1\",\"address\":\"82.220.99.82\",\"port\":4000,\"nodeID\":\"c06712294d4aea97173ec8ca473dbf298ad47478\",\"lastSeen\":1470050899724}","timestamp":"2016-08-01T11:28:19.724Z"}
{"level":"info","message":"sending SUBSCRIBE message to {\"protocol\":\"0.7.1\",\"address\":\"server021.storj.dk\",\"port\":5921,\"nodeID\":\"cf5a9c4aa7a72ecb57cf3cb15244d1b18d5fad87\",\"lastSeen\":1470050899785}","timestamp":"2016-08-01T11:28:19.785Z"}
...
{"level":"info","message":"sending SUBSCRIBE message to {\"protocol\":\"0.7.1\",\"address\":\"5.9.94.67\",\"port\":4013,\"nodeID\":\"ad7734e18dd1e550ff2e3b32f2beed28b3dcab5c\",\"lastSeen\":1470105721603}","timestamp":"2016-08-02T02:42:01.603Z"}
{"level":"info","message":"sending SUBSCRIBE message to {\"protocol\":\"0.7.1\",\"address\":\"51.255.86.143\",\"port\":59439,\"nodeID\":\"a8fa8e040cdb20d760f1b520bf61ebf13936b006\",\"lastSeen\":1470105721686}","timestamp":"2016-08-02T02:42:01.686Z"}
{"level":"info","message":"sending SUBSCRIBE message to {\"protocol\":\"0.7.1\",\"address\":\"server013.storj.dk\",\"port\":5513,\"nodeID\":\"869025a2c19c667adc13741ac7473b7908243949\",\"lastSeen\":1470105721730}","timestamp":"2016-08-02T02:42:01.731Z"}
...
`{"level":"info","message":"sending SUBSCRIBE message to {\"protocol\":\"0.7.1\",\"address\":\"littleskunk.myqnapcloud.com\",\"port\":4003,\"nodeID\":\"a33503d0d7c629d229ea216dcdedb93181cf53bb\",\"lastSeen\":1470112417131}","timestamp":"2016-08-02T04:33:37.131Z"}
{"level":"info","message":"sending SUBSCRIBE message to {\"protocol\":\"0.7.1\",\"address\":\"storj.airdns.org\",\"port\":53959,\"nodeID\":\"ad7734e18dd1e550ff2e3b32f2beed28b3dcab5c\",\"lastSeen\":1470112417168}","timestamp":"2016-08-02T04:33:37.168Z"}
{"level":"info","message":"sending SUBSCRIBE message to {\"protocol\":\"0.7.1\",\"address\":\"server013.storj.dk\",\"port\":5313,\"nodeID\":\"869025a2c19c667adc13741ac7473b7908243949\",\"lastSeen\":1470112417204}","timestamp":"2016-08-02T04:33:37.204Z"}
...
{"level":"info","message":"sending SUBSCRIBE message to {\"protocol\":\"0.7.1\",\"address\":\"STORJ-CLUS-010.storj.space\",\"port\":4156,\"nodeID\":\"ad7734e18dd1e550ff2e3b32f2beed28b3dcab5c\",\"lastSeen\":1470120130282}","timestamp":"2016-08-02T06:42:10.282Z"}
{"level":"info","message":"sending SUBSCRIBE message to {\"protocol\":\"0.7.1\",\"address\":\"STORJ-CLUS-012.storj.space\",\"port\":4184,\"nodeID\":\"fce958de031d66e01ac422321820d56ac74fcabb\",\"lastSeen\":1470120130342}","timestamp":"2016-08-02T06:42:10.342Z"}
{"level":"info","message":"sending SUBSCRIBE message to {\"protocol\":\"0.7.1\",\"address\":\"82.220.99.82\",\"port\":4000,\"nodeID\":\"c06712294d4aea97173ec8ca473dbf298ad47478\",\"lastSeen\":1470120130375}","timestamp":"2016-08-02T06:42:10.375Z"}
...
{"level":"info","message":"sending SUBSCRIBE message to {\"protocol\":\"0.7.1\",\"address\":\"STORJ-CLUS-010.storj.space\",\"port\":4156,\"nodeID\":\"ad7734e18dd1e550ff2e3b32f2beed28b3dcab5c\",\"lastSeen\":1470120548028}","timestamp":"2016-08-02T06:49:08.029Z"}
{"level":"info","message":"sending SUBSCRIBE message to {\"protocol\":\"0.7.1\",\"address\":\"server013.storj.dk\",\"port\":5113,\"nodeID\":\"869025a2c19c667adc13741ac7473b7908243949\",\"lastSeen\":1470120548061}","timestamp":"2016-08-02T06:49:08.061Z"}
{"level":"info","message":"sending SUBSCRIBE message to {\"protocol\":\"0.7.1\",\"address\":\"82.220.99.82\",\"port\":4000,\"nodeID\":\"c06712294d4aea97173ec8ca473dbf298ad47478\",\"lastSeen\":1470120548095}","timestamp":"2016-08-02T06:49:08.095Z"}
...
{"level":"info","message":"sending SUBSCRIBE message to {\"protocol\":\"0.7.1\",\"address\":\"STORJ-CLUS-010.storj.space\",\"port\":4156,\"nodeID\":\"ad7734e18dd1e550ff2e3b32f2beed28b3dcab5c\",\"lastSeen\":1470120550935}","timestamp":"2016-08-02T06:49:10.935Z"}
{"level":"info","message":"sending SUBSCRIBE message to {\"protocol\":\"0.7.1\",\"address\":\"server013.storj.dk\",\"port\":5313,\"nodeID\":\"869025a2c19c667adc13741ac7473b7908243949\",\"lastSeen\":1470120550995}","timestamp":"2016-08-02T06:49:10.995Z"}
{"level":"info","message":"sending SUBSCRIBE message to {\"protocol\":\"0.7.1\",\"address\":\"82.220.99.82\",\"port\":4000,\"nodeID\":\"c06712294d4aea97173ec8ca473dbf298ad47478\",\"lastSeen\":1470120551055}","timestamp":"2016-08-02T06:49:11.055Z"}
...
{"level":"info","message":"sending SUBSCRIBE message to {\"protocol\":\"0.7.1\",\"address\":\"STORJ-CLUS-010.storj.space\",\"port\":4156,\"nodeID\":\"ad7734e18dd1e550ff2e3b32f2beed28b3dcab5c\",\"lastSeen\":1470120610559}","timestamp":"2016-08-02T06:50:10.559Z"}
{"level":"info","message":"sending SUBSCRIBE message to {\"protocol\":\"0.7.1\",\"address\":\"server013.storj.dk\",\"port\":5313,\"nodeID\":\"869025a2c19c667adc13741ac7473b7908243949\",\"lastSeen\":1470120610592}","timestamp":"2016-08-02T06:50:10.592Z"}
{"level":"info","message":"sending SUBSCRIBE message to {\"protocol\":\"0.7.1\",\"address\":\"82.220.99.82\",\"port\":4000,\"nodeID\":\"c06712294d4aea97173ec8ca473dbf298ad47478\",\"lastSeen\":1470120610626}","timestamp":"2016-08-02T06:50:10.626Z"}
...
{"level":"info","message":"sending SUBSCRIBE message to {\"protocol\":\"0.7.1\",\"address\":\"STORJ-CLUS-010.storj.space\",\"port\":4156,\"nodeID\":\"ad7734e18dd1e550ff2e3b32f2beed28b3dcab5c\",\"lastSeen\":1470123156731}","timestamp":"2016-08-02T07:32:36.731Z"}
{"level":"info","message":"sending SUBSCRIBE message to {\"protocol\":\"0.7.1\",\"address\":\"server013.storj.dk\",\"port\":5413,\"nodeID\":\"869025a2c19c667adc13741ac7473b7908243949\",\"lastSeen\":1470123156763}","timestamp":"2016-08-02T07:32:36.763Z"}
{"level":"info","message":"sending SUBSCRIBE message to {\"protocol\":\"0.7.1\",\"address\":\"STORJ-CLUS-012.storj.space\",\"port\":4184,\"nodeID\":\"fce958de031d66e01ac422321820d56ac74fcabb\",\"lastSeen\":1470123156795}","timestamp":"2016-08-02T07:32:36.795Z"}
...
{"level":"info","message":"sending SUBSCRIBE message to {\"protocol\":\"0.7.1\",\"address\":\"STORJ-CLUS-010.storj.space\",\"port\":4156,\"nodeID\":\"ad7734e18dd1e550ff2e3b32f2beed28b3dcab5c\",\"lastSeen\":1470124933565}","timestamp":"2016-08-02T08:02:13.565Z"}
{"level":"info","message":"sending SUBSCRIBE message to {\"protocol\":\"0.7.1\",\"address\":\"server013.storj.dk\",\"port\":5313,\"nodeID\":\"869025a2c19c667adc13741ac7473b7908243949\",\"lastSeen\":1470124933625}","timestamp":"2016-08-02T08:02:13.625Z"}
{"level":"info","message":"sending SUBSCRIBE message to {\"protocol\":\"0.7.1\",\"address\":\"82.220.99.82\",\"port\":4000,\"nodeID\":\"c06712294d4aea97173ec8ca473dbf298ad47478\",\"lastSeen\":1470124933685}","timestamp":"2016-08-02T08:02:13.685Z"}
...
{"level":"info","message":"sending SUBSCRIBE message to {\"protocol\":\"0.7.1\",\"address\":\"STORJ-CLUS-010.storj.space\",\"port\":4156,\"nodeID\":\"ad7734e18dd1e550ff2e3b32f2beed28b3dcab5c\",\"lastSeen\":1470125087175}","timestamp":"2016-08-02T08:04:47.175Z"}
{"level":"info","message":"sending SUBSCRIBE message to {\"protocol\":\"0.7.1\",\"address\":\"82.220.99.82\",\"port\":4000,\"nodeID\":\"c06712294d4aea97173ec8ca473dbf298ad47478\",\"lastSeen\":1470125087209}","timestamp":"2016-08-02T08:04:47.210Z"}
{"level":"info","message":"sending SUBSCRIBE message to {\"protocol\":\"0.7.1\",\"address\":\"server012.storj.dk\",\"port\":5712,\"nodeID\":\"c0792c4c2dcac1582a6d01c93d1d366c5321d94e\",\"lastSeen\":1470125087244}","timestamp":"2016-08-02T08:04:47.244Z"}
...
{"level":"info","message":"sending SUBSCRIBE message to {\"protocol\":\"0.7.1\",\"address\":\"STORJ-CLUS-010.storj.space\",\"port\":4156,\"nodeID\":\"ad7734e18dd1e550ff2e3b32f2beed28b3dcab5c\",\"lastSeen\":1470125363415}","timestamp":"2016-08-02T08:09:23.415Z"}
{"level":"info","message":"sending SUBSCRIBE message to {\"protocol\":\"0.7.1\",\"address\":\"server013.storj.dk\",\"port\":5113,\"nodeID\":\"869025a2c19c667adc13741ac7473b7908243949\",\"lastSeen\":1470125363449}","timestamp":"2016-08-02T08:09:23.449Z"}
{"level":"info","message":"sending SUBSCRIBE message to {\"protocol\":\"0.7.1\",\"address\":\"82.220.99.82\",\"port\":4000,\"nodeID\":\"c06712294d4aea97173ec8ca473dbf298ad47478\",\"lastSeen\":1470125363508}","timestamp":"2016-08-02T08:09:23.508Z"}
...
{"level":"info","message":"sending SUBSCRIBE message to {\"protocol\":\"0.7.1\",\"address\":\"STORJ-CLUS-010.storj.space\",\"port\":4156,\"nodeID\":\"ad7734e18dd1e550ff2e3b32f2beed28b3dcab5c\",\"lastSeen\":1470134959348}","timestamp":"2016-08-02T10:49:19.348Z"}
{"level":"info","message":"sending SUBSCRIBE message to {\"protocol\":\"0.7.1\",\"address\":\"server013.storj.dk\",\"port\":5913,\"nodeID\":\"869025a2c19c667adc13741ac7473b7908243949\",\"lastSeen\":1470134959407}","timestamp":"2016-08-02T10:49:19.407Z"}
{"level":"info","message":"sending SUBSCRIBE message to {\"protocol\":\"0.7.1\",\"address\":\"82.220.99.82\",\"port\":4000,\"nodeID\":\"c06712294d4aea97173ec8ca473dbf298ad47478\",\"lastSeen\":1470134959440}","timestamp":"2016-08-02T10:49:19.440Z"}

Time to restart my node with loglevel debug. Lets see if I can find the trigger

littleskunk commented 7 years ago

How about a bad opcode. Lets take a bridge and send out PUBLISH messages with a bad / not implemented opcode. It could be a valid opcode but a opcode that is not used by any farmer.

As far as I know (https://github.com/Storj/core/issues/204#issuecomment-227862692) my node will try to create a bloom filter for this unknow opcode. Looks like this is not working and that is the problem. Could be duplicate to https://github.com/Storj/core/issues/204

First one test with loglevel debug. Later a second test with all available opcodes. At the beginning the bloom filter is working. With all opcodes in my config I would expect my node has all possible bloom filter and later should never send any SUBSCRIBE messages.

littleskunk commented 7 years ago

Issue confirmed and duplicate to #204. I will leave a comment in #204 and we go on with this issue.

I don't like to wait for test results and did it the other way around. Fast way to reproduce the issue. remove opcode 0f01020202 (that is the opcode the 10 bridge server are using) node is sending more than 1000 SUBSCRIBE messages in a few minutes and will never stop doing that.

44203 commented 7 years ago

I think @aleitner and I figured this one out.

The quasar implementation (which sticks pretty strictly to the paper), has a routine where if a publication is received and the recipient is not subscribed to it then it asks its neighbors for their bloom filters again to see which if they are interested in it (by sending a subscribe message). The problem is that there was a bug where in that case the recipient does not append their nodeid to the publisher list and relays it likely back to its neighbors who think that it wasn't received causing another relay loop and a shit ton of subscribe messages.

Bridges do not subscribe to anything but tunnel announcements so this happens quite often for them which I think dominoes and is causing at least 3 of the issues labeled discussion right now.

So we did a couple of things:

I'd like to release this early next week and see how the overall network chatter improves.

littleskunk commented 7 years ago

Fix confirmed. 3 SUBSCRIBE messages at the beginning and than silence :)