Closed oskarth closed 3 years ago
Alright, I tried again and can't figure it out. I don't have access to a Windows machine so the current feedback loop for me is Windows CI, which takes 50m to build. Looking at the code I don't see anything obviously wrong, and it works for Mac and Linux. The only odd thing I notice is DBG 2021-05-24 13:05:01.000+08:00 Exception in accept loop, exiting topics="libp2p switch" tid=593080 exc="Transport closed, no more connections!"
but we see this in a lot of tests, so I think this is expected.
See https://github.com/status-im/nim-waku/pull/522/checks?check_run_id=2653040184
DBG 2021-05-03 09:11:24.681+00:00 no handlers topics="libp2p multistream" tid=1244 conn=16U*qobXj7:608fbe3cc958000000004769 protocol=/vac/waku/relay/2.0.0-beta2
DBG 2021-05-03 09:11:24.681+00:00 decodeMsg: decoded message topics="libp2p identify" tid=1244 pubkey=secp25...22]))) addresses=@[/ip4/0.0.0.0/tcp/60010] protocols="@[\"/ipfs/id/1.0.0\", \"/vac/waku/lightpush/2.0.0-alpha1\"]" observable_address=Some(/ip4/127.0.0.1/tcp/60012) proto_version="Some(\"ipfs/0.1.0\")" agent_version="Some(\"nim-libp2p/0.0.1\")"
DBG 2021-05-03 09:11:24.682+00:00 decodeMsg: decoded message topics="libp2p identify" tid=1244 pubkey=secp25...36]))) addresses=@[/ip4/0.0.0.0/tcp/60012] protocols="@[\"/ipfs/id/1.0.0\", \"/vac/waku/relay/2.0.0-beta2\", \"/vac/waku/lightpush/2.0.0-alpha1\"]" observable_address=Some(/ip4/127.0.0.1/tcp/50571) proto_version="Some(\"ipfs/0.1.0\")" agent_version="Some(\"nim-libp2p/0.0.1\")"
DBG 2021-05-03 09:11:24.682+00:00 no handlers topics="libp2p multistream" tid=1244 conn=16U*qobXj7:608fbe3cc958000000004769 protocol=/vac/waku/relay/2.0.0-beta2
DBG 2021-05-03 09:11:24.683+00:00 Dial successful topics="libp2p switch" tid=1244 conn=16U*qobXj7:608fbe3cc958000000004763 peerInfo="(peerId: \"16Uiu2HAmC7qWZFpHL6bZoKPa1869xpTYFtQZ3dWm2pexegqobXj7\", addrs: @[\"/ip4/0.0.0.0/tcp/60012\"], protocols: @[\"/ipfs/id/1.0.0\", \"/vac/waku/relay/2.0.0-beta2\", \"/vac/waku/lightpush/2.0.0-alpha1\"], protoVersion: \"ipfs/0.1.0\", agentVersion: \"nim-libp2p/0.0.1\")"
DBG 2021-05-03 09:11:24.684+00:00 Could not establish send connection topics="libp2p pubsubpeer" tid=1244 msg="Unable to select sub-protocol @[\"/vac/waku/relay/2.0.0-beta2\", \"/vac/waku/relay/2.0.0-beta2\"]"
DBG 2021-05-03 09:11:24.684+00:00 unsubscribing pubsub peer topics="libp2p pubsub" tid=1244 peerId=16U*S5Qrtv
D:/a/nim-waku/nim-waku/vendor/nimbus-build-system/vendor/Nim/lib/pure/unittest.nim(460) main
D:/a/nim-waku/nim-waku/vendor/nimbus-build-system/vendor/Nim/lib/pure/unittest.nim(450) NimMain
D:/a/nim-waku/nim-waku/vendor/nimbus-build-system/vendor/Nim/lib/pure/unittest.nim(441) PreMain
D:/a/nim-waku/nim-waku/vendor/nimbus-build-system/vendor/Nim/lib/pure/unittest.nim(227) PreMainInner
D:/a/nim-waku/nim-waku/vendor/nim-unittest2/unittest2.nim(704) suitePayload
D:/a/nim-waku/nim-waku/vendor/nim-testutils/testutils/unittests.nim(16) runTestX60gensym176890001
D:/a/nim-waku/nim-waku/vendor/nim-chronos/chronos/asyncloop.nim(1092) waitFor
D:/a/nim-waku/nim-waku/vendor/nim-chronos/chronos/asyncloop.nim(281) poll
D:/a/nim-waku/nim-waku/vendor/nim-chronos/chronos/asyncmacro2.nim(74) colonanonymous
D:/a/nim-waku/nim-waku/waku/v2/protocol/waku_lightpush/waku_lightpush.nim(172) request
D:/a/nim-waku/nim-waku/vendor/nim-libp2p/libp2p/stream/lpstream.nim(244) writeLp
D:/a/nim-waku/nim-waku/vendor/nim-chronos/chronos/asyncmacro2.nim(101) write
D:/a/nim-waku/nim-waku/vendor/nim-chronos/chronos/asyncmacro2.nim(74) colonanonymous
D:/a/nim-waku/nim-waku/vendor/nim-libp2p/libp2p/muxers/mplex/lpchannel.nim(171) write
[[reraised from:
D:/a/nim-waku/nim-waku/vendor/nimbus-build-system/vendor/Nim/lib/pure/unittest.nim(460) main
D:/a/nim-waku/nim-waku/vendor/nimbus-build-system/vendor/Nim/lib/pure/unittest.nim(450) NimMain
D:/a/nim-waku/nim-waku/vendor/nimbus-build-system/vendor/Nim/lib/pure/unittest.nim(441) PreMain
D:/a/nim-waku/nim-waku/vendor/nimbus-build-system/vendor/Nim/lib/pure/unittest.nim(227) PreMainInner
D:/a/nim-waku/nim-waku/vendor/nim-unittest2/unittest2.nim(704) suitePayload
D:/a/nim-waku/nim-waku/vendor/nim-testutils/testutils/unittests.nim(16) runTestX60gensym176890001
D:/a/nim-waku/nim-waku/vendor/nim-chronos/chronos/asyncloop.nim(1092) waitFor
D:/a/nim-waku/nim-waku/vendor/nim-chronos/chronos/asyncloop.nim(281) poll
D:/a/nim-waku/nim-waku/vendor/nim-chronos/chronos/asyncmacro2.nim(77) colonanonymous
D:/a/nim-waku/nim-waku/vendor/nim-chronos/chronos/asyncmacro2.nim(367) request
D:/a/nim-waku/nim-waku/vendor/nim-chronos/chronos/asyncfutures2.nim(472) internalCheckComplete
]]
[[reraised from:
D:/a/nim-waku/nim-waku/vendor/nimbus-build-system/vendor/Nim/lib/pure/unittest.nim(460) main
D:/a/nim-waku/nim-waku/vendor/nimbus-build-system/vendor/Nim/lib/pure/unittest.nim(450) NimMain
D:/a/nim-waku/nim-waku/vendor/nimbus-build-system/vendor/Nim/lib/pure/unittest.nim(441) PreMain
D:/a/nim-waku/nim-waku/vendor/nimbus-build-system/vendor/Nim/lib/pure/unittest.nim(227) PreMainInner
D:/a/nim-waku/nim-waku/vendor/nim-unittest2/unittest2.nim(704) suitePayload
D:/a/nim-waku/nim-waku/vendor/nim-testutils/testutils/unittests.nim(16) runTestX60gensym176890001
D:/a/nim-waku/nim-waku/vendor/nim-chronos/chronos/asyncloop.nim(1092) waitFor
D:/a/nim-waku/nim-waku/vendor/nim-chronos/chronos/asyncloop.nim(281) poll
D:/a/nim-waku/nim-waku/vendor/nim-chronos/chronos/asyncmacro2.nim(74) colonanonymous
D:/a/nim-waku/nim-waku/vendor/nim-chronos/chronos/asyncmacro2.nim(367) lightpush
D:/a/nim-waku/nim-waku/vendor/nim-chronos/chronos/asyncfutures2.nim(472) internalCheckComplete
]]
[[reraised from:
D:/a/nim-waku/nim-waku/vendor/nimbus-build-system/vendor/Nim/lib/pure/unittest.nim(460) main
D:/a/nim-waku/nim-waku/vendor/nimbus-build-system/vendor/Nim/lib/pure/unittest.nim(450) NimMain
D:/a/nim-waku/nim-waku/vendor/nimbus-build-system/vendor/Nim/lib/pure/unittest.nim(441) PreMain
D:/a/nim-waku/nim-waku/vendor/nimbus-build-system/vendor/Nim/lib/pure/unittest.nim(227) PreMainInner
D:/a/nim-waku/nim-waku/vendor/nim-unittest2/unittest2.nim(704) suitePayload
D:/a/nim-waku/nim-waku/vendor/nim-testutils/testutils/unittests.nim(16) runTestX60gensym176890001
D:/a/nim-waku/nim-waku/vendor/nim-chronos/chronos/asyncloop.nim(1092) waitFor
D:/a/nim-waku/nim-waku/vendor/nim-chronos/chronos/asyncloop.nim(281) poll
D:/a/nim-waku/nim-waku/vendor/nim-chronos/chronos/asyncmacro2.nim(74) colonanonymous
D:/a/nim-waku/nim-waku/vendor/nim-chronos/chronos/asyncmacro2.nim(367) scenario
D:/a/nim-waku/nim-waku/vendor/nim-chronos/chronos/asyncfutures2.nim(472) internalCheckComplete
]]
[[reraised from:
D:/a/nim-waku/nim-waku/vendor/nimbus-build-system/vendor/Nim/lib/pure/unittest.nim(460) main
D:/a/nim-waku/nim-waku/vendor/nimbus-build-system/vendor/Nim/lib/pure/unittest.nim(450) NimMain
D:/a/nim-waku/nim-waku/vendor/nimbus-build-system/vendor/Nim/lib/pure/unittest.nim(441) PreMain
D:/a/nim-waku/nim-waku/vendor/nimbus-build-system/vendor/Nim/lib/pure/unittest.nim(227) PreMainInner
D:/a/nim-waku/nim-waku/vendor/nim-unittest2/unittest2.nim(704) suitePayload
D:/a/nim-waku/nim-waku/vendor/nim-testutils/testutils/unittests.nim(16) runTestX60gensym176890001
D:/a/nim-waku/nim-waku/vendor/nim-chronos/chronos/asyncfutures2.nim(485) read
D:/a/nim-waku/nim-waku/vendor/nim-chronos/chronos/asyncfutures2.nim(472) internalCheckComplete
]]
Unhandled exception: Stream Closed! [LPStreamClosedError]
[FAILED] Lightpush message return success
...
D:\a\nim-waku\nim-waku\vendor\nimbus-build-system\vendor\Nim\lib\system\nimscript.nim(260, 7) Error: unhandled exception: FAILED: build/all_tests_v2 [OSError]
mingw32-make: *** [Makefile:125: test2] Error 1
Error: Process completed with exit code 1.
Suggest we put this in icebox for now, and when it makes sense to revisit probably good to have a Windows environment for better debugging.
@jm-clius let me know if you have any obvious ideas of what you think is going wrong.
CI is failing because even a non-relay capable node requires relay
mounted to use lightpush
. See https://github.com/status-im/nim-waku/pull/518. The way to fix this is to add:
node1.mountRelay(relayMessage=false)
before mounting lightpush
.
In short:
if a non-relay node tries to dial a relay peer (say to perform a lightpush
), the relay peer will attempt to add that node as a relay/pubsub peer, which will fail in the protocol negotiation phase:
DBG 2021-05-24 05:35:43.636+00:00 Could not establish send connection topics="libp2p pubsubpeer" tid=6200 msg="Unable to select sub-protocol @[\"/vac/waku/relay/2.0.0-beta2\", \"/vac/waku/relay/2.0.0-beta2\"]"
This in turn causes the relay peer to remove that peer and all its connections, including the one established for lightpush
, from the switch
.
DBG 2021-05-24 05:35:43.637+00:00 unsubscribing pubsub peer topics="libp2p pubsub" tid=6200 peerId=16U*kyNBQr
When the lightpush
peer subsequently tries to send a PushRequest
it raises an Unhandled exception: Stream Closed! [LPStreamClosedError]
A bit more tricky, but comes down to timing differences. On both macos and ubuntu the order in which the identify
is received by the two peers involved are different from Windows. The effect is that they manage to send the PushRequest
just before the connection gets closed by the failing relay
negotiation described above.
DBG 2021-05-24 05:13:10.248+00:00 PushRequest topics="wakulightpush" tid=76420 pubSubTopic=test msg="(payload: @[104, 101, 108, 108, 111, 32, 119, 111, 114, 108, 100], contentTopic: \"/waku/2/default-content/proto\", version: 0, timestamp: 0.0, proof: @[])"
DBG 2021-05-24 05:13:10.248+00:00 publish topics="wakurelay" tid=76420 pubSubTopic=test message="@[10, 11, 104, 101, 108, 108, 111, 32, 119, 111, 114, 108, 100, 18, 29, 47, 119, 97, 107, 117, 47, 50, 47, 100, 101, 102, 97, 117, 108, 116, 45, 99, 111, 110, 116, 101, 110, 116, 47, 112, 114, 111, 116, 111, 24, 0, 33, 0, 0, 0, 0, 0, 0, 0, 0, 170, 1, 0]"
DBG 2021-05-24 05:13:10.248+00:00 Could not establish send connection topics="libp2p pubsubpeer" tid=76420 msg="Unable to select sub-protocol @[\"/vac/waku/relay/2.0.0-beta2\", \"/vac/waku/relay/2.0.0-beta2\"]"
Subsequent messages would fail. Windows is just too slow to get that lucky message through.
After merging with master
in feature/swap-account-logs
I now get these set of errors when the CI runs:
DBG 2021-05-24 16:40:29.256+00:00 Processing message tid=33954 address=127.0.0.1:55026 line="{\"jsonrpc\":\"2.0\",\"method\":\"waku_hasKeyPair\",\"params\":[\"67c0e2c8de43e9a4a1e2362040a3310f560317ba9cee29e7cea7e4feb46f0a06\"],\"id\":17}"
DBG 2021-05-24 16:40:29.256+00:00 Processing message tid=33954 address=127.0.0.1:55026 line="{\"jsonrpc\":\"2.0\",\"method\":\"waku_deleteKeyPair\",\"params\":[\"67c0e2c8de43e9a4a1e2362040a3310f560317ba9cee29e7cea7e4feb46f0a06\"],\"id\":18}"
DBG 2021-05-24 16:40:29.256+00:00 Processing message tid=33954 address=127.0.0.1:55026 line="{\"jsonrpc\":\"2.0\",\"method\":\"waku_hasKeyPair\",\"params\":[\"67c0e2c8de43e9a4a1e2362040a3310f560317ba9cee29e7cea7e4feb46f0a06\"],\"id\":19}"
DBG 2021-05-24 16:40:29.256+00:00 Processing message tid=33954 address=127.0.0.1:55026 line="{\"jsonrpc\":\"2.0\",\"method\":\"waku_deleteKeyPair\",\"params\":[\"67c0e2c8de43e9a4a1e2362040a3310f560317ba9cee29e7cea7e4feb46f0a06\"],\"id\":20}"
DBG 2021-05-24 16:40:29.256+00:00 Error occurred within RPC tid=33954 methodName=waku_deleteKeyPair err="Invalid key id"
[OK] waku asymKey tests
DBG 2021-05-24 16:40:29.256+00:00 Processing message tid=33954 address=127.0.0.1:55026 line="{\"jsonrpc\":\"2.0\",\"method\":\"waku_newSymKey\",\"params\":[],\"id\":21}"
DBG 2021-05-24 16:40:29.256+00:00 Processing message tid=33954 address=127.0.0.1:55026 line="{\"jsonrpc\":\"2.0\",\"method\":\"waku_hasSymKey\",\"params\":[\"d68e93c9e1966d569feebe0991face4ac91da803a685eb1f95ba2898cd87147d\"],\"id\":22}"
DBG 2021-05-24 16:40:29.256+00:00 Processing message tid=33954 address=127.0.0.1:55026 line="{\"jsonrpc\":\"2.0\",\"method\":\"waku_deleteSymKey\",\"params\":[\"d68e93c9e1966d569feebe0991face4ac91da803a685eb1f95ba2898cd87147d\"],\"id\":23}"
DBG 2021-05-24 16:40:29.257+00:00 Processing message tid=33954 address=127.0.0.1:55026 line="{\"jsonrpc\":\"2.0\",\"method\":\"waku_hasSymKey\",\"params\":[\"d68e93c9e1966d569feebe0991face4ac91da803a685eb1f95ba2898cd87147d\"],\"id\":24}"
DBG 2021-05-24 16:40:29.257+00:00 Processing message tid=33954 address=127.0.0.1:55026 line="{\"jsonrpc\":\"2.0\",\"method\":\"waku_deleteSymKey\",\"params\":[\"d68e93c9e1966d569feebe0991face4ac91da803a685eb1f95ba2898cd87147d\"],\"id\":25}"
DBG 2021-05-24 16:40:29.257+00:00 Error occurred within RPC tid=33954 methodName=waku_deleteSymKey err="Invalid key id"
Could it be related to this issue?
Thanks @jm-clius, nice catch! I should've been able to figure that out looking at the logs a bit more carefully. The timing issue makes sense too.
Could it be related to this issue?
@EbubeUd doesn't seem to be (directly) related. Might have been a temporary resource issue, so you could try to re-run those jobs if you haven't already. In any case, let's move this discussion to that PR then.
For some reason Windows CI is failing with lightpub wakunode tests. Unclear why. Timing issue?
https://github.com/status-im/nim-waku/pull/509