paritytech / polkadot-stps

Polkadot Standard Transactions Per Second (sTPS) performance benchmarking
The Unlicense
14 stars 4 forks source link

Connect with multiple attempts #28

Closed ggwpez closed 2 years ago

ggwpez commented 2 years ago

Since the Kubectl process sometimes dies, the port forwarding can fail.
This tries to just re-connect in that case 10 times with a 1 second backoff.

Problem: It still fails...
Forwards are all there as far as I know:

1331479  0.0  0.0 752104 27972 pts/2    Sl   18:48   0:00  \_ kubectl port-forward Pod/alice :9944 --namespace zombie-2c9242d1088bcaba525c5de560d9739f --kubeconfig /home/vados/.kube/config
1331490  0.0  0.0 752104 27148 pts/2    Sl   18:48   0:00  \_ kubectl port-forward Pod/alice :9615 --namespace zombie-2c9242d1088bcaba525c5de560d9739f --kubeconfig /home/vados/.kube/config
1333095  0.1  0.0 752360 29132 pts/2    Sl   18:49   0:00  \_ kubectl port-forward Pod/stps-0 :9944 --namespace zombie-2c9242d1088bcaba525c5de560d9739f --kubeconfig /home/vados/.kube/config
1333107  0.0  0.0 752104 25824 pts/2    Sl   18:49   0:00  \_ kubectl port-forward Pod/stps-0 :9615 --namespace zombie-2c9242d1088bcaba525c5de560d9739f --kubeconfig /home/vados/.kube/config
1334410  0.2  0.0 752104 29132 pts/2    Sl   18:49   0:00  \_ kubectl port-forward Pod/stps-1 :9944 --namespace zombie-2c9242d1088bcaba525c5de560d9739f --kubeconfig /home/vados/.kube/config
1334424  0.0  0.0 752104 26904 pts/2    Sl   18:49   0:00  \_ kubectl port-forward Pod/stps-1 :9615 --namespace zombie-2c9242d1088bcaba525c5de560d9739f --kubeconfig /home/vados/.kube/config
1335763  0.1  0.0 752104 28584 pts/2    Sl   18:49   0:00  \_ kubectl port-forward Pod/stps-2 :9944 --namespace zombie-2c9242d1088bcaba525c5de560d9739f --kubeconfig /home/vados/.kube/config
1335775  0.0  0.0 752360 26796 pts/2    Sl   18:49   0:00  \_ kubectl port-forward Pod/stps-2 :9615 --namespace zombie-2c9242d1088bcaba525c5de560d9739f --kubeconfig /home/vados/.kube/config

Re-connection attempts all fail:

...
[2022-06-10T17:53:17Z WARN  utils::shared] API client ws://127.0.0.1:34137 error: Rpc(Transport(Error when opening the TCP socket: Connection refused (os error
 111)))
[2022-06-10T17:53:18Z INFO  utils::shared] Attempt #3: Connecting to ws://127.0.0.1:34137
[2022-06-10T17:53:18Z WARN  utils::shared] API client ws://127.0.0.1:34137 error: Rpc(Transport(Error when opening the TCP socket: Connection refused (os error
 111)))
[2022-06-10T17:53:19Z INFO  utils::shared] Attempt #4: Connecting to ws://127.0.0.1:34137
[2022-06-10T17:53:19Z WARN  utils::shared] API client ws://127.0.0.1:34137 error: Rpc(Transport(Error when opening the TCP socket: Connection refused (os error
 111)))
[2022-06-10T17:53:20Z INFO  utils::shared] Attempt #5: Connecting to ws://127.0.0.1:34137
[2022-06-10T17:53:20Z WARN  utils::shared] API client ws://127.0.0.1:34137 error: Rpc(Transport(Error when opening the TCP socket: Connection refused (os error
 111)))
...

I also tested it with a python script, it indeed does not connect. Node 0 and 1 work fine.

Is it also node 2 for you that keeps failing? I did not really pay attention, but I think node 2 fails the most often @bernardoaraujor
cc @pepoviola

pepoviola commented 2 years ago

HI @ggwpez, when zombienet re-create the port-fw will show the port fixed in the ps... command. Did you try downloading the last version? I will also try on my end just to be sure. Thanks!

bernardoaraujor commented 2 years ago

attempting with Zombienet 1.2.39:

$ ./zombienet-linux version
1.2.39
$ ./zombienet-linux test  tests/stps/relay.feature
...
    ✔ alice: is up within 600 secs (84ms)
[2022-06-10T20:41:45Z INFO  utils] Checking sTPS pre-conditions (account nonces and free balances).
[2022-06-10T20:41:45Z INFO  utils::shared] Attempt #1: Connecting to ws://127.0.0.1:36617
[2022-06-10T20:41:45Z INFO  utils::shared] Attempt #1: Connecting to ws://127.0.0.1:36617
rust process exited with code 0
    ✔ alice: js-script ./utils.js with "check_pre_conditions" return is 0 within 600 secs (529ms)
[2022-06-10T20:41:46Z INFO  utils] Node 1: Reading funded accounts from: "tests/stps/funded-accounts.json"
[2022-06-10T20:41:46Z INFO  utils::shared] Attempt #1: Connecting to ws://127.0.0.1:39747
[2022-06-10T20:41:46Z INFO  utils] Node 0: Reading funded accounts from: "tests/stps/funded-accounts.json"
[2022-06-10T20:41:46Z INFO  utils::sender] Node 1: signing 5461 transactions
[2022-06-10T20:41:46Z INFO  utils::shared] Attempt #1: Connecting to ws://127.0.0.1:39889
[2022-06-10T20:41:46Z INFO  utils] Node 2: Reading funded accounts from: "tests/stps/funded-accounts.json"
[2022-06-10T20:41:46Z INFO  utils::shared] Attempt #1: Connecting to ws://127.0.0.1:33855
[2022-06-10T20:41:46Z WARN  utils::shared] API client ws://127.0.0.1:33855 error: Rpc(Transport(Error in the WebSocket handshake: i/o error: unexpected end of file

    Caused by:
        0: i/o error: unexpected end of file
        1: unexpected end of file))
child process exited
creating new port-fw for Pod/stps-2, with map 33855:9944
[2022-06-10T20:41:46Z INFO  utils::sender] Node 0: signing 5461 transactions
[2022-06-10T20:41:47Z INFO  utils::shared] Attempt #2: Connecting to ws://127.0.0.1:33855
[2022-06-10T20:41:47Z WARN  utils::shared] API client ws://127.0.0.1:33855 error: Rpc(Transport(Error in the WebSocket handshake: i/o error: unexpected end of file

    Caused by:
        0: i/o error: unexpected end of file
        1: unexpected end of file))
[2022-06-10T20:41:48Z INFO  utils::shared] Attempt #3: Connecting to ws://127.0.0.1:33855
[2022-06-10T20:41:48Z WARN  utils::shared] API client ws://127.0.0.1:33855 error: Rpc(Transport(Error when opening the TCP socket: Connection refused (os error 111)))
[2022-06-10T20:41:49Z INFO  utils::shared] Attempt #4: Connecting to ws://127.0.0.1:33855
[2022-06-10T20:41:49Z WARN  utils::shared] API client ws://127.0.0.1:33855 error: Rpc(Transport(Error when opening the TCP socket: Connection refused (os error 111)))
[2022-06-10T20:41:50Z INFO  utils::shared] Attempt #5: Connecting to ws://127.0.0.1:33855
[2022-06-10T20:41:50Z WARN  utils::shared] API client ws://127.0.0.1:33855 error: Rpc(Transport(Error when opening the TCP socket: Connection refused (os error 111)))
[2022-06-10T20:41:51Z INFO  utils::shared] Attempt #6: Connecting to ws://127.0.0.1:33855
[2022-06-10T20:41:51Z WARN  utils::shared] API client ws://127.0.0.1:33855 error: Rpc(Transport(Error when opening the TCP socket: Connection refused (os error 111)))
[2022-06-10T20:41:52Z INFO  utils::shared] Attempt #7: Connecting to ws://127.0.0.1:33855
[2022-06-10T20:41:52Z WARN  utils::shared] API client ws://127.0.0.1:33855 error: Rpc(Transport(Error when opening the TCP socket: Connection refused (os error 111)))
[2022-06-10T20:41:53Z INFO  utils::shared] Attempt #8: Connecting to ws://127.0.0.1:33855
[2022-06-10T20:41:53Z WARN  utils::shared] API client ws://127.0.0.1:33855 error: Rpc(Transport(Error when opening the TCP socket: Connection refused (os error 111)))
[2022-06-10T20:41:54Z INFO  utils::shared] Attempt #9: Connecting to ws://127.0.0.1:33855
[2022-06-10T20:41:54Z WARN  utils::shared] API client ws://127.0.0.1:33855 error: Rpc(Transport(Error when opening the TCP socket: Connection refused (os error 111)))
[2022-06-10T20:41:55Z INFO  utils::shared] Attempt #10: Connecting to ws://127.0.0.1:33855
[2022-06-10T20:41:55Z WARN  utils::shared] API client ws://127.0.0.1:33855 error: Rpc(Transport(Error when opening the TCP socket: Connection refused (os error 111)))
[2022-06-10T20:41:55Z INFO  utils::sender] Node 1: sending 5461 transactions in chunks of 50
[2022-06-10T20:41:55Z INFO  utils::shared] Attempt #1: Connecting to ws://127.0.0.1:39747
[2022-06-10T20:41:56Z INFO  utils::sender] Node 0: sending 5461 transactions in chunks of 50
[2022-06-10T20:41:56Z INFO  utils::shared] Attempt #1: Connecting to ws://127.0.0.1:39889
[2022-06-10T20:41:56Z ERROR utils::shared] Failed to connect to ws://127.0.0.1:33855 after 10 attempts
Error: "Failed to connect to ws://127.0.0.1:33855 after 10 attempts"
rust process exited with code 1
[2022-06-10T20:41:57Z INFO  utils::sender] Node 0: 500 txs sent in 1004 ms (498.00 /s)
[2022-06-10T20:41:57Z INFO  utils::sender] Node 1: 500 txs sent in 1134 ms (440.62 /s)
ggwpez commented 2 years ago

With the new version (I did not download it indeed :man_facepalming:) this happens:

# in the zombienet console
creating new port-fw for Pod/stps-2, with map 33895:9944

# the port forwarding
kubectl port-forward Pod/stps-2 33895:9944 --namespace zombie-34a85e99f1e5afabad29ebdc7bc26e5b --kubeconfig /home/vados/.kube/config

# But somehow the connection times out:
[2022-06-10T20:40:53Z WARN  utils::shared] API client ws://127.0.0.1:33895 error: Rpc(Transport(Error in the WebSocket handshake: i/o error: unexpected end of file

    Caused by:
        0: i/o error: unexpected end of file
        1: unexpected end of file))
child process exited
creating new port-fw for Pod/stps-2, with map 33895:9944
[2022-06-10T20:40:54Z INFO  utils::shared] Attempt #2: Connecting to ws://127.0.0.1:33895
[2022-06-10T20:40:54Z INFO  utils::sender] Node 2: signing 5461 transactions
[2022-06-10T20:41:03Z INFO  utils::sender] Node 0: sending 5461 transactions in chunks of 50
[2022-06-10T20:41:03Z INFO  utils::shared] Attempt #1: Connecting to ws://127.0.0.1:39235
[2022-06-10T20:41:03Z INFO  utils::sender] Node 1: sending 5461 transactions in chunks of 50
[2022-06-10T20:41:03Z INFO  utils::shared] Attempt #1: Connecting to ws://127.0.0.1:33855
[2022-06-10T20:41:04Z INFO  utils::sender] Node 0: 850 txs sent in 1030 ms (825.11 /s)
[2022-06-10T20:41:04Z INFO  utils::sender] Node 2: sending 5461 transactions in chunks of 50
[2022-06-10T20:41:04Z INFO  utils::shared] Attempt #1: Connecting to ws://127.0.0.1:33895
[2022-06-10T20:41:05Z INFO  utils::sender] Node 1: 700 txs sent in 1137 ms (615.63 /s)
[2022-06-10T20:41:05Z INFO  utils::sender] Node 0: 400 txs sent in 1301 ms (307.43 /s)
[2022-06-10T20:41:06Z INFO  utils::sender] Node 1: 1250 txs sent in 1027 ms (1216.56 /s)
[2022-06-10T20:41:06Z INFO  utils::sender] Node 0: 50 txs sent in 1133 ms (44.13 /s)
[2022-06-10T20:41:07Z INFO  utils::sender] Node 1: 1250 txs sent in 1155 ms (1081.56 /s)
[2022-06-10T20:41:07Z INFO  utils::sender] Node 0: 900 txs sent in 1053 ms (854.33 /s)
[2022-06-10T20:41:08Z INFO  utils::sender] Node 1: 850 txs sent in 1034 ms (821.40 /s)
[2022-06-10T20:41:09Z INFO  utils::sender] Node 1: 1050 txs sent in 1001 ms (1048.26 /s)
[2022-06-10T20:41:09Z INFO  utils::sender] Node 1: 5461 txs sent in 5665 ms (963.93 /s)
[2022-06-10T20:41:09Z INFO  utils::sender] Node 0: 450 txs sent in 1618 ms (278.03 /s)
[2022-06-10T20:41:10Z INFO  utils::sender] Node 0: 950 txs sent in 1010 ms (939.79 /s)
[2022-06-10T20:41:11Z INFO  utils::sender] Node 0: 1200 txs sent in 1025 ms (1170.17 /s)
[2022-06-10T20:41:12Z INFO  utils::sender] Node 0: 5461 txs sent in 8624 ms (633.21 /s)
[2022-06-10T20:41:34Z INFO  utils::sender] Node 0: Found 3076 transfer events, need 13307 more
[2022-06-10T20:41:34Z INFO  utils::sender] Node 1: Found 3076 transfer events, need 13307 more
[2022-06-10T20:41:42Z INFO  utils::sender] Node 1: Found 11072 transfer events, need 5311 more
[2022-06-10T20:41:42Z INFO  utils::sender] Node 0: Found 11072 transfer events, need 5311 more
[2022-06-10T20:41:49Z INFO  utils::sender] Node 0: Found 11072 transfer events, need 5311 more
[2022-06-10T20:41:49Z INFO  utils::sender] Node 1: Found 11072 transfer events, need 5311 more
[2022-06-10T20:41:57Z INFO  utils::sender] Node 0: Found 11072 transfer events, need 5311 more
[2022-06-10T20:41:57Z INFO  utils::sender] Node 1: Found 11072 transfer events, need 5311 more
Error: Rpc(RequestTimeout)
rust process exited with code 1

The connection request also times out with a python script.

bernardoaraujor commented 2 years ago

@ggwpez weird that we had different results I wonder what is different in our setups? am I missing any update on my side?

are you just running oty-connect-retry branch? zombienet 1.2.39 with docker.io/paritypr/polkadot-debug:v0.9.22 image?

bernardoaraujor commented 2 years ago

ah, nevermind... I just reproduced with the same behaviour on my setup I guess it's just non-determinism again