laurentsenta / pl-github

0 stars 1 forks source link

the ping test passes consistently on local machine #13

Open laurentsenta opened 2 years ago

laurentsenta commented 2 years ago
laurentsenta commented 2 years ago

Random panics on tests:

trying a few tweaks with Piotr, we cleaned a few logs and added a sleep to identify where the ping issue comes from:


Turns out the sdk-go will panic at the same time we log the message, in the instance that is doing the sleeping:

Apr 26 14:42:20.384615  INFO   2.1270s    MESSAGE << single[001] (7ed2e0) >> 192.18.0.8 not in data subnet 16.0.0.0/16, ignoring
Apr 26 14:42:20.384652  INFO   2.1270s    MESSAGE << single[001] (7ed2e0) >> detected data network IP: 16.0.0.3/16
Apr 26 14:42:25.492001  INFO   7.2337s    MESSAGE << single[000] (794fb5) >> my listen addrs: [/ip4/16.0.0.2/tcp/38975]
Apr 26 14:42:25.641978  INFO   7.3836s    MESSAGE << single[001] (7ed2e0) >> my listen addrs: [/ip4/16.0.0.3/tcp/34021]
Apr 26 14:42:25.694571  INFO   7.4363s    MESSAGE << single[000] (794fb5) >> done dialling my peers
Apr 26 14:42:25.694582  INFO   7.4362s    MESSAGE << single[001] (7ed2e0) >> A, {QmRS9UyhNGWj5yJ2JNNya7KLH4fKEhQ8XyYzM7wvg99SiQ: [/ip4/16.0.0.2/tcp/38975]}
Apr 26 14:42:25.694832  INFO   7.4363s    MESSAGE << single[001] (7ed2e0) >> Dial peer: QmRS9UyhNGWj5yJ2JNNya7KLH4fKEhQ8XyYzM7wvg99SiQ
Apr 26 14:42:25.695027  INFO   7.4364s    MESSAGE << single[001] (7ed2e0) >> STARTED SLEEPING, {QmRS9UyhNGWj5yJ2JNNya7KLH4fKEhQ8XyYzM7wvg99SiQ: [/ip4/16.0.0.2/tcp/38975]}
Apr 26 14:42:25.696648  INFO   7.4392s      ERROR << single[001] (7ed2e0) >> panic: send on closed channel
Apr 26 14:42:25.696754  INFO   7.4393s      ERROR << single[001] (7ed2e0) >> 
Apr 26 14:42:25.696837  INFO   7.4394s      ERROR << single[001] (7ed2e0) >> goroutine 45 [running]:
Apr 26 14:42:25.696910  INFO   7.4395s      ERROR << single[001] (7ed2e0) >> github.com/testground/sdk-go/sync.(*DefaultClient).responsesWorker(0xc0000ea2a0)
Apr 26 14:42:25.696990  INFO   7.4395s      ERROR << single[001] (7ed2e0) >>  /go/pkg/mod/github.com/testground/sdk-go@v0.3.1-0.20211012114808-49c90fa75405/sync/client_conn.go:43 +0x285
Apr 26 14:42:25.697070  INFO   7.4396s      ERROR << single[001] (7ed2e0) >> created by github.com/testground/sdk-go/sync.newClient
Apr 26 14:42:25.697204  INFO   7.4398s      ERROR << single[001] (7ed2e0) >>  /go/pkg/mod/github.com/testground/sdk-go@v0.3.1-0.20211012114808-49c90fa75405/sync/client.go:118 +0x1e7
Apr 26 14:42:25.805524  INFO   7.5481s INCOMPLETE << single[001] (7ed2e0) >> 

(ubuntu server, task_id=c9k09hiel22hrtvt0ps0)

So it looks like the code might fail around line 233 and 234 here:

https://github.com/libp2p/test-plans/pull/23/commits/79390c118bd9d17f36dbf823b1a378c0554d3d4b#diff-977655f8c5144f74c731dc683fa94e37569ac56e500d35f6b31a6186d693e864R233-R234

laurentsenta commented 2 years ago

In another run:

in this run: node 000 sleeps, and node 001 panic at the same time.

Apr 26 14:50:21.556652  INFO   2.0287s    MESSAGE << single[001] (87ae7a) >> Hello friends test 001
Apr 26 14:50:21.556886  INFO   2.0290s    MESSAGE << single[001] (87ae7a) >> 127.0.0.1 not in data subnet 16.0.0.0/16, ignoring
Apr 26 14:50:21.556919  INFO   2.0290s    MESSAGE << single[001] (87ae7a) >> 192.18.0.8 not in data subnet 16.0.0.0/16, ignoring
Apr 26 14:50:21.556946  INFO   2.0291s    MESSAGE << single[001] (87ae7a) >> detected data network IP: 16.0.0.3/16
Apr 26 14:50:26.701199  INFO   7.1725s    MESSAGE << single[000] (8f6e1f) >> my listen addrs: [/ip4/16.0.0.2/tcp/34265]
Apr 26 14:50:26.770436  INFO   7.2419s    MESSAGE << single[001] (87ae7a) >> my listen addrs: [/ip4/16.0.0.3/tcp/35305]
Apr 26 14:50:26.804009  INFO   7.2755s    MESSAGE << single[001] (87ae7a) >> done dialling my peers
Apr 26 14:50:26.804284  INFO   7.2759s    MESSAGE << single[000] (8f6e1f) >> A, {QmPcscAmcDrxnMuKvQzWUc9QZbMkdGv54iZJX8bmhjp3vY: [/ip4/16.0.0.3/tcp/35305]}
Apr 26 14:50:26.804433  INFO   7.2760s    MESSAGE << single[000] (8f6e1f) >> Dial peer: QmPcscAmcDrxnMuKvQzWUc9QZbMkdGv54iZJX8bmhjp3vY
Apr 26 14:50:26.804559  INFO   7.2761s    MESSAGE << single[000] (8f6e1f) >> STARTED SLEEPING, {QmPcscAmcDrxnMuKvQzWUc9QZbMkdGv54iZJX8bmhjp3vY: [/ip4/16.0.0.3/tcp/35305]}
Apr 26 14:50:26.807311  INFO   7.2796s      ERROR << single[001] (87ae7a) >> panic: send on closed channel
Apr 26 14:50:26.807432  INFO   7.2798s      ERROR << single[001] (87ae7a) >> 
Apr 26 14:50:26.807614  INFO   7.2800s      ERROR << single[001] (87ae7a) >> goroutine 57 [running]:
Apr 26 14:50:26.807690  INFO   7.2800s      ERROR << single[001] (87ae7a) >> github.com/testground/sdk-go/sync.(*DefaultClient).responsesWorker(0xc0002ea000)
Apr 26 14:50:26.807766  INFO   7.2801s      ERROR << single[001] (87ae7a) >>  /go/pkg/mod/github.com/testground/sdk-go@v0.3.1-0.20211012114808-49c90fa75405/sync/client_conn.go:43 +0x285
Apr 26 14:50:26.807837  INFO   7.2802s      ERROR << single[001] (87ae7a) >> created by github.com/testground/sdk-go/sync.newClient
Apr 26 14:50:26.807931  INFO   7.2803s      ERROR << single[001] (87ae7a) >>  /go/pkg/mod/github.com/testground/sdk-go@v0.3.1-0.20211012114808-49c90fa75405/sync/client.go:118 +0x1e7
Apr 26 14:50:26.917452  INFO   7.3898s INCOMPLETE << single[001] (87ae7a) >> 
Apr 26 14:50:56.804922  INFO   37.2763s    MESSAGE << single[000] (8f6e1f) >> STOPPED SLEEPING, {QmPcscAmcDrxnMuKvQzWUc9QZbMkdGv54iZJX8bmhjp3vY: [/ip4/16.0.0.3/tcp/35305]}
Apr 26 14:51:01.806263  INFO   42.2778s    MESSAGE << single[000] (8f6e1f) >> FAILED CONNECT, {QmPcscAmcDrxnMuKvQzWUc9QZbMkdGv54iZJX8bmhjp3vY: [/ip4/16.0.0.3/tcp/35305]}
Apr 26 14:51:01.806660  INFO   42.2783s       FAIL << single[000] (8f6e1f) >> failed to dial QmPcscAmcDrxnMuKvQzWUc9QZbMkdGv54iZJX8bmhjp3vY:
  * [/ip4/16.0.0.3/tcp/35305] dial tcp4 16.0.0.3:35305: i/o timeout
Apr 26 14:51:03.749465  INFO   deleting containers {"runner": "local:docker", "run_id": "c9k0d9qel22hrtvt0pu0", "ids": ["8f6e1fdfa5c8958f2f8e42ecfaac6b8726a5e287c1deee43bb47e12ac4ff4260", "87ae7a231f00db2fc586b10392cfb9d358064440e9c50d4177a37291e23243ff"]}
Apr 26 14:51:03.749525  INFO   deleting container  {"runner": "local:docker", "run_id": "c9k0d9qel22hrtvt0pu0", "id": "87ae7a231f00db2fc586b10392cfb9d358064440e9c50d4177a37291e23243ff"}
Apr 26 14:51:03.749538  INFO   deleting container  {"runner": "local:docker", "run_id": "c9k0d9qel22hrtvt0pu0", "id": "8f6e1fdfa5c8958f2f8e42ecfaac6b8726a5e287c1deee43bb47e12ac4ff4260"}
Apr 26 14:51:05.899545  WARN   run finished in error   {"run_id": "c9k0d9qel22hrtvt0pu0", "plan": "ping", "case": "ping", "runner": "local:docker", "instances": 2, "error": "2 nodes failed"}

(ubuntu server ?task_id=c9k0d9qel22hrtvt0pu0)

sync service:

Apr 26 14:50:26.834164  WARN    websocket closed unexpectedly: failed to read JSON message: failed to get reader: failed to read frame header: EOF