skycoin / skywire

Skywire Node implementation
72 stars 45 forks source link

RPC queries cause panic during startup with no dmsg connection #1596

Closed 0pcom closed 1 year ago

0pcom commented 1 year ago

Using dmsghttp config

Just before the panic in the visor logging, from another terminal I ran skywire-cli visor info it obviously was not ready for that to happen after ~15 seconds of running

$ sudo skywire -p --loglvl debug
[2023-05-31T08:14:45.287642433-05:00] INFO [visor:config]: filepath="/opt/skywire/skywire.json"
[2023-05-31T08:14:45.28866822-05:00] INFO []: setting log level to: debug
[2023-05-31T08:14:45.289032248-05:00] INFO [visor:startup]: Begin startup. public_key=02cf226af53dc6e507dc12a7a444b76d9aa7bfbd48b5f83a8124af76e14f78e39a
[2023-05-31T08:14:45.289122124-05:00] INFO [visor:startup]: main module set to hypervisor
[2023-05-31T08:14:45.289145574-05:00] DEBUG [hypervisor]: Starting
[2023-05-31T08:14:45.289173436-05:00] DEBUG [visor]: Starting
[2023-05-31T08:14:45.28923862-05:00] DEBUG [transports]: Starting
[2023-05-31T08:14:45.289260576-05:00] DEBUG [system_survey]: Starting
[2023-05-31T08:14:45.289250827-05:00] DEBUG [event_broadcaster]: Starting
[2023-05-31T08:14:45.289250205-05:00] DEBUG [dmsg_pty]: Starting
[2023-05-31T08:14:45.289270048-05:00] DEBUG [public_autoconnect]: Starting
[2023-05-31T08:14:45.289311911-05:00] DEBUG [hypervisors]: Starting
[2023-05-31T08:14:45.289256057-05:00] DEBUG [address_resolver]: Starting
[2023-05-31T08:14:45.289371117-05:00] DEBUG [stcpr]: Starting
[2023-05-31T08:14:45.289277023-05:00] DEBUG [sky_forward_conn]: Starting
[2023-05-31T08:14:45.289275228-05:00] DEBUG [transport_setup]: Starting
[2023-05-31T08:14:45.28929148-05:00] DEBUG [launcher]: Starting
[2023-05-31T08:14:45.289293002-05:00] DEBUG [transport]: Starting
[2023-05-31T08:14:45.289305276-05:00] DEBUG [router]: Starting
[2023-05-31T08:14:45.289307125-05:00] DEBUG [ping]: Starting
[2023-05-31T08:14:45.289405141-05:00] DEBUG [dmsg_trackers]: Starting
[2023-05-31T08:14:45.289383258-05:00] DEBUG [stun_client]: Starting
[2023-05-31T08:14:45.289411359-05:00] DEBUG [event_broadcaster]: Initialized.
[2023-05-31T08:14:45.289262235-05:00] DEBUG [discovery]: Starting
[2023-05-31T08:14:45.289321039-05:00] DEBUG [cli]: Starting
[2023-05-31T08:14:45.289422781-05:00] DEBUG [dmsg]: Starting
[2023-05-31T08:14:45.289386031-05:00] DEBUG [dmsg_ctrl]: Starting
[2023-05-31T08:14:45.289384134-05:00] DEBUG [uptime_tracker]: Starting
[2023-05-31T08:14:45.289393379-05:00] DEBUG [sudph]: Starting
[2023-05-31T08:14:45.289393336-05:00] DEBUG [dmsghttp_logserver]: Starting
[2023-05-31T08:14:45.289270197-05:00] DEBUG [public_visor]: Starting
[2023-05-31T08:14:45.289397623-05:00] DEBUG [stcp]: Starting
[2023-05-31T08:14:45.289521045-05:00] DEBUG [dmsg_http]: Starting
[2023-05-31T08:14:45.290902488-05:00] DEBUG direct.NewClient [dmsg_http:direct_client]: Created Direct client.
[2023-05-31T08:14:45.290975597-05:00] DEBUG [dmsg_http:dmsgDC]: Connecting to dmsg network... public_key="02cf226af53dc6e507dc12a7a444b76d9aa7bfbd48b5f83a8124af76e14f78e39a"
[2023-05-31T08:14:45.2910309-05:00] DEBUG [dmsg_http:dmsgDC]: Discovering dmsg servers...
[2023-05-31T08:14:45.291089478-05:00] DEBUG [dmsg_http:dmsgDC]: Dialing session... remote_pk=03d5b55d1133b26485c664cf8b95cff6746d1e321c34e48c9fed293eff0d6d49e5
[2023-05-31T08:14:45.291229371-05:00] DEBUG [cli]: Initialized.
[2023-05-31T08:14:45.820939013-05:00] WARN [dmsg_http:dmsgDC]: Failed to establish session. current_backoff="5s" error="dial tcp 192.53.117.158:30083: connect: connection refused" remote_pk=03d5b55d1133b26485c664cf8b95cff6746d1e321c34e48c9fed293eff0d6d49e5
[2023-05-31T08:14:50.821378276-05:00] DEBUG [dmsg_http:dmsgDC]: Dialing session... remote_pk=0281a102c82820e811368c8d028cf11b1a985043b726b1bcdb8fce89b27384b2cb
[2023-05-31T08:14:51.040687898-05:00] WARN [dmsg_http:dmsgDC]: Failed to establish session. current_backoff="6.5s" error="dial tcp 192.53.114.142:30085: connect: connection refused" remote_pk=0281a102c82820e811368c8d028cf11b1a985043b726b1bcdb8fce89b27384b2cb
[2023-05-31T08:14:55.448496541-05:00] INFO [stun_client]: NAT Type: Restricted NAT
[2023-05-31T08:14:55.448565312-05:00] INFO [stun_client]: Public IP: 70.121.6.231:46671
[2023-05-31T08:14:55.448593771-05:00] DEBUG [stun_client]: Initialized.
[2023-05-31T08:14:57.542810164-05:00] DEBUG [dmsg_http:dmsgDC]: Dialing session... remote_pk=02a49bc0aa1b5b78f638e9189be4ed095bac5d6839c828465a8350f80ac07629c0
panic: tpM is nil

goroutine 132 [running]:
github.com/skycoin/skywire/pkg/visor.(*Visor).Overview(0xc0000a7680?)
    github.com/skycoin/skywire/pkg/visor/api.go:159 +0x667
github.com/skycoin/skywire/pkg/visor.(*Visor).Summary(0xc000516b40)
    github.com/skycoin/skywire/pkg/visor/api.go:238 +0x36
github.com/skycoin/skywire/pkg/visor.(*RPC).Summary(0xc00007c160, 0xc0002b86a8?, 0xc0004b2500)
    github.com/skycoin/skywire/pkg/visor/rpc.go:177 +0x122
reflect.Value.call({0xc00061b9e0?, 0xc000014d98?, 0x13?}, {0xe56dd6, 0x4}, {0xc000135ef8, 0x3, 0x3?})
    reflect/value.go:586 +0xb0b
reflect.Value.Call({0xc00061b9e0?, 0xc000014d98?, 0xc000507760?}, {0xc0005076f8?, 0xc0005076f8?, 0x407f85?})
    reflect/value.go:370 +0xbc
net/rpc.(*service).call(0xc00052c080, 0x0?, 0x0?, 0xc0002b85b0, 0xc00061dd80, 0x1a6e0b0?, {0xcfb680?, 0x1a9fd30?, 0xc0004844b0?}, {0xcf0f00, ...}, ...)
    net/rpc/server.go:382 +0x226
created by net/rpc.(*Server).ServeCodec
    net/rpc/server.go:479 +0x3fe
$ skywire-cli visor info
[2023-05-31T08:15:36.841681801-05:00] FATAL [skywire-cli]: Failed to connect to visor RPC or RPC method not found; is skywire running?: unexpected EOF
0pcom commented 1 year ago

Note that the issue still exists but the cause for the lack of connectivity to dmsg was solved with an update to the dmsghttp-config.json on the v1.3.10 release

0pcom commented 1 year ago

This is a duplicate of #1492 ; however the issue is much more noticeable when using a dmsghttp config in the instance that it takes a long time for the visor to connect to dmsg.

Whereas with non dmsghttp-config, the window of time during startup when this issue may manifest is only seconds, a visor may run for a prolonged time without a connection to dmsg because of a misconfigured or outdated dmsghttp-config.json