skycoin / skywire

Skywire Node implementation
73 stars 45 forks source link

`skywire-cli log` hangs #1552

Closed jdknives closed 1 year ago

jdknives commented 1 year ago

skywire-cli log can hang indefinitely

The order of the fetching of the surveys and transport logs was randomized here to compensate for it

https://github.com/skycoin/skywire/pull/1541

however the issue is actually still present

the reward system log collection run can be viewed here:

https://fiber.magnetosphere.net/log-collection

The process is limited to half an hour via the timeout command, though it takes only about 10 minutes for the command to hang on a download attempt.

0pcom commented 1 year ago

solving #1143 may help to address this issue

0pcom commented 1 year ago

There can be various reasons for hanging, here I've just observed another instance of this behavior

[2023-06-21T17:11:09.27000653-05:00] DEBUG [log-collecting]: Successfully  called uptime service and received answer []
[2023-06-21T17:11:09.305589163-05:00] DEBUG disc.NewHTTP [log-collecting]: Created HTTP client. addr="http://dmsgd.skywire.skycoin.com"
[2023-06-21T17:11:09.305671584-05:00] INFO [log-collecting]: Connecting to dmsg network... dmsg_disc="http://dmsgd.skywire.skycoin.com" public_key="02b5ee5333aa6b7f5fc623b7d5f35f505cb7f974e98a70751cf41962f84c8c4637"
[2023-06-21T17:11:09.305806854-05:00] DEBUG [dmsg_client]: Discovering dmsg servers...
[2023-06-21T17:11:10.306585512-05:00] WARN retrier [dmsg_client]: Retrying... current_backoff=1.3s error="something unexpected happened"
[2023-06-21T17:11:10.702813621-05:00] DEBUG [dmsg_client]: Dialing session... remote_pk=0326978f5a53aff537dbb47fed58b1f123af3b00132d365f1309a14db4168dcff7
[2023-06-21T17:13:19.978885726-05:00] WARN [dmsg_client]: Failed to establish session. current_backoff="5s" error="dial tcp 172.105.125.167:9081: connect: connection timed out" remote_pk=0326978f5a53aff537dbb47fed58b1f123af3b00132d365f1309a14db4168dcff7
[2023-06-21T17:13:24.98211556-05:00] DEBUG [dmsg_client]: Dialing session... remote_pk=02a2d4c346dabd165fd555dfdba4a7f4d18786fe7e055e562397cd5102bdd7f8dd
[2023-06-21T17:15:34.805564656-05:00] WARN [dmsg_client]: Failed to establish session. current_backoff="6.5s" error="dial tcp 170.187.231.137:30081: connect: connection timed out" remote_pk=02a2d4c346dabd165fd555dfdba4a7f4d18786fe7e055e562397cd5102bdd7f8dd
[2023-06-21T17:15:41.311273136-05:00] DEBUG [dmsg_client]: Dialing session... remote_pk=0228af3fd99c8d86a882495c8e0202bdd4da78c69e013065d8634286dd4a0ac098
[2023-06-21T17:17:51.338872047-05:00] WARN [dmsg_client]: Failed to establish session. current_backoff="8.45s" error="dial tcp 172.104.161.184:30084: connect: connection timed out" remote_pk=0228af3fd99c8d86a882495c8e0202bdd4da78c69e013065d8634286dd4a0ac098
[2023-06-21T17:17:59.796336668-05:00] DEBUG [dmsg_client]: Dialing session... remote_pk=0281a102c82820e811368c8d028cf11b1a985043b726b1bcdb8fce89b27384b2cb
[2023-06-21T17:20:09.578866013-05:00] WARN [dmsg_client]: Failed to establish session. current_backoff="10.985s" error="dial tcp 139.162.12.244:30085: connect: connection timed out" remote_pk=0281a102c82820e811368c8d028cf11b1a985043b726b1bcdb8fce89b27384b2cb
[2023-06-21T17:20:20.568756568-05:00] DEBUG [dmsg_client]: Dialing session... remote_pk=0371ab4bcff7b121f4b91f6856d6740c6f9dc1fe716977850aeb5d84378b300a13
[2023-06-21T17:20:21.871683006-05:00] ERROR [log-collecting]: endpoint="http://dmsgd.skywire.skycoin.com/dmsg-discovery/entry/" resp_body="something unexpected happened" resp_status=500
[2023-06-21T17:20:21.871741291-05:00] WARN EntityCommon.setSession [dmsg_client]: Callback returned non-nil error. error="something unexpected happened"
[2023-06-21T17:20:21.871828978-05:00] DEBUG [dmsg_client]: Serving session. remote_pk=0371ab4bcff7b121f4b91f6856d6740c6f9dc1fe716977850aeb5d84378b300a13
0pcom commented 1 year ago

here is a panic which occurs from an io timeout error on the uptime tracker query with skywire-cli log

image

[2023-06-24T04:18:47.424765278-05:00] ERROR [log-collecting]: Error while fetching data from uptime service. Error: Get "http://ut.skywire.skycoin.com/uptimes?v=v2": dial tcp 192.46.229.215:80: i/o timeout
[2023-06-24T04:18:47.42483283-05:00] PANIC [log-collecting]: Unable to get data from uptime tracker. error="Cannot get Uptime data"
panic: (*logrus.Entry) 0xc00032e0e0

goroutine 1 [running, locked to thread]:
github.com/sirupsen/logrus.(*Entry).log(0xc00032e070, 0x0, {0xc0005a0570, 0x27})
    github.com/sirupsen/logrus@v1.8.1/entry.go:259 +0x487
github.com/sirupsen/logrus.(*Entry).Log(0xc00032e070, 0x0, {0xc00017dcf0?, 0x0?, 0x0?})
    github.com/sirupsen/logrus@v1.8.1/entry.go:293 +0x4f
github.com/sirupsen/logrus.(*Entry).Panic(0xf19c9c?, {0xc00017dcf0?, 0xc0001ea170?, 0xc000591360?})
    github.com/sirupsen/logrus@v1.8.1/entry.go:331 +0x29
github.com/skycoin/skywire/cmd/skywire-cli/commands/log.glob..func1(0x1b579e0?, {0xedc437?, 0x2?, 0x2?})
    github.com/skycoin/skywire/cmd/skywire-cli/commands/log/root.go:108 +0x40c
github.com/spf13/cobra.(*Command).execute(0x1b579e0, {0xc000499c00, 0x2, 0x2})
    github.com/spf13/cobra@v1.4.0/command.go:860 +0x663
github.com/spf13/cobra.(*Command).ExecuteC(0x1b54a60)
    github.com/spf13/cobra@v1.4.0/command.go:974 +0x3bd
github.com/spf13/cobra.(*Command).Execute(...)
    github.com/spf13/cobra@v1.4.0/command.go:902
github.com/skycoin/skywire/cmd/skywire-cli/commands.Execute()
    github.com/skycoin/skywire/cmd/skywire-cli/commands/root.go:224 +0x6a
main.main()
    github.com/skycoin/skywire/cmd/skywire-cli/skywire-cli.go:9 +0x17