I'm not sure if this is related, and it is from a build from last week and not 1.0.0, but I just saw this crash (when shutting down Lima):
{"level":"info","msg":"Shutting down QEMU with the power button","time":"2024-11-06T18:49:38-07:00"}
{"level":"info","msg":"Sending QMP system_powerdown command","time":"2024-11-06T18:49:38-07:00"}
{"level":"error","msg":"could not open tcp tunnel for id: tcp-127.0.0.1:6443-127.0.0.1:52705 error:rpc error: code = Canceled desc = context canceled","time":"2024-11-06T18:49:41-07:00"}
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x2 addr=0x30 pc=0x102d45500]
goroutine 1077 [running]:
github.com/lima-vm/lima/pkg/portfwd.GrpcClientRW.Read({{0x140016fd9e0, 0x22}, {0x140008dab94, 0x9}, {0x0, 0x0}}, {0x14000aa0000, 0x8000, 0x0?})
/Users/runner/work/rancher-desktop-lima/rancher-desktop-lima/lima/pkg/portfwd/client.go:134 +0x40
io.copyBuffer({0x10329e0c0, 0x1400049c4b8}, {0x10329f6c0, 0x140005ff260}, {0x0, 0x0, 0x0})
/Users/runner/hostedtoolcache/go/1.23.2/x64/src/io/io.go:429 +0x18c
io.Copy(...)
/Users/runner/hostedtoolcache/go/1.23.2/x64/src/io/io.go:388
net.genericReadFrom(0x14000984b00?, {0x10329f6c0, 0x140005ff260})
/Users/runner/hostedtoolcache/go/1.23.2/x64/src/net/net.go:762 +0x58
net.(*TCPConn).readFrom(0x1400049c470, {0x10329f6c0, 0x140005ff260})
/Users/runner/hostedtoolcache/go/1.23.2/x64/src/net/tcpsock_posix.go:54 +0x50
net.(*TCPConn).ReadFrom(0x1400049c470, {0x10329f6c0?, 0x140005ff260?})
/Users/runner/hostedtoolcache/go/1.23.2/x64/src/net/tcpsock.go:165 +0x30
io.copyBuffer({0x12b650ef0, 0x1400049c470}, {0x10329f6c0, 0x140005ff260}, {0x0, 0x0, 0x0})
/Users/runner/hostedtoolcache/go/1.23.2/x64/src/io/io.go:415 +0x14c
io.Copy(...)
/Users/runner/hostedtoolcache/go/1.23.2/x64/src/io/io.go:388
github.com/lima-vm/lima/pkg/portfwd.HandleTCPConnection.func2()
/Users/runner/work/rancher-desktop-lima/rancher-desktop-lima/lima/pkg/portfwd/client.go:37 +0x60
golang.org/x/sync/errgroup.(*Group).Go.func1()
/Users/runner/go/pkg/mod/golang.org/x/sync@v0.8.0/errgroup/errgroup.go:78 +0x54
created by golang.org/x/sync/errgroup.(*Group).Go in goroutine 1075
/Users/runner/go/pkg/mod/golang.org/x/sync@v0.8.0/errgroup/errgroup.go:75 +0x98
Saw another panic
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x10bf5c444]
goroutine 1459 [running]:
github.com/lima-vm/lima/pkg/portfwd.HandleTCPConnection({0x10c4c8780, 0xc0006d6d20}, 0xc0006162b0, {0x0, 0x0}, {0xc0005a4090, 0xa})
/private/tmp/lima-20241106-4977-k63hh8/lima-1.0.0/pkg/portfwd/client.go:17 +0x64
created by github.com/lima-vm/lima/pkg/portfwd.(*ClosableListeners).forwardTCP in goroutine 1401
/private/tmp/lima-20241106-4977-k63hh8/lima-1.0.0/pkg/portfwd/listener.go:90 +0x48
@AkihiroSuda Did this panic occur during stop, or while the instance was running?
It panicked while running, while playing around with `docker run -p 8080:80 nginx:alpine` and various other ports. It looks like `-p 80:80` is enough to trigger the panic in `portfwd.HandleTCPConnection`.
The panic in `portfwd.GrpcClientRW.Read` reported by @jandubois is probably unrelated.

I also see the following errors earlier in `ha.stderr.log` (sometimes a lot more); not sure if they are expected:
{"level":"debug","msg":"error in tcp tunnel for id: tcp-127.0.0.1:6443-127.0.0.1:52484 error:readfrom tcp 127.0.0.1:6443-\u003e127.0.0.1:52484: rpc error: code = Unavailable desc = error reading from server: EOF","time":"2024-11-06T18:49:35-07:00"}
{"level":"debug","msg":"error in tcp tunnel for id: tcp-127.0.0.1:6443-127.0.0.1:52521 error:readfrom tcp 127.0.0.1:6443-\u003e127.0.0.1:52521: rpc error: code = Unavailable desc = error reading from server: EOF","time":"2024-11-06T18:49:35-07:00"}
{"level":"debug","msg":"error in tcp tunnel for id: tcp-127.0.0.1:6443-127.0.0.1:52554 error:readfrom tcp 127.0.0.1:6443-\u003e127.0.0.1:52554: rpc error: code = Unavailable desc = error reading from server: EOF","time":"2024-11-06T18:49:35-07:00"}
{"level":"debug","msg":"error in tcp tunnel for id: tcp-127.0.0.1:6443-127.0.0.1:52555 error:readfrom tcp 127.0.0.1:6443-\u003e127.0.0.1:52555: rpc error: code = Unavailable desc = error reading from server: EOF","time":"2024-11-06T18:49:35-07:00"}
{"level":"debug","msg":"error in tcp tunnel for id: tcp-[::1]:80-[::1]:52697 error:readfrom tcp [::1]:80-\u003e[::1]:52697: rpc error: code = Unavailable desc = error reading from server: EOF","time":"2024-11-06T18:49:35-07:00"}
Note that I'm not using an Ubuntu image, but Alpine. It is a complicated test case though, using k3s with the SpinKube operator running a wasm test... Trying to narrow it down.
It does, however, fail every single time (at least on aarch64; still trying to repro on Intel), so it would be a good test to validate a fix...
> The panic in `portfwd.GrpcClientRW.Read` reported by @jandubois is probably unrelated.

Should I create a separate issue for this?
BTW, I can also repro this on Intel; crash happens on every shutdown.
Yes, probably it should be a separate issue
BTW, if we can't fix the issues immediately, we should probably switch the default back to the SSH forwarder and release v1.0.1 right now.
I think it is unrelated, but this looks suspicious: https://github.com/lima-vm/lima/blob/6cd896f0d6196f75ef2af39d25e73f42284b78af/pkg/portfwd/client.go#L21-L24

We continue on even when `stream` is not set, and later dereference it via `g.stream.Recv` or `g.stream.Send`. Should there be a `return` after logging the error?
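To make the failure mode concrete, here is a minimal, hypothetical Go sketch of that control flow (all names are invented; this is not the actual pkg/portfwd code). Logging the error without returning leaves the stream nil, and the next use of it panics:

```go
// A hypothetical, self-contained reconstruction of the control flow described
// above. The names are made up; this only illustrates the missing-return pattern.
package main

import (
	"errors"
	"log"
)

// stream stands in for the gRPC tunnel stream held by the client.
type stream struct {
	buf []byte
}

// Recv reads from the tunnel. On a nil receiver the field access below is a
// nil-pointer dereference, analogous to the SIGSEGV seen in GrpcClientRW.Read.
func (s *stream) Recv() ([]byte, error) { return s.buf, nil }

// openTunnel simulates the tunnel setup failing, as in the
// "could not open tcp tunnel ... context canceled" log line above.
func openTunnel() (*stream, error) {
	return nil, errors.New("rpc error: code = Canceled desc = context canceled")
}

func handleConnection() {
	s, err := openTunnel()
	if err != nil {
		log.Printf("could not open tcp tunnel: %v", err)
		// Missing `return` here: execution continues with s == nil.
	}
	// The next use of the stream dereferences nil and panics.
	if _, err := s.Recv(); err != nil {
		log.Printf("recv: %v", err)
	}
}

func main() {
	handleConnection()
}
```

Adding a `return` after the log call, as suggested above, would make the handler bail out instead of dereferencing the nil stream.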
> Yes, probably it should be a separate issue
I created #2860 to fix it.
We are seeing 3 things with Lima 1.0.0 with docker; probably two of them are the same thing, so I'll mention them in this issue:
That PR seems to have solved it, and we have good runs now, no more hangs.
The third thing is not yet reproducible, but it's where the lima instance just stops of its own accord; `limactl list` shows it as stopped.
> Looks like `-p 80:80` is enough to trigger the panic in `portfwd.HandleTCPConnection`.

Can you post exact repro steps? I can't repro this with 1.0.0.
> > Looks like `-p 80:80` is enough to trigger the panic in `portfwd.HandleTCPConnection`.
>
> Can you post exact repro steps? I can't repro this with 1.0.0.
limactl start template://docker
limactl shell docker docker run -d --name nginx -p 80:80 nginx:alpine
curl localhost
cat ~/.lima/docker/ha.stderr.log
(macOS 15.1, Intel, vz, current master 4d56dab41bf9d0ad6bc3288cd49b818627e378b7)
This panic is likely causing "1. Host-bound ports suddenly stop being bound."
The gRPC port forwarder introduced in Lima v1.0 seems to have a stability issue with template://docker: https://github.com/lima-vm/lima/actions/runs/11638179464/job/32412581211

`LIMA_SSH_PORT_FORWARDER=true` may work as a workaround.