google / gvisor

Application Kernel for Containers
https://gvisor.dev
Apache License 2.0

Extend -net-disconnect-ok capability to unix domain sockets #10897

Open cweld510 opened 1 week ago

cweld510 commented 1 week ago

Description

We're running into problems checkpointing containers that have open connections on a host Unix domain socket mounted into the container. The checkpoint fails with the following error (expected, since SCMConnectedEndpoint objects aren't saveable):

encoding error: runtime error: invalid memory address or nil pointer dereference:
goroutine 109 [running]:
gvisor.dev/gvisor/pkg/state.safely.func1()
    pkg/state/state.go:309 +0x179
panic({0x1179260?, 0x34acff0?})
    GOROOT/src/runtime/panic.go:770 +0x132
gvisor.dev/gvisor/pkg/sentry/socket/unix/transport.(*SCMConnectedEndpoint).StateTypeName(0x34b11b0?)
    <autogenerated>:1 +0x9
gvisor.dev/gvisor/pkg/state.lookupNameFields({0x15ca848, 0x11d3880})
    pkg/state/types.go:119 +0xbc
gvisor.dev/gvisor/pkg/state.(*typeEncodeDatabase).Lookup(0xc0007d51a8, {0x15ca848, 0x11d3880})
    pkg/state/types.go:135 +0x4c
gvisor.dev/gvisor/pkg/state.(*encodeState).findType(0xc0007d5188, {0x15ca848, 0x11d3880})
    pkg/state/encode.go:559 +0x45
gvisor.dev/gvisor/pkg/state.(*encodeState).findType(0xc0007d5188, {0x15ca848, 0x130a960})

We rely on the -net-disconnect-ok flag when checkpointing containers in production to close any TCP connections open at the time of the checkpoint rather than having the checkpoint attempt fail. This is fairly critical for us because we're running arbitrary user code and it's hard to guarantee that there are no open connections at the time we checkpoint.

If possible, we'd like this flag (or a new flag) to also apply to open Unix domain sockets that are backed by host FDs. We mount some host domain sockets into the container for IPC between in-sandbox processes and our agent code running on the host, and in practice we can't guarantee that those sockets are closed in gVisor at the time we try to checkpoint the container. This prevents us from successfully checkpointing certain workloads for some customers. The only workaround I can think of is to have gVisor close the socket itself. There seems to be precedent for this, since gVisor can already do this for TCP connections.

I'm happy to attempt this myself if needed.

Is this feature related to a specific bug?

No response

Do you have a specific solution in mind?

No response

kevinGC commented 1 week ago

I think this would be reasonable to bundle with --net-disconnect-ok if you want to take a shot at it. The need / use case for it seems more or less the same.