google / gvisor

Application Kernel for Containers
https://gvisor.dev
Apache License 2.0

No obvious way to checkpoint a container when TCP sockets have been recently closed and are in TIME_WAIT state in the kernel #10788

Closed cweld510 closed 3 months ago

cweld510 commented 3 months ago

Description

If code running in a container has opened a TCP socket and recently called close on it, there is no clear way to checkpoint the container: checkpoint attempts will likely fail because the socket is still in TIME_WAIT state in the kernel. As a result, containers that have recently opened and closed TCP sockets cannot be checkpointed promptly.

I'm running roughly the following Python code in the container:

import socket

class SnapshotClass:
    def __init__(self):
        self._socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

    def lingering_open_sockets(self):
        remote_ip = "94.140.14.14"  # AdGuard DNS
        self._socket.connect((remote_ip, 80))
        self._socket.close()
        # checkpoint the container here

s = SnapshotClass()
s.lingering_open_sockets()

Doing this results in the following error:

Failed creating function container snapshot: internal runtime error: failed to snapshot container => container_id=ta-01J5EAD3M1T0N41P2RA165X5A4 stderr=checkpoint failed: checkpointing container "ta-01J5EAD3M1T0N41P2RA165X5A4": encoding error: save rejected due to unsupported networking state: endpoint cannot be saved in connected state: local 172.20.0.86:30031, remote 94.140.14.14:80:

We can avoid the error by waiting 60 seconds before checkpointing, which is the default time sockets spend in TIME_WAIT after closing, but that's a long time to wait. I've tried setting sysctls in the config.json given to runsc to make sockets reach the CLOSED state quickly (e.g. net.ipv4.tcp_fin_timeout: "1"), but those settings don't seem to be respected inside the container. (I can file that bug separately.) From outside the container, we can close sockets in the Linux kernel, but not in netstack.
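For anyone hitting the same thing, the "wait 60 seconds" workaround can at least be made adaptive instead of a fixed sleep. This is a hedged sketch (not part of gVisor itself): it assumes the sandbox exposes /proc/net/tcp in the usual Linux format, where the fourth column is the socket state and 06 means TIME_WAIT, and polls until no TIME_WAIT entries remain before attempting the checkpoint. The helper names are mine, not from any gVisor API.

```python
import time

TIME_WAIT = "06"  # state code for TIME_WAIT in /proc/net/tcp


def time_wait_sockets(proc_tcp_lines):
    """Count sockets in TIME_WAIT, given the lines of /proc/net/tcp."""
    count = 0
    for line in proc_tcp_lines[1:]:  # skip the header row
        fields = line.split()
        if len(fields) > 3 and fields[3] == TIME_WAIT:
            count += 1
    return count


def wait_for_time_wait_drain(timeout=65.0, interval=1.0):
    """Poll until no TIME_WAIT sockets remain, or the timeout elapses.

    Returns True when the table drained, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        with open("/proc/net/tcp") as f:
            if time_wait_sockets(f.readlines()) == 0:
                return True
        time.sleep(interval)
    return False
```

At best this turns a fixed 60-second wait into "as soon as the last socket drains," but it is still a delay; the --net-disconnect-ok flag mentioned below in the thread avoids the wait entirely.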

Can I ask for some guidance on how to deal with this situation properly?

Steps to reproduce

No response

runsc version

runsc version 40a09da5a1ab
spec: 1.1.0-rc.1

docker version (if using docker)

No response

uname

Linux 5.15.0-101.103.2.1.el9uek.x86_64 #2 SMP Tue May 2 01:10:45 PDT 2023 x86_64 x86_64 x86_64 GNU/Linux

kubectl (if using Kubernetes)

No response

repo state (if built from source)

No response

runsc debug logs (if available)

No response

EtiennePerot commented 3 months ago

Can you set --net-disconnect-ok=true to avoid the wait?
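For readers finding this later: the flag goes wherever your runsc flags are configured. As one illustrative example (paths and setup assumed, following the usual pattern of registering runsc as a Docker runtime), it can be added to runtimeArgs in /etc/docker/daemon.json:

```
{
  "runtimes": {
    "runsc": {
      "path": "/usr/local/bin/runsc",
      "runtimeArgs": ["--net-disconnect-ok=true"]
    }
  }
}
```

If you invoke runsc directly, the same flag can be passed on the command line ahead of the subcommand.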

cweld510 commented 3 months ago

Yes, that fixes the issue! Thank you!