google / gvisor

Application Kernel for Containers
https://gvisor.dev
Apache License 2.0
15.56k stars 1.28k forks

A normal container cannot connect to the container running on runsc. #2835

Closed ooyyloo closed 4 years ago

ooyyloo commented 4 years ago

Description

Hi, I'm using gVisor and have run into a problem. On a VPS, a container running on runc cannot connect to a container running on runsc. "Cannot connect" here means that both ping and HTTP requests fail. On my own PC, however, it works fine. Do you have any ideas about this?

daemon.json:

{
    "runtimes": {
        "runsc": {
            "path": "/usr/local/bin/runsc",
            "runtimeArgs": [
                "--debug",
                "--debug-log=/tmp/runsc/",
                "--strace",
                "--log-packets"
            ]
        }
    }
}

Test: Sending HTTP request

I used tcpdump to capture packets in the container running on runc. It captures ARP broadcast requests addressed to the container running on runsc (the same ARP broadcasts are also captured inside the runsc container), but there is no reply to them. The container running on runsc is built from the image "ubuntu:18.04"; the container running on runc is built from the image "node:12".

Steps to reproduce

Use an SA2.SMALL2 instance on Tencent Cloud and the docker-compose command to deploy my project. If you need the project itself to debug this, please let me know.

Run ifconfig in the runsc container. I notice that eth0 has no IP address:

root@VM-0-13-ubuntu:~# ifconfig
eth0      Link encap:Ethernet  HWaddr 02:42:ac:xx:xx:xx
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:33 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:1722 (1.7 KB)  TX bytes:0 (0.0 B)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)
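The missing address can also be checked mechanically. The following is a minimal sketch, assuming the classic net-tools `ifconfig` layout shown above (blank-line-separated stanzas with an `inet addr:` field); the function name `interfaces_without_ipv4` is made up for illustration and is not part of gVisor or Docker.

```python
import re
from typing import List


def interfaces_without_ipv4(ifconfig_output: str) -> List[str]:
    """Return interface names whose stanza has no 'inet addr:' field.

    Assumes the classic net-tools ifconfig format, where each interface
    is a stanza separated from the next one by a blank line.
    """
    missing = []
    for stanza in re.split(r"\n\s*\n", ifconfig_output.strip()):
        lines = stanza.splitlines()
        if not lines:
            continue
        name = lines[0].split()[0]  # first token of the stanza header
        # 'inet addr:' does not match 'inet6 addr:' lines.
        if not re.search(r"\binet addr:", stanza):
            missing.append(name)
    return missing
```

Fed the output above, this would report eth0 and skip lo, which matches what is visible by eye: eth0 is up and receiving packets but has no IPv4 address and has transmitted nothing.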

Environment

Please include the following details of your environment:

tanjianfeng commented 4 years ago

Can you share more about it? Does it fail on the ARP query, on TCP connection establishment, or on the HTTP request?

According to your config, Netstack is the network stack in use here, and your kernel version is 4.4.0. Netstack requires a newer kernel, >= 4.14.77. IIUC, that requirement is there to make GSO work.
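A quick way to compare a host against the minimum kernel version quoted above is sketched below. This is only an illustration: the helper names are hypothetical, the release string `4.4.0-130-generic` in the usage note is an example rather than the reporter's actual value, and the threshold is taken from this comment rather than from any gVisor API. Versions are compared as integer tuples, so 4.4 correctly sorts below 4.14.

```python
import platform
import re
from typing import Tuple

# Minimum kernel version quoted in the comment above (an assumption
# taken from this thread, not read from gVisor itself).
MIN_KERNEL = (4, 14, 77)


def parse_kernel_release(release: str) -> Tuple[int, int, int]:
    """Extract (major, minor, patch) from a string like '4.4.0-130-generic'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def kernel_is_new_enough(release: str, minimum=MIN_KERNEL) -> bool:
    """Compare numerically; a plain string comparison would get 4.4 vs 4.14 wrong."""
    return parse_kernel_release(release) >= minimum


if __name__ == "__main__":
    release = platform.release()
    print(release, "ok" if kernel_is_new_enough(release) else "too old")
```

For example, `kernel_is_new_enough("4.4.0-130-generic")` evaluates to `False`, while a 4.15 kernel passes.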

hbhasker commented 4 years ago

Could you paste a tcpdump here, along with what you were expecting to see? A trace of what you see under runc vs. runsc would be helpful.

ooyyloo commented 4 years ago

With runc

root@VM-0-13-ubuntu:~# tcpdump
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
15:05:53.321230 ARP, Request who-has 172.21.0.2 tell 172.21.0.7, length 28
15:05:53.321253 ARP, Reply 172.21.0.2 is-at 02:42:ac:##:##:## (oui Unknown), length 28
15:05:53.321265 IP 172.21.0.7.60906 > 172.21.0.2.8221: Flags [S], seq 4271617950, win 29200, options [mss 1460,sackOK,TS val 63236005 ecr 0,nop,wscale 7], length 0
15:05:53.321280 IP 172.21.0.2.8221 > 172.21.0.7.60906: Flags [S.], seq 3965179949, ack 4271617951, win 28960, options [mss 1460,sackOK,TS val 63236005 ecr 63236005,nop,wscale 7], length 0
15:05:53.321293 IP 172.21.0.7.60906 > 172.21.0.2.8221: Flags [.], ack 1, win 229, options [nop,nop,TS val 63236005 ecr 63236005], length 0
15:05:53.321549 ARP, Request who-has 172.21.0.1 tell 172.21.0.2, length 28
15:05:53.321561 ARP, Reply 172.21.0.1 is-at 02:42:73:##:##:## (oui Unknown), length 28
15:05:53.326618 IP 172.21.0.2.46999 > 183.60.83.19.domain: 525+ PTR? 1.0.21.172.in-addr.arpa. (41)
15:05:53.329457 IP 183.60.83.19.domain > 172.21.0.2.46999: 525 NXDomain* 0/1/0 (100)
15:05:53.329517 IP 172.21.0.2.39536 > 183.60.83.19.domain: 39606+ PTR? 19.83.60.183.in-addr.arpa. (43)
15:05:53.331442 IP 183.60.83.19.domain > 172.21.0.2.39536: 39606 NXDomain 0/1/0 (101)
15:05:53.516189 IP 172.21.0.2.8221 > 172.21.0.7.60906: Flags [P.], seq 1:162, ack 286, win 235, options [nop,nop,TS val 63236054 ecr 63236006], length 161
15:05:53.516227 IP 172.21.0.7.60906 > 172.21.0.2.8221: Flags [.], ack 162, win 237, options [nop,nop,TS val 63236054 ecr 63236054], length 0
15:05:53.518282 IP 172.21.0.7.60906 > 172.21.0.2.8221: Flags [F.], seq 286, ack 162, win 237, options [nop,nop,TS val 63236055 ecr 63236054], length 0
15:05:53.519859 IP 172.21.0.2.8221 > 172.21.0.7.60906: Flags [F.], seq 162, ack 287, win 235, options [nop,nop,TS val 63236055 ecr 63236055], length 0
15:05:53.519879 IP 172.21.0.7.60906 > 172.21.0.2.8221: Flags [.], ack 163, win 237, options [nop,nop,TS val 63236055 ecr 63236055], length 0
15:05:58.338070 ARP, Request who-has 172.21.0.2 tell 172.21.0.1, length 28
15:05:58.338088 ARP, Reply 172.21.0.2 is-at 02:42:ac:##:##:## (oui Unknown), length 28

With runsc

root@VM-0-13-ubuntu:~# tcpdump
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
15:00:12.313791 ARP, Request who-has 172.20.0.2 tell 172.20.0.7, length 28
15:00:13.310082 ARP, Request who-has 172.20.0.2 tell 172.20.0.7, length 28
15:00:14.310079 ARP, Request who-has 172.20.0.2 tell 172.20.0.7, length 28
15:00:15.310124 ARP, Request who-has 172.20.0.2 tell 172.20.0.7, length 28
15:00:16.310083 ARP, Request who-has 172.20.0.2 tell 172.20.0.7, length 28
15:00:17.310080 ARP, Request who-has 172.20.0.2 tell 172.20.0.7, length 28

What I want to see

The container running on runsc should send an ARP reply, and tcpdump should be able to capture that reply packet.
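The difference between the two captures can be summarized as "which ARP targets were asked for but never answered". A rough sketch of that check follows, assuming tcpdump's default text output in the form shown in this thread (`ARP, Request who-has X tell Y` and `ARP, Reply X is-at ...`); the function name `unanswered_arp_targets` is hypothetical, and other tcpdump versions or flags may print ARP lines differently.

```python
import re
from typing import Iterable, Set

# Patterns match the ARP lines as they appear in the captures in this thread.
REQUEST = re.compile(r"ARP, Request who-has (\S+) tell (\S+),")
REPLY = re.compile(r"ARP, Reply (\S+) is-at ")


def unanswered_arp_targets(lines: Iterable[str]) -> Set[str]:
    """Return IPs that were requested via ARP but never replied in the capture."""
    requested, replied = set(), set()
    for line in lines:
        match = REQUEST.search(line)
        if match:
            requested.add(match.group(1))  # the IP being resolved
            continue
        match = REPLY.search(line)
        if match:
            replied.add(match.group(1))  # the IP that answered
    return requested - replied
```

Applied to the runc capture this should give an empty set, and applied to the runsc capture it should give 172.20.0.2, the address of the runsc container that never replies.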

ooyyloo commented 4 years ago

Updating the kernel to a newer version solved this problem. Kernel 4.4.0 is too old for Netstack: it is below the 4.14.77 minimum mentioned above.