google / gvisor

Application Kernel for Containers
https://gvisor.dev
Apache License 2.0

Internal Unix Domain Sockets are slow #5132

Closed · majek closed this 3 years ago

majek commented 3 years ago

Here's a chart of throughput for various SOCK_STREAM setups. The idea is to run an echo client that sends a bunch of data, awaits the response, and measures the latency. Pretty simple stuff.

[chart: numbers-40-4096]

This chart shows IPv4, IPv6, and Unix domain socket performance in two setups.

What stands out is that when both the client and the server of a Unix domain socket are inside gVisor, the performance is 10x worse than any of the other options.

To reproduce see https://gist.github.com/majek/a2cea2d116fa04aaf167f45f357b4311.

git clone https://gist.github.com/majek/a2cea2d116fa04aaf167f45f357b4311 echo
make -C echo
./echo/test_perf.sh
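The pattern the harness measures is roughly the following. This is a simplified sketch, not the gist's actual code; the socket path, burst count, and burst size are illustrative assumptions.

```go
// Illustrative ping-pong echo client: write a burst, wait for the full echo, repeat.
package main

import (
	"io"
	"log"
	"net"
	"time"
)

func main() {
	const (
		bursts    = 40
		burstSize = 4 << 20 // 4MiB per burst (assumed; the gist parameterizes this)
	)

	conn, err := net.Dial("unix", "/tmp/echo.sock") // hypothetical echo server path
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	buf := make([]byte, burstSize)
	start := time.Now()
	for i := 0; i < bursts; i++ {
		// Send one burst, then block until the echo server has sent it all back.
		if _, err := conn.Write(buf); err != nil {
			log.Fatal(err)
		}
		if _, err := io.ReadFull(conn, buf); err != nil {
			log.Fatal(err)
		}
	}
	log.Printf("echoed %d x %dMiB in %v", bursts, burstSize>>20, time.Since(start))
}
```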
hbhasker commented 3 years ago

I will try to spend time on this tomorrow, but fixes may land only after the holidays unless someone else picks it up, since I am OOO for the next couple of weeks.

hbhasker commented 3 years ago

Just an update: I am able to reproduce your results locally. E.g., sending 40 blocks of 4MB each over a UDS shows me the following:

[+] Wrote 40 bursts of 4.0MiB in 1494.7ms
[-] edge side EOF
[+] Read 160.0MiB in 2495.9ms

vs. IPv4, which shows the following:

[+] Wrote 40 bursts of 4.0MiB in 654.7ms
[-] edge side EOF
[+] Read 160.0MiB in 1656.0ms

hbhasker commented 3 years ago

So it turns out the issue is that we never implemented setsockopt SO_RCVBUF/SO_SNDBUF for Unix domain sockets. The sockets default to a 16KB buffer size.

You can see this if you strace the client: it's doing reads/writes of 16KB each in ping-pong fashion.

See: https://github.com/google/gvisor/blob/1375a87a209ef1a2523ada84254e3a0101afb4f5/pkg/sentry/socket/unix/transport/unix.go#L30

https://github.com/google/gvisor/blob/1375a87a209ef1a2523ada84254e3a0101afb4f5/pkg/sentry/socket/unix/transport/unix.go#L852
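For context, the default in question is just a package constant; it looks roughly like this (a sketch; the linked file above is authoritative):

```go
// pkg/sentry/socket/unix/transport/unix.go (sketch; see the links above for the real code)

// initialLimit is the starting limit for the socket buffers.
const initialLimit = 16 * 1024 // 16KB default; bumped to ~4MB for the experiment described below
```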

Once I changed initialLimit to 4MB for testing, these are the numbers I see:

[+] Wrote 40 bursts of 4.0MiB in 332.8ms
[-] edge side EOF
[+] Read 160.0MiB in 1333.1ms

On native (--runtime=runc), here's what I get:

[-] edge side EOF
[+] Wrote 40 bursts of 4.0MiB in 107.2ms
[+] Read 160.0MiB in 1107.5ms

Still slower than native but not terrible anymore.

We will prioritize adding setsockopt support for SO_SNDBUF/SO_RCVBUF on UDS.
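Once that lands, applications should be able to raise the UDS buffer sizes the usual way. A hedged sketch using golang.org/x/sys/unix (generic application code, not gVisor internals; the 4MiB value is illustrative):

```go
// Sketch: raising UDS buffer sizes from an application via setsockopt.
package main

import (
	"fmt"

	"golang.org/x/sys/unix"
)

func main() {
	fd, err := unix.Socket(unix.AF_UNIX, unix.SOCK_STREAM, 0)
	if err != nil {
		panic(err)
	}
	defer unix.Close(fd)

	// Request 4MiB send/receive buffers; Linux typically grants a doubled value.
	const bufSize = 4 << 20
	if err := unix.SetsockoptInt(fd, unix.SOL_SOCKET, unix.SO_SNDBUF, bufSize); err != nil {
		panic(err)
	}
	if err := unix.SetsockoptInt(fd, unix.SOL_SOCKET, unix.SO_RCVBUF, bufSize); err != nil {
		panic(err)
	}

	// Read back what was actually granted.
	snd, _ := unix.GetsockoptInt(fd, unix.SOL_SOCKET, unix.SO_SNDBUF)
	rcv, _ := unix.GetsockoptInt(fd, unix.SOL_SOCKET, unix.SO_RCVBUF)
	fmt.Printf("SO_SNDBUF=%d SO_RCVBUF=%d\n", snd, rcv)
}
```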

hbhasker commented 3 years ago

@nybidari is going to implement these setsockopts.

lubinszARM commented 3 years ago

/cc

iangudger commented 3 years ago

@nybidari If I recall correctly, the queue size enforcement code may need some work.

hbhasker commented 3 years ago

I will pick this up.

hbhasker commented 3 years ago

@majek Could you rerun your tests @ HEAD and see if the issue is now resolved?

majek commented 3 years ago

Looks good: [chart]

The UDS numbers are now in the same ballpark as the v4/v6 netstack emulation, which is what I would have expected. (Note: the colors are wrong in the chart above.) Thanks!

hbhasker commented 3 years ago

Marking this closed.