/cc @amshinde @mcastelino I have run 1000 containers under Docker with Kata, but without doing any port mapping or particularly exercising the network connections.
@ustiugov Do you see any other errors in the logs besides that? You can enable full debug by following https://github.com/kata-containers/documentation/blob/master/Developer-Guide.md#enable-full-debug
Can you run kata-collect-data.sh on your machine?
Let me see if I can reproduce this.
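For reference, the linked guide boils down to flipping the enable_debug knobs in the runtime configuration and then re-running the collector; a rough sketch, with the config path taken from the kata-env dump further down (adjust for your install):
sudo sed -i -e 's/^# *\(enable_debug\).*=.*$/\1 = true/g' /usr/share/defaults/kata-containers/configuration.toml
sudo kata-collect-data.sh > kata-collect-data.log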
I would be eager to help, but installation started to fail; see issue #485. Waiting for a solution.
@ustiugov I saw that issue recently. Can you use kata-deploy to install in the meantime? The command you want to run is:
docker run -v /opt/kata:/opt/kata -v /var/run/dbus:/var/run/dbus -v /run/systemd:/run/systemd -v /etc/docker:/etc/docker -it katadocker/kata-deploy kata-deploy-docker install
@ustiugov I tried reproducing this issue by launching 100 nginx containers in parallel and curling the host ports. I did not see any timeout issues. I am going to try this next with 1000 containers.
@ustiugov Just tested this with 500 containers. I was able to connect to them successfully. This is what I used:
for i in {1..500}; do sudo docker run -itd --name server$i -p $((HOST_PORT+i)):80 --runtime=kata-qemu nginx; done
for i in {1..500}; do curl 127.0.0.1:$((HOST_PORT+i)) ; done
I used a similar script for launching the 500 containers in parallel as well, roughly the variant sketched below.
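A plausible sketch of that parallel variant (the background-and-wait structure is an assumption, not the exact script used):
for i in {1..500}; do sudo docker run -itd --name server$i -p $((HOST_PORT+i)):80 --runtime=kata-qemu nginx & done
wait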
Maybe you are hitting limits of your system when launching that many containers. I am curious to see what your TCP application looks like as well.
I think that my app fails not when the clients open the connections but when they start to send packets through. My TCP server accepts connections and then uses epoll_wait to receive packets, then spins in a tight loop for ~100 us while processing each packet. There are ~12k connections uniformly distributed among ~48 TCP servers, each in its own Kata container (QEMU-lite).
Also, it's not timeouts; it's "connection reset by peer" errors.
@ustiugov Can you provide your application code so that I can replicate your exact setup?
@ustiugov if possible, could you also try with --userland-proxy=false? This will help reduce the number of components involved. Alternatively, try to reach the container's IP and port directly instead of going through port forwarding.
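For reference, one persistent way to disable the userland proxy is through the Docker daemon configuration (note: this sketch overwrites any existing /etc/docker/daemon.json, so merge the key in by hand if you already have one):
echo '{ "userland-proxy": false }' | sudo tee /etc/docker/daemon.json
sudo systemctl restart docker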
Thank you and sorry for the delay. I will try both options and get back to you.
After the Ubuntu repo breakage was resolved, I reinstalled Kata with apt. Now I am able to boot up to 384 micro-VMs (Kata containers) with TCP servers inside. However, this is unstable, and in some experiments I see quite a lot of "connection reset by peer" errors.
@ustiugov thanks for the details, and for helping push on Kata here.
Do you have more details of the workload that you can share (a container image, for example)? I'd like to test it here; it sounds like you are doing a great job stressing the system, and I'd like to 1) resolve the issue and 2) augment our testing infrastructure.
Hi @egernst, sure; apologies for the delay (I had to check university copyright rules, etc.). I am going to try to provide the sources and/or containers by the end of this week.
The version of kata-runtime that I am using now is 1.7.0. Just for the record, here is the environment dump:
[Meta]
Version = "1.0.23"
[Runtime]
Debug = false
Trace = false
DisableGuestSeccomp = true
DisableNewNetNs = false
Path = "/usr/bin/kata-runtime"
[Runtime.Version]
Semver = "1.7.0"
Commit = ""
OCI = "1.0.1-dev"
[Runtime.Config]
Path = "/usr/share/defaults/kata-containers/configuration.toml"
[Hypervisor]
MachineType = "pc"
Version = "QEMU emulator version 2.11.0\nCopyright (c) 2003-2017 Fabrice Bellard and the QEMU Project developers"
Path = "/usr/bin/qemu-lite-system-x86_64"
BlockDeviceDriver = "virtio-scsi"
EntropySource = "/dev/urandom"
Msize9p = 8192
MemorySlots = 10
Debug = false
UseVSock = false
SharedFS = "virtio-9p"
[Image]
Path = "/usr/share/kata-containers/kata-containers-image_clearlinux_1.7.0_agent_43bd707543.img"
[Kernel]
Path = "/usr/share/kata-containers/vmlinuz-4.19.28.40-28.container"
Parameters = "init=/usr/lib/systemd/systemd systemd.unit=kata-containers.target systemd.mask=systemd-networkd.service systemd.mask=systemd-networkd.socket systemd.mask=systemd-journald.service systemd.mask=systemd-journald.socket systemd.mask=systemd-journal-flush.service systemd.mask=systemd-journald-dev-log.socket systemd.mask=systemd-udevd.service systemd.mask=systemd-udevd.socket systemd.mask=systemd-udev-trigger.service systemd.mask=systemd-udevd-kernel.socket systemd.mask=systemd-udevd-control.socket systemd.mask=systemd-timesyncd.service systemd.mask=systemd-update-utmp.service systemd.mask=systemd-tmpfiles-setup.service systemd.mask=systemd-tmpfiles-cleanup.service systemd.mask=systemd-tmpfiles-cleanup.timer systemd.mask=tmp.mount systemd.mask=systemd-random-seed.service systemd.mask=systemd-coredump@.service"
[Initrd]
Path = ""
[Proxy]
Type = "kataProxy"
Version = "kata-proxy version 1.7.0-ea2b0bb"
Path = "/usr/libexec/kata-containers/kata-proxy"
Debug = false
[Shim]
Type = "kataShim"
Version = "kata-shim version 1.7.0-7f2ab77"
Path = "/usr/libexec/kata-containers/kata-shim"
Debug = false
[Agent]
Type = "kata"
Debug = false
Trace = false
TraceMode = ""
TraceType = ""
[Host]
Kernel = "4.15.0-50-generic"
Architecture = "amd64"
VMContainerCapable = true
SupportVSocks = true
[Host.Distro]
Name = "Ubuntu"
Version = "18.04"
[Host.CPU]
Vendor = "GenuineIntel"
Model = "Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz"
[Netmon]
Version = "kata-netmon version 1.7.0"
Path = "/usr/libexec/kata-containers/kata-netmon"
Debug = false
Enable = false
Hi @egernst, I prepared a pre-release of part of our code, for testing purposes only. Please clone the repository and follow the instructions in the README. I remain at your service if you have any problems running the code. Our framework targets load testing and latency/throughput measurement. https://github.com/ustiugov/kata_load_test
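A minimal getting-started sketch, using the repository URL above:
git clone https://github.com/ustiugov/kata_load_test
cd kata_load_test   # then follow the README's setup instructions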
Thanks @ustiugov - will try to take a look beginning of this week. Heads up @mcastelino @amshinde @bergwolf
Hi @ustiugov,
I tried to reproduce the issue using your testing framework, but I could not get the "connection reset by peer" errors.
I was able to launch 512 Kata containers. What I did see is that the latency sometimes goes quite high.
Here is the log from running the 512 Kata containers; please take a look and let us know whether these results make sense to you:
[172.16.17.4] Executing task 'run_kata'
[172.16.17.4] run: /home/fuentess/kata_load_test/helper_scripts/run_docker_vm.sh 1 512 kata-runtime
[172.16.17.4] out: Running VM with runtime=kata-runtime, thread/vcpus_num=1, VM count=512
[172.16.17.4] out: net.ipv4.ip_local_port_range = 51000 65535
[172.16.17.4] out: net.ipv4.conf.all.forwarding = 1
[172.16.17.4] out: net.ipv4.neigh.default.gc_thresh1 = 1024
[172.16.17.4] out: net.ipv4.neigh.default.gc_thresh2 = 2048
[172.16.17.4] out: net.ipv4.neigh.default.gc_thresh3 = 4096
[172.16.17.4] out: e33df96c10f71e32e4ec52e5714ccf6265ee10f09ca44962c47a4b0223406533
[172.16.17.4] out: 2e70d53e03732d884b83a28183f85203cd6fdbdb42b728dad84d52c8308d45b5
...
[172.16.17.4] out: a93ad50166e06f5165a7a8b64670c0e12874b19761b85be75fc2b020e6ed06cb
[172.16.17.4] out: feb68c944d06ceb5e0dd1071492429680ea21361f68f1a5098bfb0781f32c2d2
[172.16.17.4] out: 39efeea477f649230890dddd75d432ff6b0797330a0f558a2c88efd880c5402d
[172.16.17.4] out: Guests are ready!
[172.16.17.4] out:
[172.16.17.6] Executing task 'run_lancet'
[172.16.17.6] run: cd /home/fuentess/kata_load_test/lancet && ./coordinator/coordinator -comProto TCP -loadThreads 8 -idist fixed --appProto synthetic:fixed:100 -loadAgents 172.16.17.6 -loadBinary agents/agent -loadConn 12288 -loadPattern step:10000:70000:350000 -ltAgents 172.16.17.5 -ltBinary agents/agent -ltConn 12288 -lqps 2000 -targetHost 172.16.17.4:33000,172.16.17.4:33001,172.16.17.4:33002,...,172.16.17.4:33511
(-targetHost list abridged: 512 endpoints on 172.16.17.4, ports 33000-33511)
[172.16.17.6] out: [kata-load-client-2] pthread_setaffinity_np: Success   (x4; blank lines trimmed)
[172.16.17.6] out: Userspace timestamping
[172.16.17.6] out: [kata-load-client-1] pthread_setaffinity_np: Success   (x4; blank lines trimmed)
[172.16.17.6] out: Will run for 5 sec
[172.16.17.6] out: #ReqCount QPS RxBw TxBw
[172.16.17.6] out: 30005 6000.097585323168 48000.78068258534 47999.18092318915
[172.16.17.6] out: Check inter-arrival: []
[172.16.17.6] out: #Avg Lat 50th 90th 95th 99th
[172.16.17.6] out: 701.762 650.906(645.506, 655.805) 853.107(842.807, 863.308) 969.408(946.709, 996.408) 2143.319(1810.016, 2986.926)
[172.16.17.6] out:
[172.16.17.6] out: Will run for 5 sec
[172.16.17.6] out: #ReqCount QPS RxBw TxBw
[172.16.17.6] out: 202172 40426.97760691137 323415.820855291 323049.48811398225
[172.16.17.6] out: Check inter-arrival: []
[172.16.17.6] out: #Avg Lat 50th 90th 95th 99th
[172.16.17.6] out: 10517.393 3810.334(3705.132, 3961.935) 18914.767(15925.941, 22009.194) 41921.67(32967.192, 48465.528) 127272.725(105954.436, 229603.729)
[172.16.17.6] out:
[172.16.17.6] out: Will run for 5 sec
[172.16.17.6] out: #ReqCount QPS RxBw TxBw
[172.16.17.6] out: 375268 75016.07695830545 600128.6156664436 600621.1692911206
[172.16.17.6] out: Check inter-arrival: []
[172.16.17.6] out: #Avg Lat 50th 90th 95th 99th
[172.16.17.6] out: 53482.095 13369.618(8442.974, 19440.272) 169534.298(143303.166, 230297.935) 256155.364(210880.464, 330778.924) 417231.888(338917.595, 0)
[172.16.17.6] out:
[172.16.17.6] out: Will run for 5 sec
[172.16.17.6] out: #ReqCount QPS RxBw TxBw
[172.16.17.6] out: 501684 100318.42166515095 802547.3733212076 806325.8810985828
[172.16.17.6] out: Check inter-arrival: []
[172.16.17.6] out: #Avg Lat 50th 90th 95th 99th
[172.16.17.6] out: 122268.85 81908.223(58017.312, 110558.078) 297171.927(254998.353, 384267.796) 377823.539(315642.79, 761939.434) 523363.025(427672.68, 0)
[172.16.17.6] out:
[172.16.17.6] out: Will run for 5 sec
[172.16.17.6] out: #ReqCount QPS RxBw TxBw
[172.16.17.6] out: 629665 125906.5092704495 1.007252074163596e+06 992549.5675709831
[172.16.17.6] out: Check inter-arrival: []
[172.16.17.6] out: #Avg Lat 50th 90th 95th 99th
[172.16.17.6] out: 124278.882 99893.283(76363.775, 115895.624) 260931.306(223331.174, 413681.656) 356536.051(272099.205, 652250.564) 613830.925(435350.648, 0)
[172.16.17.6] out:
[172.16.17.6] out:
Done.
Disconnecting from 172.16.17.6... done.
[172.16.17.4] Executing task 'server_cleanup'
@chavafg It looks like Kata v1.8+ is much more robust and the issue is mitigated. Regarding the latency, we are investigating the performance characteristics of micro-VMs as part of my research project; I will keep you posted.
I think this issue can be closed now.
BTW, we have open-sourced the official version of Lancet, the latency/throughput measurement tool that I provided for load testing, under the MIT license, and my colleague Marios Kogias presented this work at ATC '19. The paper is available here.
cool, closing this issue.
Dear Developers,
I intend to study the implications of CPU overcommitment for the host and guest OS schedulers. Hence, I would like to boot ~100-1000 VMs on a single Linux host (similar to Amazon's Firecracker demo). While I am able to do that with both Firecracker and gVisor containers (at least up to 400 VMs/host), I experience connectivity problems when booting >32 VMs with Kata (connections to my TCP server get "reset by peer").
I configure networking using Docker with port forwarding:
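The exact command is not preserved in this thread; a representative invocation (image name and container port are placeholders, host ports match the 33000+ range seen in the logs above) would be:
for i in $(seq 0 95); do sudo docker run -itd --name server$i -p $((33000+i)):33000 --runtime=kata-runtime my-tcp-server; done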
The issue is erratic: when I configure my client to set up connections with only one of the many (say, 96) booted VMs, each VM running my simplistic TCP server app, there is no problem: packets are sent and responses arrive. However, when I configure the client to distribute connections among all 96 virtualized servers, connections get "reset by peer" once the clients start to send packets on those connections in a round-robin fashion. I would welcome any suggestions on how to further troubleshoot, avoid, or fix this problem.
Regards, Dmitrii