kata-containers / documentation

Kata Containers version 1.x documentation (for version 2.x see https://github.com/kata-containers/kata-containers).
https://katacontainers.io/
Apache License 2.0

Connections reset by peer while running >32 VMs per host #467

Closed ustiugov closed 5 years ago

ustiugov commented 5 years ago

Dear Developers,

I intend to study CPU overcommitment implications for the host and guest OS schedulers. Hence, I would like to boot ~100-1000 VMs on a single Linux host (similar to Amazon's Firecracker demo). While I am able to do that with both Firecracker and gVisor containers (at least up to 400 VMs/host), I experience connectivity problems when booting >32 VMs with Kata (connections to my TCP server get "reset by peer").

I configure networking using Docker with port forwarding:

docker run -dit --name NAME --runtime=kata-runtime -p HOST_PORT:GUEST_PORT alpine /path/to/my_TCP_server

The issue is erratic: when I configure my client to open connections to only one of the many (say, 96) booted VMs, each VM running my simplistic TCP server app, there is no problem; packets are sent and responses arrive. However, when I configure the client to distribute connections among all 96 virtualized servers, connections get a "reset by peer" response as soon as the clients start sending packets on these connections in a round-robin fashion.
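For reference, a minimal sketch of the setup described above (the port numbers, container names, and server binary path are illustrative placeholders, not the exact values used):

```shell
#!/bin/sh
# Boot N Kata containers, each forwarding a distinct host port to a guest
# TCP server, then probe every port round-robin. All values are illustrative.
N=96
HOST_PORT=33000
GUEST_PORT=8080

for i in $(seq 1 "$N"); do
    docker run -dit --name "server$i" --runtime=kata-runtime \
        -p "$((HOST_PORT + i)):$GUEST_PORT" alpine /path/to/my_TCP_server
done

# A reset here reproduces the reported failure mode.
for i in $(seq 1 "$N"); do
    nc -z -w 2 127.0.0.1 "$((HOST_PORT + i))" \
        || echo "port $((HOST_PORT + i)): connection failed"
done
```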

I would welcome any suggestions on how to further troubleshoot/avoid/fix this problem.

Regards, Dmitrii

grahamwhaley commented 5 years ago

/cc @amshinde @mcastelino I have run 1000 containers under docker with kata, but not doing any port mapping or particularly exercising the network connections.

amshinde commented 5 years ago

@ustiugov Do you see any other errors in the logs after enabling full debug? See: https://github.com/kata-containers/documentation/blob/master/Developer-Guide.md#enable-full-debug

Can you run kata-collect-data.sh on your machine? Let me see if I can reproduce this.
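For anyone following along, enabling full debug amounts to uncommenting the per-component `enable_debug` switches in `configuration.toml` and then gathering diagnostics; a sketch (the config path below is the default for packaged installs and may differ on your system):

```shell
#!/bin/sh
# Enable full Kata debug output, then collect host/runtime state.
# Default config path for packaged installs; adjust if yours differs.
CONF=/usr/share/defaults/kata-containers/configuration.toml

# Uncomment every "enable_debug" line across all component sections.
sudo sed -i -e 's/^# *\(enable_debug\).*=.*$/\1 = true/g' "$CONF"

# Gather diagnostics to attach to the issue.
sudo kata-collect-data.sh > kata-collect-data.log
```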

ustiugov commented 5 years ago

I would be eager to help, but the installation started to fail (see issue #485). Waiting for a solution.

amshinde commented 5 years ago

@ustiugov I saw that issue recently. Can you use kata-deploy to install in the meantime? The command you want to run is:

docker run -v /opt/kata:/opt/kata -v /var/run/dbus:/var/run/dbus -v /run/systemd:/run/systemd -v /etc/docker:/etc/docker -it katadocker/kata-deploy kata-deploy-docker install

amshinde commented 5 years ago

@ustiugov I tried reproducing this issue by launching 100 nginx containers in parallel and curling each host port. I did not see any timeout issues. I am going to try this next with 1000 containers.

amshinde commented 5 years ago

@ustiugov Just tested this with 500 containers. I was able to connect to them successfully. This is what I used:

for i in {1..500}; do sudo docker run -itd --name server$i -p $((HOST_PORT+i)):80 --runtime=kata-qemu nginx; done
for i in {1..500}; do curl 127.0.0.1:$((HOST_PORT+i)) ; done

I used a similar script for launching 500 containers in parallel as well.
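The parallel variant presumably backgrounds each `docker run` and waits for all launches to return, along these lines (a sketch, not the exact script used; `HOST_PORT` is an illustrative base port):

```shell
#!/bin/sh
# Launch 500 kata-qemu nginx containers concurrently, then probe each one.
HOST_PORT=33000

for i in $(seq 1 500); do
    sudo docker run -itd --name "server$i" -p "$((HOST_PORT + i)):80" \
        --runtime=kata-qemu nginx &
done
wait  # block until every background launch has returned

for i in $(seq 1 500); do
    curl -s -o /dev/null "127.0.0.1:$((HOST_PORT + i))" \
        || echo "server$i unreachable"
done
```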

Maybe you are hitting system limits when launching multiple containers. I am also curious to see what your TCP application looks like.
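Limits worth inspecting when hundreds of port-forwarded containers are involved include per-process file descriptors, the conntrack table, and the ephemeral port range; a quick inspection sketch (which knobs matter will depend on the host):

```shell
#!/bin/sh
# Print host limits that commonly bite at a few hundred VMs/containers.
echo "open files (per process): $(ulimit -n)"
echo "fs.file-max:              $(cat /proc/sys/fs/file-max)"
echo "ephemeral port range:     $(cat /proc/sys/net/ipv4/ip_local_port_range)"
echo "net.core.somaxconn:       $(cat /proc/sys/net/core/somaxconn)"
# Connection resets/drops can occur when the conntrack table fills up.
if [ -r /proc/sys/net/netfilter/nf_conntrack_max ]; then
    echo "nf_conntrack_max:         $(cat /proc/sys/net/netfilter/nf_conntrack_max)"
fi
```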

ustiugov commented 5 years ago

I think my app fails not when the clients open the connections but when they start to send packets through. My TCP server accepts connections, uses epoll_wait to receive packets, and then spins in a tight loop for ~100 us while processing each packet. There are ~12k connections uniformly distributed among ~48 TCP servers, each in its own Kata container (QEMU-lite).

Also, these are not timeouts; it is a "reset by peer" error.

amshinde commented 5 years ago

@ustiugov Can you provide your application code, so that I can replicate your exact setup?

mcastelino commented 5 years ago

@ustiugov If possible, could you also try with --userland-proxy=false? This will help reduce the number of components involved. Alternatively, try to reach the IP and port of the container directly instead of using port forwarding.
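Concretely, the two suggestions might look like the following (the daemon.json edit requires a Docker daemon restart, and the container name `server1` is illustrative):

```shell
#!/bin/sh
# Option 1: disable the userland proxy so published ports are handled by
# iptables NAT rather than docker-proxy (requires restarting the daemon).
echo '{ "userland-proxy": false }' | sudo tee /etc/docker/daemon.json
sudo systemctl restart docker

# Option 2: bypass port forwarding and connect to the container IP directly.
IP=$(sudo docker inspect -f \
    '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' server1)
nc -z -w 2 "$IP" 8080 && echo "direct connection to $IP:8080 OK"
```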

ustiugov commented 5 years ago

Thank you and sorry for the delay. I will try both options and get back to you.

ustiugov commented 5 years ago

After the Ubuntu repository breakage was resolved, I reinstalled Kata with apt. Now I am able to boot up to 384 micro-VMs (Kata containers) with TCP servers inside. However, this is unstable, and in some experiments I see quite a lot of "connection reset by peer" errors.

egernst commented 5 years ago

@ustiugov thanks for the details, and for helping push on Kata here.

Do you have more details of the workload you can share (container image, for example)? I'd like to test it here; it sounds like you are doing a great job stressing the system, and I'd like to 1) resolve this and 2) augment our testing infra.

ustiugov commented 5 years ago

Hi @egernst, sure, apologies for the delay (I had to check the university copyright, etc.). I am going to try to provide the sources and/or containers by the end of this week.

ustiugov commented 5 years ago

The version of kata-runtime that I am using now is 1.7.0. Just for the record.

[Meta]
  Version = "1.0.23"

[Runtime]
  Debug = false
  Trace = false
  DisableGuestSeccomp = true
  DisableNewNetNs = false
  Path = "/usr/bin/kata-runtime"
  [Runtime.Version]
    Semver = "1.7.0"
    Commit = ""
    OCI = "1.0.1-dev"
  [Runtime.Config]
    Path = "/usr/share/defaults/kata-containers/configuration.toml"

[Hypervisor]
  MachineType = "pc"
  Version = "QEMU emulator version 2.11.0\nCopyright (c) 2003-2017 Fabrice Bellard and the QEMU Project developers"
  Path = "/usr/bin/qemu-lite-system-x86_64"
  BlockDeviceDriver = "virtio-scsi"
  EntropySource = "/dev/urandom"
  Msize9p = 8192
  MemorySlots = 10
  Debug = false
  UseVSock = false
  SharedFS = "virtio-9p"

[Image]
  Path = "/usr/share/kata-containers/kata-containers-image_clearlinux_1.7.0_agent_43bd707543.img"

[Kernel]
  Path = "/usr/share/kata-containers/vmlinuz-4.19.28.40-28.container"
  Parameters = "init=/usr/lib/systemd/systemd systemd.unit=kata-containers.target systemd.mask=systemd-networkd.service systemd.mask=systemd-networkd.socket systemd.mask=systemd-journald.service systemd.mask=systemd-journald.socket systemd.mask=systemd-journal-flush.service systemd.mask=systemd-journald-dev-log.socket systemd.mask=systemd-udevd.service systemd.mask=systemd-udevd.socket systemd.mask=systemd-udev-trigger.service systemd.mask=systemd-udevd-kernel.socket systemd.mask=systemd-udevd-control.socket systemd.mask=systemd-timesyncd.service systemd.mask=systemd-update-utmp.service systemd.mask=systemd-tmpfiles-setup.service systemd.mask=systemd-tmpfiles-cleanup.service systemd.mask=systemd-tmpfiles-cleanup.timer systemd.mask=tmp.mount systemd.mask=systemd-random-seed.service systemd.mask=systemd-coredump@.service"

[Initrd]
  Path = ""

[Proxy]
  Type = "kataProxy"
  Version = "kata-proxy version 1.7.0-ea2b0bb"
  Path = "/usr/libexec/kata-containers/kata-proxy"
  Debug = false

[Shim]
  Type = "kataShim"
  Version = "kata-shim version 1.7.0-7f2ab77"
  Path = "/usr/libexec/kata-containers/kata-shim"
  Debug = false

[Agent]
  Type = "kata"
  Debug = false
  Trace = false
  TraceMode = ""
  TraceType = ""

[Host]
  Kernel = "4.15.0-50-generic"
  Architecture = "amd64"
  VMContainerCapable = true
  SupportVSocks = true
  [Host.Distro]
    Name = "Ubuntu"
    Version = "18.04"
  [Host.CPU]
    Vendor = "GenuineIntel"
    Model = "Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz"

[Netmon]
  Version = "kata-netmon version 1.7.0"
  Path = "/usr/libexec/kata-containers/kata-netmon"
  Debug = false
  Enable = false

ustiugov commented 5 years ago

Hi @egernst, I prepared a pre-release of a part of our code for testing purposes only. Please clone it and follow the instructions in the README. I remain at your service if you have any problems running the code. Our framework aims at load testing and latency/throughput measurements. https://github.com/ustiugov/kata_load_test

egernst commented 5 years ago

Thanks @ustiugov - will try to take a look at the beginning of this week. Heads up @mcastelino @amshinde @bergwolf

chavafg commented 5 years ago

Hi @ustiugov,

I tried to reproduce the issue using your testing framework, and it seems I could not get the "connection reset by peer" errors. I was able to launch 512 Kata containers. What I did see is that the latency sometimes goes very high. I attach the log from running the 512 Kata containers; please take a look and let us know whether these results make sense to you:

[172.16.17.4] Executing task 'run_kata'
[172.16.17.4] run: /home/fuentess/kata_load_test/helper_scripts/run_docker_vm.sh 1 512 kata-runtime
[172.16.17.4] out: Running VM with runtime=kata-runtime, thread/vcpus_num=1, VM count=512
[172.16.17.4] out: net.ipv4.ip_local_port_range = 51000 65535
[172.16.17.4] out: net.ipv4.conf.all.forwarding = 1
[172.16.17.4] out: net.ipv4.neigh.default.gc_thresh1 = 1024
[172.16.17.4] out: net.ipv4.neigh.default.gc_thresh2 = 2048
[172.16.17.4] out: net.ipv4.neigh.default.gc_thresh3 = 4096
[172.16.17.4] out: e33df96c10f71e32e4ec52e5714ccf6265ee10f09ca44962c47a4b0223406533
[172.16.17.4] out: 2e70d53e03732d884b83a28183f85203cd6fdbdb42b728dad84d52c8308d45b5
...
[172.16.17.4] out: a93ad50166e06f5165a7a8b64670c0e12874b19761b85be75fc2b020e6ed06cb
[172.16.17.4] out: feb68c944d06ceb5e0dd1071492429680ea21361f68f1a5098bfb0781f32c2d2
[172.16.17.4] out: 39efeea477f649230890dddd75d432ff6b0797330a0f558a2c88efd880c5402d
[172.16.17.4] out: Guests are ready!
[172.16.17.4] out:

[172.16.17.6] Executing task 'run_lancet'
[172.16.17.6] run: cd /home/fuentess/kata_load_test/lancet && ./coordinator/coordinator -comProto TCP -loadThreads 8 -idist fixed --appProto synthetic:fixed:100 -loadAgents 172.16.17.6 -loadBinary agents/agent -loadConn 12288 -loadPattern step:10000:70000:350000 -ltAgents 172.16.17.5 -ltBinary agents/agent -ltConn 12288 -lqps 2000 -targetHost 172.16.17.4:33000,172.16.17.4:33001,...,172.16.17.4:33511 [512 sequential host:port targets, full list elided]
[172.16.17.6] out: [kata-load-client-2] [kata-load-client-2] pthread_setaffinity_np: Success
[172.16.17.6] out:
[172.16.17.6] out: pthread_setaffinity_np: Success
[172.16.17.6] out:
[172.16.17.6] out: [kata-load-client-2] pthread_setaffinity_np: Success
[172.16.17.6] out:
[172.16.17.6] out: [kata-load-client-2] pthread_setaffinity_np: Success
[172.16.17.6] out:
[172.16.17.6] out: Userspace timestamping
[172.16.17.6] out: [kata-load-client-1] pthread_setaffinity_np: Success
[172.16.17.6] out:
[172.16.17.6] out: [kata-load-client-1] pthread_setaffinity_np: Success
[172.16.17.6] out:
[172.16.17.6] out: [kata-load-client-1] pthread_setaffinity_np: Success
[172.16.17.6] out:
[172.16.17.6] out: [kata-load-client-1] pthread_setaffinity_np: Success
[172.16.17.6] out:
[172.16.17.6] out: Will run for 5 sec
[172.16.17.6] out: #ReqCount    QPS     RxBw    TxBw
[172.16.17.6] out: 30005        6000.097585323168       48000.78068258534       47999.18092318915
[172.16.17.6] out: Check inter-arrival: []
[172.16.17.6] out: #Avg Lat     50th    90th    95th    99th
[172.16.17.6] out: 701.762      650.906(645.506, 655.805)       853.107(842.807, 863.308)       969.408(946.709, 996.408)        2143.319(1810.016, 2986.926)
[172.16.17.6] out:
[172.16.17.6] out: Will run for 5 sec
[172.16.17.6] out: #ReqCount    QPS     RxBw    TxBw
[172.16.17.6] out: 202172       40426.97760691137       323415.820855291        323049.48811398225
[172.16.17.6] out: Check inter-arrival: []
[172.16.17.6] out: #Avg Lat     50th    90th    95th    99th
[172.16.17.6] out: 10517.393    3810.334(3705.132, 3961.935)    18914.767(15925.941, 22009.194) 41921.67(32967.192, 48465.528)   127272.725(105954.436, 229603.729)
[172.16.17.6] out:
[172.16.17.6] out: Will run for 5 sec
[172.16.17.6] out: #ReqCount    QPS     RxBw    TxBw
[172.16.17.6] out: 375268       75016.07695830545       600128.6156664436       600621.1692911206
[172.16.17.6] out: Check inter-arrival: []
[172.16.17.6] out: #Avg Lat     50th    90th    95th    99th
[172.16.17.6] out: 53482.095    13369.618(8442.974, 19440.272)  169534.298(143303.166, 230297.935)      256155.364(210880.464, 330778.924)       417231.888(338917.595, 0)
[172.16.17.6] out:
[172.16.17.6] out: Will run for 5 sec
[172.16.17.6] out: #ReqCount    QPS     RxBw    TxBw
[172.16.17.6] out: 501684       100318.42166515095      802547.3733212076       806325.8810985828
[172.16.17.6] out: Check inter-arrival: []
[172.16.17.6] out: #Avg Lat     50th    90th    95th    99th
[172.16.17.6] out: 122268.85    81908.223(58017.312, 110558.078)        297171.927(254998.353, 384267.796)       377823.539(315642.79, 761939.434)       523363.025(427672.68, 0)
[172.16.17.6] out:
[172.16.17.6] out: Will run for 5 sec
[172.16.17.6] out: #ReqCount    QPS     RxBw    TxBw
[172.16.17.6] out: 629665       125906.5092704495       1.007252074163596e+06   992549.5675709831
[172.16.17.6] out: Check inter-arrival: []
[172.16.17.6] out: #Avg Lat     50th    90th    95th    99th
[172.16.17.6] out: 124278.882   99893.283(76363.775, 115895.624)        260931.306(223331.174, 413681.656)       356536.051(272099.205, 652250.564)      613830.925(435350.648, 0)
[172.16.17.6] out:
[172.16.17.6] out:

Done.
Disconnecting from 172.16.17.6... done.
[172.16.17.4] Executing task 'server_cleanup'
ustiugov commented 5 years ago

@chavafg It looks like Kata v1.8+ is much more robust and the issue is mitigated. Regarding the latency, we are investigating the performance differences/problems of micro-VMs as part of my research project. I will keep you posted.

I think this issue can be closed now.

BTW, we've open-sourced the official version of Lancet, the latency/throughput measurement tool that I provided for load testing, under the MIT license, and my colleague Marios Kogias presented this work at ATC'19. The paper is available here.

chavafg commented 5 years ago

Cool, closing this issue.