kata-containers / agent

Kata Containers version 1.x agent (for version 2.x, see https://github.com/kata-containers/kata-containers). Virtual machine agent for hardware-virtualized containers.
https://katacontainers.io/
Apache License 2.0

uevent listening goroutine is not (always) working #372

Closed bergwolf closed 6 years ago

bergwolf commented 6 years ago

While adding network hotplug support for the vm factory, I found that the uevent listening goroutine is not working for me. After hotplugging a new nic, however long I wait in UpdateInterface, the uevent listening goroutine never receives the new device's sysfs uevent.

Then I tested the network monitor process (kata-containers/runtime#534), and it failed to get the new uevent as well:

Sep 13 09:37:20 macbeth kata-proxy[54869]: time="2018-09-13T09:37:20.532400733+08:00" level=info msg="time="2018-09-13T01:37:20.511006103Z" level=debug msg="new request" name=kata-agent pid=128 req="interface:<device:\"eth1\" name:\"eth1\" IPAddresses:<family:2 address:\"172.18.0.2\" mask:\"16\" > mtu:1500 hwAddr:\"02:42:ac:12:00:02\" pciAddr:\"02/01\" > " request=/grpc.AgentService/UpdateInterface sandbox=e2145826daa97ea538d49bffc8c25ad13e0849a1edf5546478458fa17aa63002 source=agent\n" name=kata-proxy pid=54869 sandbox=e2145826daa97ea538d49bffc8c25ad13e0849a1edf5546478458fa17aa63002 source=agent
Sep 13 09:37:20 macbeth kata-proxy[54869]: time="2018-09-13T09:37:20.533607734+08:00" level=info msg="time="2018-09-13T01:37:20.512336531Z" level=info msg="Fetched PCI address for device" completePCIAddr="0000:00:02.0/0000:01:01.0" name=kata-agent pid=128 sandbox=e2145826daa97ea538d49bffc8c25ad13e0849a1edf5546478458fa17aa63002 source=agent\n" name=kata-proxy pid=54869 sandbox=e2145826daa97ea538d49bffc8c25ad13e0849a1edf5546478458fa17aa63002 source=agent
Sep 13 09:37:20 macbeth kata-proxy[54869]: time="2018-09-13T09:37:20.535074125+08:00" level=info msg="time="2018-09-13T01:37:20.513595652Z" level=info msg="Waiting on channel for device notification" name=kata-agent pciID=02/01 pid=128 sandbox=e2145826daa97ea538d49bffc8c25ad13e0849a1edf5546478458fa17aa63002 source=agent\n" name=kata-proxy pid=54869 sandbox=e2145826daa97ea538d49bffc8c25ad13e0849a1edf5546478458fa17aa63002 source=agent
Sep 13 09:37:23 macbeth kata-proxy[54869]: time="2018-09-13T09:37:23.537970764+08:00" level=info msg="time="2018-09-13T01:37:23.51786459Z" level=debug msg="request end" duration=3.005565104s error="rpc error: code = DeadlineExceeded desc = Timeout reached after 3s waiting for device 0000:00:02.0/0000:01:01.0" name=kata-agent pid=128 request=/grpc.AgentService/UpdateInterface resp="" sandbox=e2145826daa97ea538d49bffc8c25ad13e0849a1edf5546478458fa17aa63002 source=agent\n" name=kata-proxy pid=54869 sandbox=e2145826daa97ea538d49bffc8c25ad13e0849a1edf5546478458fa17aa63002 source=agent
Sep 13 09:37:23 macbeth kata-runtime[54984]: time="2018-09-13T09:37:23.540837194+08:00" level=error msg="update interface request failed" arch=amd64 command=kata-network container=e2145826daa97ea538d49bffc8c25ad13e0849a1edf5546478458fa17aa63002 error="rpc error: code = DeadlineExceeded desc = Timeout reached after 3s waiting for device 0000:00:02.0/0000:01:01.0" interface-requested="device:"eth1" name:"eth1" IPAddresses:<family:2 address:"172.18.0.2" mask:"16" > mtu:1500 hwAddr:"02:42:ac:12:00:02" pciAddr:"02/01" " name=kata-runtime pid=54984 resulting-interface="" sandbox=e2145826daa97ea538d49bffc8c25ad13e0849a1edf5546478458fa17aa63002 source=virtcontainers subsystem=kata_agent
Sep 13 09:37:23 macbeth kata-runtime[54984]: time="2018-09-13T09:37:23.540955376+08:00" level=error msg="add interface failed" arch=amd64 command=kata-network container=e2145826daa97ea538d49bffc8c25ad13e0849a1edf5546478458fa17aa63002 error="rpc error: code = DeadlineExceeded desc = Timeout reached after 3s waiting for device 0000:00:02.0/0000:01:01.0" name=kata-runtime pid=54984 resulting-interface="" sandbox=e2145826daa97ea538d49bffc8c25ad13e0849a1edf5546478458fa17aa63002 source=runtime
Sep 13 09:37:23 macbeth kata-runtime[54984]: time="2018-09-13T09:37:23.541177053+08:00" level=error msg="rpc error: code = DeadlineExceeded desc = Timeout reached after 3s waiting for device 0000:00:02.0/0000:01:01.0" arch=amd64 command=kata-network container=e2145826daa97ea538d49bffc8c25ad13e0849a1edf5546478458fa17aa63002 name=kata-runtime pid=54984 sandbox=e2145826daa97ea538d49bffc8c25ad13e0849a1edf5546478458fa17aa63002 source=runtime

It's not that the uevent is not sent to the agent. If I remove the waiting and ignore the UpdateInterface error, the device shows up very quickly and the uevent listening goroutine is able to pick up uevents for it. So it looks like the goroutine is not woken up while the agent is waiting in the UpdateInterface call, but I don't know why that is happening.
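
For context, the wait pattern is roughly the following (a minimal, self-contained sketch with hypothetical names, not the actual agent code): UpdateInterface registers a channel keyed by the PCI address and blocks on it until the uevent listening goroutine delivers the device, or the 3s timeout you see in the logs fires.

package main

import (
	"fmt"
	"sync"
	"time"
)

var (
	mu       sync.Mutex
	watchers = map[string]chan string{}
)

// waitForDevice: register a watcher for the PCI address, then block until
// the uevent listening goroutine delivers the device name or the timeout
// expires (the DeadlineExceeded error in the logs above).
func waitForDevice(pciAddr string) (string, error) {
	ch := make(chan string, 1)
	mu.Lock()
	watchers[pciAddr] = ch
	mu.Unlock()
	defer func() {
		mu.Lock()
		delete(watchers, pciAddr)
		mu.Unlock()
	}()
	select {
	case name := <-ch:
		return name, nil
	case <-time.After(3 * time.Second):
		return "", fmt.Errorf("Timeout reached after 3s waiting for device %s", pciAddr)
	}
}

// notifyDevice: what the uevent goroutine should do when a matching sysfs
// uevent arrives. If it never runs, waitForDevice times out as seen above.
func notifyDevice(pciAddr, name string) {
	mu.Lock()
	defer mu.Unlock()
	if ch, ok := watchers[pciAddr]; ok {
		ch <- name
	}
}

func main() {
	go func() {
		time.Sleep(100 * time.Millisecond)
		notifyDevice("0000:00:02.0/0000:01:01.0", "eth1")
	}()
	fmt.Println(waitForDevice("0000:00:02.0/0000:01:01.0"))
}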

Also, since netmon works in @sboeuf's local environment, I put up a testing PR (https://github.com/kata-containers/tests/pull/743) to get a baseline for environment comparison. It turns out that two times out of three, it failed to get the IP of the newly hotplugged nic. So the failure/success does not seem consistent even in CI.

It's not clear to me what causes the failure/success. @sboeuf @devimc @amshinde @caoruidong please help take a look.

amshinde commented 6 years ago

@bergwolf I'll take a look at this.

amshinde commented 6 years ago

@bergwolf I have not been able to reproduce this issue at all on my local setup. It looks like the device gets hotplugged well in advance on my setup. I haven't had any luck reproducing this on an Azure instance while running the test for 10 min either. I'll try again to reproduce this issue on Azure tomorrow. I don't think it should matter, but can you let me know what rootfs (and other environment details) you were using for your tests?

Meanwhile, I had a look at the changes for uevent, and I think this change could be causing the uevent listening loop to be locked permanently: https://github.com/kata-containers/agent/blob/7caf1c8bd9ebad9ab6631e1839d62c8c98ed7d8a/device.go#L125-L132

With that change, the sandbox lock is never released. I have pushed a revert for it here: https://github.com/kata-containers/agent/pull/375. Can you test with that and let me know? Meanwhile, I will continue testing tomorrow to see if there is any other issue related to network hotplug uevents.
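
Schematically, the problem is the classic early-return-while-holding-a-lock shape (a simplified reconstruction, not the actual device.go code):

package main

import (
	"fmt"
	"sync"
)

type sandbox struct {
	sync.Mutex
	deviceWatchers map[string]chan string
}

// Buggy shape: the early return keeps the sandbox locked, so the next
// iteration of the uevent listening loop blocks forever on Lock().
func (s *sandbox) handleUeventBuggy(devPath, devName string) {
	s.Lock()
	if ch, ok := s.deviceWatchers[devPath]; ok {
		ch <- devName
		return // lock is never released on this path
	}
	s.Unlock()
}

// Fixed shape: defer releases the lock on every path, so the listening
// loop keeps making progress.
func (s *sandbox) handleUevent(devPath, devName string) {
	s.Lock()
	defer s.Unlock()
	if ch, ok := s.deviceWatchers[devPath]; ok {
		ch <- devName
	}
}

func main() {
	devPath := "/devices/pci0000:00/0000:00:02.0/0000:01:01.0"
	s := &sandbox{deviceWatchers: map[string]chan string{devPath: make(chan string, 1)}}
	s.handleUevent(devPath, "eth1")
	fmt.Println(<-s.deviceWatchers[devPath])
}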

bergwolf commented 6 years ago

@amshinde Thanks for looking into it. Nice catch on the missing unlock! I guess I never hit that code path, though. I can see "Waiting on channel for device notification" printed in the agent console. E.g. (having just reverted that commit and run kata-containers/runtime#534):

Sep 19 16:09:55 vmfarm kata-proxy[7203]: time="2018-09-19T16:09:55.999956371+08:00" level=info msg="time=\"2018-09-19T08:09:55.990219937Z\" level=info msg=\"Fetched PCI address for device\" completePCIAddr=\"0000:00:02.0/0000:01:01.0\" name=kata-agent pid=1 sandbox=5a84955650e0852ff6b513bed1e9305c5a24d16ddd0d01a695d7cc645d522b3d source=agent\n" name=kata-proxy pid=7203 sandbox=5a84955650e0852ff6b513bed1e9305c5a24d16ddd0d01a695d7cc645d522b3d source=agent
Sep 19 16:09:56 vmfarm kata-proxy[7203]: time="2018-09-19T16:09:56.000040707+08:00" level=info msg="time=\"2018-09-19T08:09:55.990355952Z\" level=info msg=\"Waiting on channel for device notification\" name=kata-agent pciID=02/01 pid=1 sandbox=5a84955650e0852ff6b513bed1e9305c5a24d16ddd0d01a695d7cc645d522b3d source=agent\n" name=kata-proxy pid=7203 sandbox=5a84955650e0852ff6b513bed1e9305c5a24d16ddd0d01a695d7cc645d522b3d source=agent
Sep 19 16:09:56 vmfarm kata-proxy[7203]: time="2018-09-19T16:09:56.573418625+08:00" level=info msg="[ 1.184191] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready\n" name=kata-proxy pid=7203 sandbox=5a84955650e0852ff6b513bed1e9305c5a24d16ddd0d01a695d7cc645d522b3d source=agent
Sep 19 16:09:56 vmfarm kata-proxy[7203]: time="2018-09-19T16:09:56.575381281+08:00" level=info msg="[ 1.185224] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x39c5ded8016, max_idle_ns: 440795406786 ns\n" name=kata-proxy pid=7203 sandbox=5a84955650e0852ff6b513bed1e9305c5a24d16ddd0d01a695d7cc645d522b3d source=agent
Sep 19 16:09:59 vmfarm kata-proxy[7203]: time="2018-09-19T16:09:59.000995824+08:00" level=info msg="time=\"2018-09-19T08:09:58.990735646Z\" level=debug msg=\"request end\" duration=3.000515709s error=\"rpc error: code = DeadlineExceeded desc = Timeout reached after 3s waiting for device 0000:00:02.0/0000:01:01.0\" name=kata-agent pid=1 request=/grpc.AgentService/UpdateInterface resp=\"\" sandbox=5a84955650e0852ff6b513bed1e9305c5a24d16ddd0d01a695d7cc645d522b3d source=agent\n" name=kata-proxy pid=7203 sandbox=5a84955650e0852ff6b513bed1e9305c5a24d16ddd0d01a695d7cc645d522b3d source=agent

And I tried both initrd (make DISTRO=alpine USE_DOCKER=yes AGENT_INIT=yes initrd) and rootfs image (make DISTRO=centos USE_DOCKER=yes AGENT_INIT=no image). Both failed the same way.

Here is my complete kata-collect-data.sh result: https://pastebin.com/TSfD3A6e

amshinde commented 6 years ago

@bergwolf I see the important difference from your environment: you are using the "q35" machine type, while I am using "pc". Just running the netmon test, I don't think the network hotplug for q35 is even successful. I don't see any uevent related to the network device for q35 at all.

@sboeuf Were you able to reproduce this issue at all with "pc" machine type?

amshinde commented 6 years ago

@bergwolf Can you give this a try with the "pc" machine type and paste the logs here if you see any issues with "pc"?

amshinde commented 6 years ago

@bergwolf The q35 hotplug was not working because of the missing kernel config CONFIG_HOTPLUG_PCI_SHPC. Without it, we would need to explicitly rescan the PCI bus, which is not the best solution. @devimc will be submitting a PR to enable this in our kernel configuration.
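
For reference, the guest kernel fragment needed would look like this (assuming the standard upstream option names; the SHPC driver depends on the generic PCI hotplug core):

CONFIG_HOTPLUG_PCI=y
CONFIG_HOTPLUG_PCI_SHPC=y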

amshinde commented 6 years ago

@bergwolf This PR should solve the hotplug issue for q35: https://github.com/kata-containers/packaging/pull/186

For pc, I am wondering if we are running into similar SHPC-related issues in our CI (although I can't reproduce it on my setup). We have shpc turned on for pci-bridges in virtcontainers, so I am not sure if it has to do with the timing of scanning the PCI bus. With your current guest kernel and the "pc" machine type, can you turn off shpc for pci-bridges and run the netmon test? You would need to turn off shpc here: https://github.com/kata-containers/runtime/blob/d6e4a98387fe81b448da9eb8fca9bed500e7c241/virtcontainers/qemu_arch_base.go#L358

and

https://github.com/kata-containers/runtime/blob/7d8ce4ec922551845865325ca8e24ba8f8e2074a/virtcontainers/qemu.go#L1201

bergwolf commented 6 years ago

Thanks @amshinde! You did find the root cause. I am able to pass the test as long as I change the machine type back to pc.

And if I disable SHPC, I cannot pass it with pc + current guest kernel.

I'll build a new kernel w/ CONFIG_HOTPLUG_PCI_SHPC and test again. Thanks a lot for your help!

bergwolf commented 6 years ago

@amshinde In my environment, with CONFIG_HOTPLUG_PCI_SHPC enabled in the guest kernel + SHPC enabled in qemu, the nic device still shows up only after the UpdateInterface request times out. Are you able to pass the test with CONFIG_HOTPLUG_PCI_SHPC + q35?

bergwolf commented 6 years ago

@amshinde I added PCI bus rescanning and fixed an issue with the listening channel. See #380. With that, I can pass the test both w/ and w/o CONFIG_HOTPLUG_PCI_SHPC using the q35 machine type.

And the test fails if I take out the PCI bus rescan commit; the device shows up a few seconds after the waiting fails. So I think we should always rescan the PCI bus when a device is added, to discover new devices in a timely manner.
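
The rescan itself is just a sysfs write; roughly like this (a sketch of the idea, not the exact code from #380):

package main

import (
	"fmt"
	"os"
)

// rescanPciBus asks the guest kernel to re-enumerate all PCI buses, which
// discovers devices that were hotplugged without a working hotplug
// notification path (e.g. when SHPC support is missing in the kernel).
func rescanPciBus() error {
	return os.WriteFile("/sys/bus/pci/rescan", []byte("1"), 0200)
}

func main() {
	if err := rescanPciBus(); err != nil {
		fmt.Println("pci rescan failed:", err)
	}
}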

sboeuf commented 6 years ago

@bergwolf I'd like to understand what's going on with pc and shpc. It's clear that for q35 we need shpc as the hotplug mechanism, hence SHPC needs to be enabled in the guest kernel config. But I don't get how this interacts in the case of pc. pc should be using ACPI to hotplug PCI devices, and I don't think that should take any extra time compared to the other mechanism. Do you have any more input on that?

And yes, adding a PCI rescan works since we don't need any hotplug mechanism then: we simply force PCI enumeration every time we add a new device from Qemu. But I don't think this should be needed, and I'd like to make sure we understand what's going on.

sboeuf commented 6 years ago

And just to add a comment here, I am not against forcing the PCI rescan, but I want to make sure we're not hiding a potential lack of support regarding PCI hotplug.

sboeuf commented 6 years ago

/cc @mcastelino I can see that not using shpc with pc is a problem when using a pci-bridge (-device pci-bridge,id=pci-bridge-0,chassis_nr=1,shpc=off). The PCI hotplug on the main bus (PCI host bridge) works just fine through ACPI.

But here is the error I get when I try to hotplug through the bridge:

[   11.227923] pci 0000:01:04.0: BAR 6: no space for [mem size 0x00040000 pref]
[   11.228251] pci 0000:01:04.0: BAR 6: failed to assign [mem size 0x00040000 pref]
[   11.228431] pci 0000:01:04.0: BAR 4: no space for [mem size 0x00004000 64bit pref]
[   11.228583] pci 0000:01:04.0: BAR 4: failed to assign [mem size 0x00004000 64bit pref]
[   11.228741] pci 0000:01:04.0: BAR 1: no space for [mem size 0x00001000]
[   11.228871] pci 0000:01:04.0: BAR 1: failed to assign [mem size 0x00001000]
[   11.229018] pci 0000:01:04.0: BAR 0: no space for [io  size 0x0020]
[   11.229167] pci 0000:01:04.0: BAR 0: failed to assign [io  size 0x0020]
[   11.251395] PCI Interrupt Link [LNKA] enabled at IRQ 11
[   11.251769] virtio-pci 0000:01:04.0: virtio_pci: leaving for legacy driver
[   11.267333] virtio-pci: probe of 0000:01:04.0 failed with error -12

Any idea why there's not enough space for the PCI holes defined by the BARs?

bergwolf commented 6 years ago

@sboeuf As I tested it, CONFIG_HOTPLUG_PCI_SHPC does bring up the network device for q35, but only a few seconds after we time out in the UpdateInterface API, so the device is still not properly configured.

But I don't get how this interacts in the case of pc. pc should be using ACPI to hotplug PCI devices, and I don't think that should take any extra time compared to the other mechanism.

We enable SHPC on pci bridges for both pc and q35 right now.

miaoyq commented 5 years ago

I was testing network vfio device hotplug locally based on kata v1.3.0, but it always failed with this error output:

kata-proxy[17695]: time="2018-10-23T19:38:49.342930445+08:00" level=info msg="[  237.429816] pci 0000:01:01.0: [8086:1581] type 00 class 0x020000\n" name=kata-proxy pid=17695 sandbox=c50c01b8205f4813d6c1410a5834fd16cda283210f7e0c666886bdbb9900d10d source=agent
kata-proxy[17695]: time="2018-10-23T19:38:49.343713099+08:00" level=info msg="[  237.430427] pci 0000:01:01.0: reg 0x10: [mem 0x00000000-0x007fffff 64bit pref]\n" name=kata-proxy pid=17695 sandbox=c50c01b8205f4813d6c1410a5834fd16cda283210f7e0c666886bdbb9900d10d source=agent
kata-proxy[17695]: time="2018-10-23T19:38:49.344064018+08:00" level=info msg="[  237.430989] pci 0000:01:01.0: reg 0x1c: [mem 0x00000000-0x00007fff 64bit pref]\n" name=kata-proxy pid=17695 sandbox=c50c01b8205f4813d6c1410a5834fd16cda283210f7e0c666886bdbb9900d10d source=agent
kata-proxy[17695]: time="2018-10-23T19:38:49.344546049+08:00" level=info msg="[  237.431597] pci 0000:01:01.0: reg 0x30: [mem 0x00000000-0x0007ffff pref]\n" name=kata-proxy pid=17695 sandbox=c50c01b8205f4813d6c1410a5834fd16cda283210f7e0c666886bdbb9900d10d source=agent
kata-proxy[17695]: time="2018-10-23T19:38:49.344831647+08:00" level=info msg="[  237.431913] pci 0000:01:01.0: enabling Extended Tags\n" name=kata-proxy pid=17695 sandbox=c50c01b8205f4813d6c1410a5834fd16cda283210f7e0c666886bdbb9900d10d source=agent
kata-proxy[17695]: time="2018-10-23T19:38:49.346025215+08:00" level=info msg="[  237.433022] pci 0000:01:01.0: PME# supported from D0 D3hot\n" name=kata-proxy pid=17695 sandbox=c50c01b8205f4813d6c1410a5834fd16cda283210f7e0c666886bdbb9900d10d source=agent
kata-proxy[17695]: time="2018-10-23T19:38:49.346824703+08:00" level=info msg="[  237.433766] pci 0000:01:01.0: BAR 0: no space for [mem size 0x00800000 64bit pref]\n" name=kata-proxy pid=17695 sandbox=c50c01b8205f4813d6c1410a5834fd16cda283210f7e0c666886bdbb9900d10d source=agent
kata-proxy[17695]: time="2018-10-23T19:38:49.347071407+08:00" level=info msg="[  237.434093] pci 0000:01:01.0: BAR 0: failed to assign [mem size 0x00800000 64bit pref]\n" name=kata-proxy pid=17695 sandbox=c50c01b8205f4813d6c1410a5834fd16cda283210f7e0c666886bdbb9900d10d source=agent
kata-proxy[17695]: time="2018-10-23T19:38:49.347301228+08:00" level=info msg="[  237.434314] pci 0000:01:01.0: BAR 6: assigned [mem 0xfe400000-0xfe47ffff pref]\n" name=kata-proxy pid=17695 sandbox=c50c01b8205f4813d6c1410a5834fd16cda283210f7e0c666886bdbb9900d10d source=agent
kata-proxy[17695]: time="2018-10-23T19:38:49.347513392+08:00" level=info msg="[  237.434548] pci 0000:01:01.0: BAR 3: assigned [mem 0xfe800000-0xfe807fff 64bit pref]\n" name=kata-proxy pid=17695 sandbox=c50c01b8205f4813d6c1410a5834fd16cda283210f7e0c666886bdbb9900d10d source=agent
kata-proxy[17695]: time="2018-10-23T19:38:49.375500627+08:00" level=info msg="[  237.462523] i40e 0000:01:01.0: enabling device (0100 -> 0102)\n" name=kata-proxy pid=17695 sandbox=c50c01b8205f4813d6c1410a5834fd16cda283210f7e0c666886bdbb9900d10d source=agent
kata-proxy[17695]: time="2018-10-23T19:38:49.377076131+08:00" level=info msg="[  237.464083] i40e 0000:01:01.0: ioremap(0x0000, 0x0000) failed: 0xfffffffb\n" name=kata-proxy pid=17695 sandbox=c50c01b8205f4813d6c1410a5834fd16cda283210f7e0c666886bdbb9900d10d source=agent
kata-proxy[17695]: time="2018-10-23T19:38:49.377774156+08:00" level=info msg="[  237.464810] i40e: probe of 0000:01:01.0 failed with error -5\n" name=kata-proxy pid=17695 sandbox=c50c01b8205f4813d6c1410a5834fd16cda283210f7e0c666886bdbb9900d10d source=agent

I can see the device in the container, but no driver is bound to it:

/sys/devices/pci0000:00/0000:00:02.0/0000:01:01.0 # cat uevent
PCI_CLASS=20000
PCI_ID=8086:1581
PCI_SUBSYS_ID=8086:0000
PCI_SLOT_NAME=0000:01:01.0
MODALIAS=pci:v00008086d00001581sv00008086sd00000000bc02sc00i00

Machine type is pc with shpc=on. Is there something wrong with my environment, or is it something else? @bergwolf @sboeuf

sboeuf commented 5 years ago

@miaoyq Have you been able to make this work with a simple veth hotplug?

The network hotplug of a VFIO device is not supported yet, but should land soon. I just want to make sure the standard veth use case works for you.

miaoyq commented 5 years ago

Have you been able to make this work with a simple veth hotplug?

@sboeuf Yeah, works very well :-)

We were implementing network hotplug of a VFIO device locally, based on v1.3.0. It works well in one of my environments, but always fails in another with the error output above.

sboeuf commented 5 years ago

@miaoyq Oh okay! So I think the issue might be that your device needs a lot of memory, and because it is hotplugged behind a PCI bridge, the reserved space is not large enough. Try to use:

hotplug_vfio_on_root_bus = true

in the configuration.toml. The main PCI bus has more space and should be able to support your device.
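
For reference, that option sits under the QEMU hypervisor section; a minimal sketch of the relevant fragment of configuration.toml:

[hypervisor.qemu]
# Hotplug VFIO devices on the root bus instead of behind a PCI bridge, so
# devices with large BARs are not constrained by the bridge window.
hotplug_vfio_on_root_bus = true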

miaoyq commented 5 years ago

Thanks @sboeuf. I set hotplug_vfio_on_root_bus = true and got the following output:

"{\"error\": {\"class\": \"GenericError\", \"desc\": \"Bus 'pcie.0' does not support hotplugging\"}}" 

It seems the root bus does not support hotplugging.

sboeuf commented 5 years ago

@miaoyq Are you using the q35 machine type? You need to use pc, which allows devices to be plugged onto the PCI host bridge directly. With q35, you would need some extra code to create a PCIe hierarchy so that you can hotplug into a root port or a switch downstream port, for instance.