kata-containers / qemu

Kata containers QEMU
Other
22 stars 19 forks source link

piix4: fix hotplug issue for nofw case #4

Closed liujing2 closed 5 years ago

liujing2 commented 5 years ago

In no firmware case, hotplug onto pci.0 has problem that "Unsupported bus. Bus doesn't have property 'apic-pcihp-bsel' " on pc platform. This patch fixes the issue to enable hotplug.

liujing2 commented 5 years ago

Hi @grahamwhaley @jcvenegas @devimc Sorry for disturbing you. I'm not sure who are the related maintainers so just ping you for some help. :) Alibaba told us this issue and I tried to fix. Thanks! Jing

grahamwhaley commented 5 years ago

No problem @liujing2 - that's what were are here for :-) I think @devimc will have most input on pci bus issues. In the meantime, you will need to do a little rework to meet our PR format requirements. You need to open an Issue and put details there about the problem, and then in your commit you will need a:

Fixes: nnn

line referencing the Issue number. See https://github.com/kata-containers/community/blob/master/CONTRIBUTING.md#patch-format for more details.

liujing2 commented 5 years ago

@grahamwhaley Thanks for your suggestion! I will make the commit message better after @devimc reviewing. So do I still need a new issue then?

grahamwhaley commented 5 years ago

@liujing2 - notionally, yes, you should have an Issue open in this repo you can tie the reference in the PR to - but - it doesn't look like we have Issues enabled in this repo maybe - I cannot see an Issues tab or how to make one. I think that is because maybe we have not landed any PRs in this repo since we enabled full static CI checking.... @jodh-intel @chavafg wdyt? I suspect we need to enable Issues for this repo....?

jodh-intel commented 5 years ago

I've just enabled issues.

Note that since this repo is just a clone of the qemu one, it doesn't have any of the normal Kata infrastructure+process around it (in fact, even the Travis config seems to be from upstream).

devimc commented 5 years ago

l g t m , but @liujing2 please fix the CI

liujing2 commented 5 years ago

@devimc I looked into the CI results. It reports a "Build Failed" issue which said,

hw/pci/pci.c:2215:pci_add_option_rom: Object 0x561a37d1d270 is not an instance of type generic-pc-machine
Broken pipe

Locally I tested pc/q35 that both work well. Deep dived into the codes and found it probably caused not by this patch. The reported codes are as follows.

static void pci_add_option_rom(PCIDevice *pdev, bool is_default_rom,
                               Error **errp)
{
        PCMachineState *pcms = PC_MACHINE(qdev_get_machine()); //hw/pci/pci.c:2215

PC_MACHINE() reports the issue, but this is indeed strange because this line of code is used by many places. The parameter is just got from qdev_get_machine. So could I suggest running the CI without my patch to firstly double check if original codes also have this issue? And further, we can then take a look if the issue is caused by this line of code or CI itself?

jodh-intel commented 5 years ago

I'm rather confused by the Travis config for this repo...

Clicking the "Details" link above takes you to a github page summarising the Travis build. At the bottom of that page is a link to the build at https://travis-ci.com/kata-containers/qemu/builds/92956237. Note that the URL contains travis-ci.com (paid for version), not travis-ci.org (free version).

I wonder if this is an accident based on us using the vanilla upstream .travis.yml file? Anyone know anything about this?

jodh-intel commented 5 years ago

The summary is that I don't have permissions to run the CI for the kata-containers:qemu-lite-2.11.0 branch on Travis.

grahamwhaley commented 5 years ago

afaict, what is happening is that the 'kata containers' project (top level) has Travis enabled as a 'github app', and this it appears/ties into the qemu repo 'integrations and services'. That makes it automatically pick up and process the .travis.yml file we have inherited from the qemu forking. Currently Travis for kata is configured to access 'all repositories'. I think we can probably fix this by ticking the 'only selected repositories' tickbox under the kata Travis setup, and then only selecting the ones we want to process :-) I can give that a go if we agree that is probably the right course of action...

jodh-intel commented 5 years ago

Well, we do want qemu to be tested with Travis I think, but I don't believe we should be using travis-ci.com.

grahamwhaley commented 5 years ago

OK, this is indeed somewhat odd. Having a further prod around, it looks like we can also en/disable github repo processing on a per-repo basis on the Travis end as well. afaict, the qemu repo is not enabled for kata on the travic-ci.org - which makes me suspect maybe somebody has enabled it on travis-ci.com using their Travis account? I don't have a travis-ci.com account - if anybody here does in the kata group, can they go and check if qemu has been enabled?? Most odd - afaik we don't have a chargeable account for kata with travis-ci - so, who just paid for that build? :-)

@jodh-intel - I would suggest we disable our qemu repo from travis builds at the github end right now until we:

jodh-intel commented 5 years ago

travis-ci.com access may have been setup for one of the other architectures possibly?

/cc @Pennyzct and @nitkon, @sameo, @sboeuf, @bergwolf.

jodh-intel commented 5 years ago

Ugh - we seem to have hit a github/travis bug, or possibly we just need someone with a travis-ci.com account - I've changed the kata projects settings to only use Travis for a specified list of kata repos - basically all repos with a .travis.yml excluding qemu. However, although those settings appear to be "sticking" in github, if I click "save", I get redirected... to travis-ci.com, not travis-ci.org.

Could you try saving @grahamwhaley and see if you see the same behaviour?

grahamwhaley commented 5 years ago

@jodh-intel - well, I went to follow the link path ... I think you have made a change - now when I go to the 'integrations and services' on our kata/qemu on github, travis is no longer listed. So, I think it has been dropped on the github side at least... We could now try maybe enabling at https://travis-ci.org/kata-containers and see what happens??

jodh-intel commented 5 years ago

Yes, I noticed that. You should be able to see what I'm talking about if you click "Settings" for https://github.com/kata-containers/

grahamwhaley commented 5 years ago

Right, that 'save' takes me to .com as well. Maybe you can do it from the 'other end' though - if you go to https://travis-ci.org/kata-containers then you can select which repos are processed. QEMU is not ticked there - but, maybe we can tick it there and enable the travis-ci.org to track.

jodh-intel commented 5 years ago

I don't seem to be able to do that - I can't see any branches apart from master (which we don't actually want).

Maybe once we've identified who has the travis-ci.com a/c, they can help resolve this.

jodh-intel commented 5 years ago

I've got a vague feeling that Travis may auto-detect branches when the server detects a change so we could try tweaking the .travis.yml maybe (to specify the branches we do or don't want to build).

liujing2 commented 5 years ago

Thanks @sboeuf for reviewing and approving. @devimc Could I ask what do I need do more for this PR? Since the CI seems not ready now.

devimc commented 5 years ago

thanks @liujing2

liujing2 commented 5 years ago

Thanks @devimc

YvesChan commented 5 years ago

@liujing2 I found a strange issue here, maybe it's related to this nofw feature?

I need to hotplug an qcow2 disk on pci-bridge-0, the qmp command return nothing(which is ok), but there's no response in guest OS (no related log in dmesg, no block device generated). I try to boot the VM without nofw, and the hotplug works as expected.

Below is the QMP command:

(QEMU) blockdev-add cache={"direct":true,"no-flush":false} driver=qcow2 file={"driver":"file","filename":"/tmp/testhp.img"} node-name=drive-testhp
{"return": {}}
(QEMU) device_add bus=pci-bridge-0 drive=drive-testhp driver=virtio-blk-pci id=virtio-drive-testhp share-rw=on
{"return": {}}
(QEMU) query-block
{"return": [{"locked": false, "type": "unknown", "qdev": "/machine/peripheral/virtio-drive-testhp/virtio-backend", "removable": false, "device": "", "inserted": {"bps_rd": 0, "ro": false, "backing_file_depth": 0, "encrypted": false, "image": {"cluster-size": 65536, "format": "qcow2", "filename": "/tmp/testhp.img", "virtual-size": 104857600, "dirty-flag": false, "format-specific": {"data": {"compat": "1.1", "refcount-bits": 16, "corrupt": false, "lazy-refcounts": false}, "type": "qcow2"}, "actual-size": 200704}, "cache": {"no-flush": false, "writeback": true, "direct": true}, "bps_wr": 0, "drv": "qcow2", "node-name": "drive-testhp", "bps": 0, "iops": 0, "write_threshold": 0, "file": "/tmp/testhp.img", "iops_rd": 0, "encryption_key_missing": false, "detect_zeroes": "off", "iops_wr": 0}}]}

I don't know if it is appropriate to comment here, but any advice and suggestions will be greatly appreciated.

Thanks

grahamwhaley commented 5 years ago

other folks who might have thoughts: /cc @devimc @markdryan @rbradford

devimc commented 5 years ago

@YvesChan in your guest kernel you have to enable

in q35 you have to re-scan the pci bus echo 1 > /sys/bus/pci/rescan

YvesChan commented 5 years ago

@YvesChan in your guest kernel you have to enable

  • CONFIG_HOTPLUG_PCI_SHPC
  • CONFIG_HOTPLUG_PCI_ACPI

in q35 you have to re-scan the pci bus echo 1 > /sys/bus/pci/rescan

@devimc Thanks for your reply!

  1. I have checked the config file of guest kernel, and make sure the CONFIG_HOTPLUG_PCI_ACPI is set to "y", but CONFIG_HOTPLUG_PCI_SHPC is not set. Is it the reason why guest kernel didn't bring the device up automatically? So in this nofw case, which hotplug device model is being used? ACPI-based or SHPC-based?

  2. Though I'm using pc machine type, I try to run echo 1 > /sys/bus/pci/rescan after device_add, and the block device just come up! dmesg show these info:

    [   64.556966] clr: pci_setup_device start
    [   64.557098] pci 0000:01:00.0: [1af4:1001] type 00 class 0x010000
    [   64.557238] clr: pci_read_bases start
    [   64.557284] clr: Starting probe for 0000:01:00.0
    [   64.557374] pci 0000:01:00.0: reg 0x10: [io  0x0000-0x003f]
    [   64.557431] clr: Starting probe for 0000:01:00.0
    [   64.557527] pci 0000:01:00.0: reg 0x14: [mem 0x00000000-0x00000fff]
    [   64.557599] clr: pci_read_bases end
    [   64.557662] clr: pci_setup_device end
    [   64.559606] pci 0000:00:02.0: BAR 8: assigned [mem 0x80100000-0x801fffff]
    [   64.559688] pci 0000:00:02.0: BAR 7: assigned [io  0x2000-0x2fff]
    [   64.559759] pci 0000:01:00.0: BAR 1: assigned [mem 0x80100000-0x80100fff]
    [   64.559857] pci 0000:01:00.0: BAR 0: assigned [io  0x2000-0x203f]
    [   64.559937] pci 0000:00:02.0: PCI bridge to [bus 01]
    [   64.559998] pci 0000:00:02.0:   bridge window [io  0x2000-0x2fff]
    [   64.560811] pci 0000:00:02.0:   bridge window [mem 0x80100000-0x801fffff]
    [   64.562356] pci 0000:00:02.0: enabling device (0000 -> 0003)
    [   64.587635] ACPI: PCI Interrupt Link [LNKB] enabled at IRQ 10
    [   64.588205] virtio-pci 0000:01:00.0: enabling device (0000 -> 0003)
    [   64.613224] virtio-pci 0000:01:00.0: virtio_pci: leaving for legacy driver
devimc commented 5 years ago

@YvesChan

Is it the reason why guest kernel didn't bring the device up automatically?

you are using nofw, so it's a bug, I guess related to qemu ACPI tables

So in this nofw case, which hotplug device model is being used? ACPI-based or SHPC-based?

it depends of the machine type and the bridge, q35/pc + pci-bridge = ACPI q35 + pcie-pci-bridge = SHPC

YvesChan commented 5 years ago

@devimc

so it's a bug, I guess related to qemu ACPI tables

Got it. Hope this bug will be fixed soon. :)

Thanks

liujing2 commented 5 years ago

@YvesChan Got your request and I will take time to re-produce it. Could you give your qemu commands (launching the guest) instead of kata commands so I can quickly re-produce. Thanks!

YvesChan commented 5 years ago

@liujing2

I'm using a script to boot VM, it's pretty easy to re-produce this issue.

$QEMU_BIN  -machine pc,accel=kvm,kernel_irqchip,nvdimm,nofw \
        -cpu host -m 2G,slots=2,maxmem=100G -smp 2,sockets=1,cores=2,threads=1 -numa node,nodeid=0,cpus=0-1 \
        -device pci-bridge,bus=pci.0,id=pci-bridge-0,chassis_nr=1,shpc=on \
        -kernel $KERNEL -append "$KERNEL_CMD" \
        -no-user-config -nodefaults \
        -object memory-backend-file,id=mem0,share,mem-path=$IMAGE,size=$IMAGE_SIZE -device nvdimm,memdev=mem0,id=nv0 \
        -device virtio-serial-pci,id=virtio-serial0  \
        -chardev stdio,id=charconsole0 -device virtconsole,chardev=charconsole0,id=console0 \
        -nographic -qmp unix:/tmp/qemu,server,nowait \
        -device virtserialport,chardev=metricagent,id=channel0,name=agent.channel.0 \
        -chardev socket,id=metricagent,path=/tmp/metric.agent.channel.sock,nowait,server

After VM boot up, use QMP to send commands:

(QEMU) blockdev-add cache={"direct":true,"no-flush":false} driver=qcow2 file={"driver":"file","filename":"/tmp/testhp.img"} node-name=drive-testhp
{"return": {}}
(QEMU) device_add bus=pci-bridge-0 drive=drive-testhp driver=virtio-blk-pci id=virtio-drive-testhp share-rw=on
{"return": {}}
(QEMU) query-block
{"return": [{"locked": false, "type": "unknown", "qdev": "/machine/peripheral/virtio-drive-testhp/virtio-backend", "removable": false, "device": "", "inserted": {"bps_rd": 0, "ro": false, "backing_file_depth": 0, "encrypted": false, "image": {"cluster-size": 65536, "format": "qcow2", "filename": "/tmp/testhp.img", "virtual-size": 104857600, "dirty-flag": false, "format-specific": {"data": {"compat": "1.1", "refcount-bits": 16, "corrupt": false, "lazy-refcounts": false}, "type": "qcow2"}, "actual-size": 200704}, "cache": {"no-flush": false, "writeback": true, "direct": true}, "bps_wr": 0, "drv": "qcow2", "node-name": "drive-testhp", "bps": 0, "iops": 0, "write_threshold": 0, "file": "/tmp/testhp.img", "iops_rd": 0, "encryption_key_missing": false, "detect_zeroes": "off", "iops_wr": 0}}]}

Thanks!

devimc commented 5 years ago

@YvesChan can you try disabling shpc?

YvesChan commented 5 years ago

@YvesChan can you try disabling shpc?

@devimc how to disable shpc? Did you mean the kernel config "CONFIG_HOTPLUG_PCI_SHPC" or qemu-kvm boot args?

I try all the methods using "nofw" features described in your documentations ((https://gist.github.com/devimc/e9fd533e52b08387f1df65df8b19e038) , but seems none of them works without pci/pcie rescan.

Just curious about the rescan operation, is it an excepted behavior? There should be a hardware interrupt while hot plugging device, so the guest OS can handle it automatically.

devimc commented 5 years ago

how to disable shpc?

-device pci-bridge,bus=pci.0,id=pci-bridge-0,chassis_nr=1,shpc=off

is it an excepted behavior?

no

There should be a hardware interrupt while hot plugging device, so the guest OS can handle it automatically

you're right

YvesChan commented 5 years ago

@devimc I have tried the shpc=off option and it doesn't work :( rescan is still the only way to enable the qcow2 device

liujing2 commented 5 years ago

@YvesChan Did you try normal (with fw) hotplug on pci-bridge-0? It can work well?

YvesChan commented 5 years ago

@YvesChan Did you try normal (with fw) hotplug on pci-bridge-0? It can work well?

Yes, vmlinuz without "nofw" works well. I think the key part is how to generate the hardware interrupt to guest kernel while hot plugging virtio-blk device.

liujing2 commented 5 years ago

@YvesChan Actually my colleague and I found that we still need rescan when with fw. Even for all the pc platform hotplug, not only virtio-blk. virtio-net-pci needs to rescan too, even on pci.0, and with upstream qemu version v3.1.0. So I am curious whether I missed something...

YvesChan commented 5 years ago

@liujing2 That's odd. Like @devimc said, rescan should NOT be a necessary operation.

liujing2 commented 5 years ago

@YvesChan @devimc I found that kata-containers/vmlinuz-4.14.22.1-128.1.container is the kernel that can w/o rescan. Now I only have limited time to test w/ firmware. After CHY, I'll continue. BTW, @YvesChan could you send me your kernel config file using gist or to my email jing2.liu@intel.com? Because that kernel can not do nofw test and we need know the key difference. Thanks and happy new year!

YvesChan commented 5 years ago

@liujing2 That would be great. I'll send you the kernel config file to your email as soon as possible. Happy new year!

liujing2 commented 5 years ago

@YvesChan I didn't get your email. Have you already sent or not? I'm just afraid of wrong address.

YvesChan commented 5 years ago

@liujing2 Sorry for the delay. I have just sent you the config file. If you find any clues, please let me know. Thanks!

liujing2 commented 5 years ago

@YvesChan Hi sorry for no reply this week and I spent some time debugging this issue. Finally got why in nofw case guest can not receive the interrupt. But actually I found that the pci-bridge-0 doesn't allocate space like (nofw)

00:06.0 PCI bridge: Red Hat, Inc. QEMU PCI-PCI bridge (prog-if 00 [Normal decode])

        I/O behind bridge: None
        Memory behind bridge: None
        Prefetchable memory behind bridge: None

So we need first fix this issue. Or else we could not use the hotplugged device. I will be a little busy in these days and I would be grateful if you have fixes on this. I think this probably because pci-bridge windows space is allocated by bios. And I will also try myself to squeeze my time for this issue.

YvesChan commented 5 years ago

@liujing2 Thanks for your reply! I don't have much experience on BIOS/PCI but I'll try it out.