open-power-host-os / qemu

OpenPOWER Host OS qemu repository

Cannot boot using vhost-scsi controller. #2

Closed cuinutanix closed 7 years ago

cuinutanix commented 7 years ago

After SLOF successfully probes the SCSI disks on a virtio-scsi controller, it writes the value 0x0 to the PCI_COMMAND register, which disables the PCI device. qemu translates that write into virtio_set_status(vdev, vdev->status & ~VIRTIO_CONFIG_S_DRIVER_OK); which tells the vhost backend that the driver wants the device stopped.

qemu's own implementation of virtio-scsi seems to ignore the fact that the pci device is disabled, and will happily continue to process scsi requests from the ring, and guests are able to boot. However, with a vhost-scsi backend, qemu tells the vhost backend to stop processing the ring, and the guest does not boot.

Here is a gdb stack trace of when qemu disables the vhost-scsi device:

#0  0x00000000202f63e8 in vhost_user_scsi_set_status (vdev=0x2173ffb0, 
    status=11 '\v')
    at /home/mcui/workspace/NXPower/qemu/hw/scsi/vhost-user-scsi.c:43
#1  0x000000002030fd9c in virtio_set_status (vdev=0x2173ffb0, val=11 '\v')
    at /home/mcui/workspace/NXPower/qemu/hw/virtio/virtio.c:898
#2  0x00000000205c4b38 in virtio_write_config (pci_dev=0x21737ba0, address=4, 
    val=1048832, len=4) at hw/virtio/virtio-pci.c:633
#3  0x000000002055bcec in pci_host_config_write_common (pci_dev=0x21737ba0, 
    addr=<optimized out>, limit=<optimized out>, val=<optimized out>, 
    len=<optimized out>) at hw/pci/pci_host.c:66
#4  0x0000000020341810 in finish_write_pci_config (spapr=<optimized out>, 
    buid=<optimized out>, addr=<optimized out>, size=<optimized out>, 
    val=<optimized out>, rets=2121530032)
    at /home/mcui/workspace/NXPower/qemu/hw/ppc/spapr_pci.c:200
#5  0x000000002033c99c in spapr_rtas_call (cpu=<optimized out>, 
    spapr=<optimized out>, token=<optimized out>, nargs=<optimized out>, 
    args=<optimized out>, nret=<optimized out>, rets=<optimized out>)
    at /home/mcui/workspace/NXPower/qemu/hw/ppc/spapr_rtas.c:665
#6  0x0000000020336784 in h_rtas (cpu=0x3fffb6260010, spapr=0x210aee80, 
    opcode=<optimized out>, args=<optimized out>)
    at /home/mcui/workspace/NXPower/qemu/hw/ppc/spapr_hcall.c:667
#7  0x0000000020339338 in spapr_hypercall (cpu=0x3fffb6260010, opcode=61440, 
    args=0x3fffb6200030)
    at /home/mcui/workspace/NXPower/qemu/hw/ppc/spapr_hcall.c:1243
#8  0x000000002040c6b4 in kvm_arch_handle_exit (cs=0x3fffb6260010, 
    run=0x3fffb6200000)
    at /home/mcui/workspace/NXPower/qemu/target-ppc/kvm.c:1817
#9  0x00000000202a25f8 in kvm_cpu_exec (cpu=0x3fffb6260010)
    at /home/mcui/workspace/NXPower/qemu/kvm-all.c:2016
#10 0x0000000020288350 in qemu_kvm_cpu_thread_fn (arg=0x3fffb6260010)
    at /home/mcui/workspace/NXPower/qemu/cpus.c:998
#11 0x00003fffb7488728 in start_thread () from /lib64/libpthread.so.0
#12 0x00003fffb73bd210 in clone () from /lib64/libc.so.6

You would need to set up vhost-scsi to reproduce the exact same issue. However, you can observe the VIRTIO_CONFIG_S_DRIVER_OK status bit being cleared even with a standard virtio-scsi backend: set a breakpoint on virtio_set_status() and watch the status go from 0 to 0xf (fully functional). Then, after the bus scan completes, the status is set to 0xb, with VIRTIO_CONFIG_S_DRIVER_OK cleared.
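To make the 0xf and 0xb values easier to read, here is a minimal sketch that decodes a virtio device status byte into its named bits. The bit values are the ones defined by the virtio specification; the helper name decode_status and the table name are hypothetical and do not come from qemu.

```python
# Virtio device status bits (values as defined in the virtio specification).
# The dictionary and function names are illustrative, not qemu identifiers.
VIRTIO_STATUS_BITS = {
    "ACKNOWLEDGE": 0x01,
    "DRIVER": 0x02,
    "DRIVER_OK": 0x04,
    "FEATURES_OK": 0x08,
    "NEEDS_RESET": 0x40,
    "FAILED": 0x80,
}


def decode_status(status):
    """Return the names of the status bits that are set in `status`."""
    return [name for name, bit in VIRTIO_STATUS_BITS.items() if status & bit]


# The three values seen at the virtio_set_status() breakpoint.
for value in (0x0, 0xF, 0xB):
    print(f"{value:#04x}: {decode_status(value)}")
```

Decoded this way, 0xb is 0xf with only the DRIVER_OK bit (0x04) removed, which matches status=11 in frames #0 and #1 of the trace.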

mdroth commented 7 years ago

the register value in the trace seems to be disabling io/mem too? that could be pci-device-disable in board-qemu/slof/pci-device_1af4_1004.fs

i wonder if simply removing the pci-device-disable from that file would allow the boot to continue?
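For reference, the register value from frame #2 of the trace (address=4, val=1048832, len=4) can be decoded as follows. This is a sketch that assumes the usual PCI config-space layout, where a 4-byte write at offset 4 carries the command register in the low 16 bits and the status register in the high 16 bits. The bit values are the standard PCI command bits; the helper names are hypothetical.

```python
# Standard PCI command register bits; names here are illustrative.
PCI_COMMAND_BITS = {
    "IO": 0x0001,      # I/O space access enable
    "MEMORY": 0x0002,  # memory space access enable
    "MASTER": 0x0004,  # bus master enable
    "SERR": 0x0100,    # SERR# enable
}


def split_config_dword(val):
    """Split a 4-byte write at config offset 4 into (command, status)."""
    return val & 0xFFFF, (val >> 16) & 0xFFFF


def decode_command(command):
    """Return the names of the command bits that are set in `command`."""
    return [name for name, bit in PCI_COMMAND_BITS.items() if command & bit]


# val=1048832 is taken from frame #2 of the gdb trace.
command, status = split_config_dword(1048832)
print(hex(command), hex(status), decode_command(command))
```

Under that assumption the command half is 0x0100, so the IO, MEMORY and MASTER bits are all clear, which is consistent with the write disabling io/mem as well as bus mastering.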

cuinutanix commented 7 years ago

No, it doesn't. It removes only one call to pci-device-disable, but there are others, for example in pci-device.fs:

\ prepare the device for subsequent use
\ this word should be overloaded by the device file (if present)
\ the device file can call this file before implementing
\ its own open functionality
: open
        puid >r             \ save the old puid
        my-puid TO puid     \ set up the puid to the devices Hostbridge
        pci-master-enable   \ And enable Bus Master, IO and MEM access again.
        pci-mem-enable      \ enable mem access
        pci-io-enable       \ enable io access
        r> TO puid          \ restore puid
        true
;

\ close the previously opened device
\ this word should be overloaded by the device file (if present)
\ the device file can call this file after its implementation
\ of own close functionality
: close 
        puid >r             \ save the old puid
        my-puid TO puid     \ set up the puid
        pci-device-disable  \ and disable the device
        r> TO puid          \ restore puid
;

Note that the open method enables the device, but it is not getting called.

cuinutanix commented 7 years ago

OK, I think I figured it out. The clue is in the comment:

\ prepare the device for subsequent use
\ this word should be overloaded by the device file (if present)
\ the device file can call this file before implementing
\ its own open functionality

virtio-scsi.fs does not have the "open" word.

laggarcia commented 7 years ago

Closing as per previous comment.