Open akaher opened 5 years ago
Found the reason why I am getting "No irq handler for vector": nvme/host/pci.c allocates the IRQ vectors, which are assigned to run on X CPUs. This information should be passed to the hypervisor, and the same assignment should be copied to VECTOR_IRQ (the per-CPU vector table). However, the CPU assignment passed to the hypervisor differs from the one recorded in VECTOR_IRQ. As a result, when the VM receives an interrupt, the table entry does not match because of the CPU number, and the interrupt is dropped.
Please help me find out why this happens with v4.9 but works fine with v4.14.
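To make the mismatch concrete, here is a minimal userspace model of it, not kernel code. It assumes a per-CPU table that maps a vector number to an IRQ, in the spirit of the x86 per-CPU `vector_irq` array. The names `vector_irq_model`, `guest_assign_vector` and `deliver_interrupt`, and the table sizes, are hypothetical and chosen only for illustration.

```c
#include <assert.h>

#define MODEL_NR_CPUS    16   /* hypothetical VM size, for illustration only */
#define MODEL_NR_VECTORS 256  /* x86 has 256 interrupt vectors */
#define MODEL_NO_IRQ     (-1) /* marks "no handler installed" in this model */

/* Model of a per-CPU vector -> IRQ table (one row per CPU). */
static int vector_irq_model[MODEL_NR_CPUS][MODEL_NR_VECTORS];

/* Clear every entry so that no CPU has any handler installed. */
void vector_table_init(void)
{
    for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
        for (int vec = 0; vec < MODEL_NR_VECTORS; vec++)
            vector_irq_model[cpu][vec] = MODEL_NO_IRQ;
}

/* Guest side: record that 'irq' is handled at (cpu, vector). */
void guest_assign_vector(int cpu, int vector, int irq)
{
    assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
    assert(vector >= 0 && vector < MODEL_NR_VECTORS);
    vector_irq_model[cpu][vector] = irq;
}

/*
 * Delivery side: the hypervisor raises 'vector' on 'cpu'.
 * Returns the IRQ number, or MODEL_NO_IRQ if that CPU has no entry,
 * which is the case the guest reports as "No irq handler for vector".
 */
int deliver_interrupt(int cpu, int vector)
{
    assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
    assert(vector >= 0 && vector < MODEL_NR_VECTORS);
    return vector_irq_model[cpu][vector];
}
```

In this model, if the guest installs the handler with `guest_assign_vector(8, 232, 42)` but the hypervisor was told a different CPU and delivers on CPU 10, `deliver_interrupt(10, 232)` finds no entry and the interrupt is lost, while `deliver_interrupt(8, 232)` would have found IRQ 42.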
Can the 2 top patches on this branch help? https://github.com/dcui/linux/commits/decui/SLES12-SP3-AZURE-2018-1029.
I don't know what goes wrong here, as I don't have an NVMe device to test with. I hope the 2 patches help, but I'm not sure at all.
Thanks Dexuan. After applying the following patches, the NVMe IRQs are scheduled on CPU0 and CPU8 (the VM has 15 CPUs in total) and there is no more "No irq handler for vector" in dmesg: https://github.com/dcui/linux/commit/cd09cb79926713b7f734dfc10eac0bc9f0f882cc https://github.com/dcui/linux/commit/eba61d21b1ad88f9d4f63b5844f17da1dc8d8930
Looking further into how to schedule the IRQs on other CPUs as well.
Glad to know the 2 patches can help!
Now it looks to me that the pci-hyperv driver is good, and you might need to improve the NVMe driver in v4.9 to spread interrupts to more CPUs if necessary (I assume the NVMe driver in the latest mainline kernel should do a better job on this).
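As a rough illustration of what "spreading interrupts to more CPUs" could mean, here is a minimal round-robin sketch. It is not how the NVMe driver or the kernel's affinity code assigns vectors. The helper names `spread_queue_to_cpu` and `fill_queue_cpu_map` are hypothetical, and the policy (queue index modulo CPU count) is simply the simplest one that touches every CPU.

```c
#include <assert.h>

/*
 * Hypothetical helper: pick a target CPU for a queue's interrupt
 * by plain round-robin over the online CPUs (numbered 0..nr_cpus-1).
 */
int spread_queue_to_cpu(int queue, int nr_cpus)
{
    assert(queue >= 0);
    assert(nr_cpus > 0);
    return queue % nr_cpus;
}

/*
 * Hypothetical helper: fill 'map' so that map[q] is the CPU chosen
 * for queue q, for q in 0..nr_queues-1.
 */
void fill_queue_cpu_map(int *map, int nr_queues, int nr_cpus)
{
    assert(map != 0);
    for (int q = 0; q < nr_queues; q++)
        map[q] = spread_queue_to_cpu(q, nr_cpus);
}
```

With 15 CPUs and 15 queues, this gives each queue its own CPU instead of concentrating all interrupts on CPU0 and CPU8. A real driver would also need to account for NUMA locality and CPU hotplug, which this sketch ignores.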
Dexuan, is there any specific reason for not upstreaming the following patch to the stable kernels: https://github.com/dcui/linux/commit/eba61d21b1ad88f9d4f63b5844f17da1dc8d8930
The patch was made for v4.12.14, which has reached End-of-Life: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/log/?h=linux-4.12.y
irq_data_get_effective_affinity_mask() is in v4.14+, so it's not needed there.
The other long-term stable kernels (4.9, 4.4, 3.x) seem a little old, and it looks like they're not widely used any more.
I backported the required pci-hyperv changes to Linux 4.9, and the driver works.
But I am getting the following issues with nvme:
[74530.686555] do_IRQ: 10.232 No irq handler for vector
[74530.712068] do_IRQ: 10.232 No irq handler for vector
[74530.737579] do_IRQ: 10.232 No irq handler for vector
[74530.763092] do_IRQ: 10.232 No irq handler for vector
[74532.832221] nvme nvme1: I/O 206 QID 6 timeout, reset controller
[74532.873967] nvme nvme1: completing aborted command with status: fffffffc
[74532.873971] blk_update_request: I/O error, dev nvme1n1, sector 1048320
I backported the following patch, but I am still facing the same issue: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/patch/drivers/nvme/host/pci.c?id=0ff199cb48b4af6f29a1bf15d92d93f44a22eeb4