It says 01:05.0, but the actual address of the THC is 00:10.6. I guess that's the problem? But I don't really know how any of this IOMMU stuff works. According to lspci there isn't even anything at 01:05.0...
Should probably compile a kernel with INTEL_IOMMU_DEBUGFS to get some more info.
The source/request ID check is set up in the kernel by set_msi_sid(). This checks for DMA alias ids with pci_for_each_dma_alias(). As far as I can tell, the only ways you can have aliases are:
- through pci_real_dma_dev(), which I don't think applies here,
- through pci_add_dma_alias(), but this can only create aliases on the same bus, or
- [...]

So since there are no aliases, set_msi_sid() creates a strict check for the 00:10.6 sid. And then for some reason the interrupt has request id 01:05.0.
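To make the mismatch concrete, here is a small userspace model of the check. It is only a sketch, not the kernel code: the helper names make_rid and sid_check_passes are made up for illustration, the enum values follow the VT-d source validation type encoding as I understand it, and only the "compare all 16 bits" case is modelled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Source validation types (simplified; only two of them are modelled). */
enum svt {
	SVT_NO_VERIFY = 0,     /* accept any request id */
	SVT_VERIFY_SID_SQ = 1, /* compare the request id against the SID field */
};

/* Hypothetical helper: pack bus:device.function into a 16-bit request id. */
static uint16_t make_rid(uint8_t bus, uint8_t dev, uint8_t fn)
{
	return (uint16_t)((bus << 8) | ((dev & 0x1f) << 3) | (fn & 0x7));
}

/* Hypothetical helper: model of the check with all 16 bits compared. */
static bool sid_check_passes(enum svt svt, uint16_t sid, uint16_t rid)
{
	if (svt == SVT_NO_VERIFY)
		return true;
	return sid == rid;
}
```

With the entry set up for 00:10.6 (request id 0x0086) and the interrupt arriving as 01:05.0 (request id 0x0128), the strict comparison fails, while SVT_NO_VERIFY lets it through.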
We can't use a quirk to add an alias since the bus number differs. We can add a hack to set_msi_sid() to disable id checking for the THC device only (instead of disabling it for all devices with intremap=nosid). We could also leave the check enabled and hardcode the sid to 01:05.0 for THC, but since I don't understand where that value comes from, I don't know whether it might change.
Patch to disable the sid check for ITHC device only:
diff --git a/drivers/iommu/intel/irq_remapping.c b/drivers/iommu/intel/irq_remapping.c
index a67319597884..9f9322a17810 100644
--- a/drivers/iommu/intel/irq_remapping.c
+++ b/drivers/iommu/intel/irq_remapping.c
@@ -396,6 +396,22 @@ static int set_msi_sid(struct irte *irte, struct pci_dev *dev)
 	data.busmatch_count = 0;
 	pci_for_each_dma_alias(dev, set_msi_sid_cb, &data);
 
+	/*
+	 * The Intel Touch Host Controller is at 00:10.6, but for some reason
+	 * the MSI interrupts have request id 01:05.0.
+	 * Disable id verification to work around this.
+	 * FIXME Find proper fix or turn this into a quirk.
+	 */
+	if (dev->vendor == PCI_VENDOR_ID_INTEL && (dev->class >> 8) == PCI_CLASS_INPUT_PEN) {
+		switch(dev->device) {
+		case 0x98d0: case 0x98d1: // LKF
+		case 0xa0d0: case 0xa0d1: // TGL LP
+		case 0x43d0: case 0x43d1: // TGL H
+			set_irte_sid(irte, SVT_NO_VERIFY, SQ_ALL_16, 0);
+			return 0;
+		}
+	}
+
 	/*
 	 * DMA alias provides us with a PCI device and alias. The only case
 	 * where the it will return an alias on a different bus than the
Any news about this? The firmware seems to have been updated since then; did the ACPI table change?
It looks like nosid may no longer be required for the new Alder Lake devices, so I suspect it was a hardware bug on Tiger Lake.
I don't know if something could be done on the ACPI side to fix/workaround the problem (I don't really know how ACPI interacts with the iommu).
I believe the above patch will be added to the Surface kernel, which will remove the need for using nosid with that kernel at least.
I think a proper fix will involve adding support to the kernel for DMA aliases with different bus numbers, then I could add an alias in the ithc driver. But someone who really understands how PCI MSI and the Intel iommu work should have a look at this.
Edit: I could also add some code to detect if the irq is working, and automatically switch to polling mode if it isn't. Not optimal, but maybe the easiest fix.
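A rough sketch of what such a fallback could look like, written as plain C so the logic can be tried outside the kernel. Everything here is hypothetical (the struct irq_watch bookkeeping, the function names, the miss threshold); it only illustrates the idea of counting events that were found pending without an interrupt and switching modes once a limit is reached.

```c
#include <stdbool.h>

/* Hypothetical bookkeeping for deciding whether the irq ever fires. */
struct irq_watch {
	unsigned int irqs_seen;     /* incremented from the interrupt handler */
	unsigned int missed_events; /* data was pending but no irq arrived */
	unsigned int miss_limit;    /* misses tolerated before giving up */
	bool polling;               /* true once we fell back to polling */
};

/* Call from the interrupt handler: the irq works, so reset the miss count. */
static void irq_watch_on_irq(struct irq_watch *w)
{
	w->irqs_seen++;
	w->missed_events = 0;
}

/*
 * Call from a periodic check that found pending data without an irq.
 * Returns true if the driver should now be in polling mode.
 */
static bool irq_watch_on_missed_event(struct irq_watch *w)
{
	if (!w->polling && ++w->missed_events >= w->miss_limit)
		w->polling = true;
	return w->polling;
}
```

Resetting the counter on every real interrupt means an occasional missed event does not push a working system into polling; only a run of consecutive misses does.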
29 Incorrect MSI BDF Returned By Touch Host Controller. Problem: When a hypervisor is enabled, the Message Signaled Interrupt for the Touch Host Controller (THC) returns an incorrect bus device function number.
So I guess that confirms it's a Tiger Lake hardware bug.
Closing this as the iommu patch seems to do the job.
The IOMMU gives the following error when trying to use the irq:
DMAR: DRHD: handling fault status reg 2
DMAR: [INTR-REMAP] Request device [01:05.0] fault index 0x2f [fault reason 0x26] Blocked an interrupt request due to source-id verification failure
No clue how to fix this. Maybe an ACPI bug?
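For anyone comparing the faulting address against lspci output, a small helper like the following can pull the bus/device/function out of the fault line. This is just an illustrative sketch; parse_fault_bdf is a made-up name and it assumes the "Request device [bb:dd.f]" format shown in the log above.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: extract bus, device and function from the first
 * "Request device [bb:dd.f]" in a DMAR fault line.
 * Returns 0 on success, -1 if the pattern is not found.
 */
static int parse_fault_bdf(const char *line, unsigned int *bus,
			   unsigned int *dev, unsigned int *fn)
{
	const char *p = strstr(line, "Request device [");

	if (!p)
		return -1;
	/* All three fields are printed in hexadecimal. */
	if (sscanf(p, "Request device [%x:%x.%x]", bus, dev, fn) != 3)
		return -1;
	return 0;
}
```

For the line above this yields bus 1, device 5, function 0, which does not match the THC at 00:10.6.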