pcengines / apu2-documentation

Documentation and scripts for building and adjusting PC Engines APU2 firmware
https://pcengines.github.io/apu2-documentation/

No GSI for IOMMU device #241

Open thillux opened 5 years ago

thillux commented 5 years ago

The APU2 IOMMU device gets no GSI under Linux.

[ 1.694982] pci 0000:00:00.2: PCI INT A: no GSI

Tested on Arch Linux

Linux mbdf 5.0.7-arch1-1-ARCH #1 SMP PREEMPT Mon Apr 8 10:37:08 UTC 2019 x86_64 GNU/Linux

thillux commented 5 years ago

I fiddled around with coreboot master. A patch solving this issue may look like this: Patch against coreboot master

pietrushnic commented 5 years ago

@thillux thank you for your contribution. Would you mind submitting a pull request or sending patches? Otherwise, we can commit on your behalf, if you don't mind.

thillux commented 5 years ago

I can send you a pull request. Which branch should I base my code on?

pietrushnic commented 5 years ago

@miczyg1 please advise, but I assume recent develop would be fine.

miczyg1 commented 5 years ago

Indeed, develop would be the best

thillux commented 5 years ago

Update: I've found some bugs in my code and I am still working on it in my spare time. The error message in the kernel log already disappears if

    /* Bus 0, Dev 0 - F15 Host Controller */
    Package(){0x0000FFFF, 0, 0, 28 },

is introduced. The other code lines in my patch are still under my review, as some of them seem to be incorrect or incomplete.
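For readers unfamiliar with the four fields of such a routing entry, here is a minimal sketch that decodes them, assuming the standard ACPI _PRT package layout (Address, Pin, Source, SourceIndex). The function name `decode_prt_entry` is hypothetical and not part of coreboot or the patch; it only illustrates how the values in the line above can be read.

```python
# Sketch only: decode one ACPI _PRT mapping package, assuming the standard
# layout {Address, Pin, Source, SourceIndex}. decode_prt_entry is a
# hypothetical helper name, not a coreboot or kernel function.

def decode_prt_entry(address, pin, source, source_index):
    device = (address >> 16) & 0xFFFF  # high word: PCI device number
    function = address & 0xFFFF        # low word: function, 0xFFFF means "any"
    hardwired = source == 0            # Source 0: no link device is referenced
    return {
        "device": device,
        "function": "any" if function == 0xFFFF else function,
        "pin": "INT" + "ABCD"[pin],    # 0..3 map to INTA..INTD
        "hardwired": hardwired,
        # With a hardwired source, SourceIndex is the GSI itself.
        "gsi": source_index if hardwired else None,
    }

# The entry quoted above: Package(){0x0000FFFF, 0, 0, 28}
entry = decode_prt_entry(0x0000FFFF, 0, 0, 28)
print(entry)
```

Read this way, the entry routes INTA of every function on bus 0, device 0 (which includes 00:00.2) to GSI 28.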

Questions:

miczyg1 commented 5 years ago

@thillux could You set up a pull request? Just mark it with [WIP] to indicate Your pending work on it. I would like to see the whole patch. It would also be great if You could provide the kernel log, version, etc. AFAIU the 28 is the index number for the interrupt register as described in the IO 0xC00 register bits [6:0]. Unfortunately, it does not map to anything; it falls into the reserved range 0x1A-0x1F (28 = 0x1C).

Description available in BKDG pages 680-683.
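The arithmetic behind that remark can be checked with a short sketch. It only uses the two facts stated in the comment (the index occupies bits [6:0] of the IO 0xC00 register, and 0x1A to 0x1F is reserved); the constant and function names are made up for illustration and do not come from coreboot or the BKDG.

```python
# Sketch only: check where index 28 lands, using the figures quoted in the
# comment above. INDEX_MASK, RESERVED and describe_index are illustrative
# names, not coreboot identifiers.

INDEX_MASK = 0x7F              # bits [6:0] of the IO 0xC00 register
RESERVED = range(0x1A, 0x20)   # 0x1A..0x1F inclusive

def describe_index(value):
    index = value & INDEX_MASK
    return index, index in RESERVED

index, reserved = describe_index(28)
print(hex(index), reserved)    # 0x1c True
```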

Regarding the values defined in mainboard.c: these are used with the IO 0xC00 and IO 0xC01 registers to program the interrupt router as defined in BKDG pages 680-683, and they are not used for early setup. mainboard.c is compiled into ramstage, which is a mid-to-late boot stage. The interrupt programming for PCI devices is executed after PCI enumeration and resource assignment.

By DSDT parsing do You mean the operating system kernel that parses ACPI tables? What do You mean by cleared out again?

thillux commented 5 years ago

Started work on pcengines/coreboot#292

thillux commented 5 years ago

dmesg output with pcengines/coreboot#292
dmesg output with 4.9.0.4

[ 1.691535] pci 0000:00:00.2: can't derive routing for PCI INT A
[ 1.691542] pci 0000:00:00.2: PCI INT A: no GSI

thillux commented 5 years ago

Without interrupt routing over ACPI (booting the kernel with acpi=noirq as a parameter), there are many other routing issues besides pci 0000:00:00.2. dmesg_4.9.0.4_noirq.txt

Besides

[ 1.704636] pci 0000:00:00.2: can't find IRQ for PCI INT A; probably buggy MP table

there are also:

[ 1.229480] pci 0000:00:10.0: can't find IRQ for PCI INT A; probably buggy MP table
[ 2.551039] xhci_hcd 0000:00:10.0: can't find IRQ for PCI INT A; probably buggy MP table
[ 2.561307] sdhci-pci 0000:00:14.7: can't find IRQ for PCI INT A; probably buggy MP table

These other bugs should probably be squashed first, before working again on pcengines/coreboot#292.

miczyg1 commented 5 years ago

@thillux thanks for Your effort. Regarding the other bugs in interrupt routing, I see an inconsistency in the MP table creation. I may set up a PR quickly with fixes, so You could test it. Is Your Arch Linux customized, or is it a generic installation? I would like to test it myself too.

thillux commented 5 years ago

On this test box, I use a generic Arch Linux without kernel modifications.

miczyg1 commented 5 years ago

I have done some research on IOMMU PCI interrupts and I have the following conclusions:

Regarding these:

[ 1.229480] pci 0000:00:10.0: can't find IRQ for PCI INT A; probably buggy MP table
[ 2.551039] xhci_hcd 0000:00:10.0: can't find IRQ for PCI INT A; probably buggy MP table
[ 2.561307] sdhci-pci 0000:00:14.7: can't find IRQ for PCI INT A; probably buggy MP table

I see the problem and will apply a fix. However, these warnings should be ignored. Investigating the kernel source code confirms the information in the specs. The kernel should enable MSI, but because the IOMMU is a PCI device, the kernel's generic PCI init searches for an INTx configuration and prints an error that there is no INT for the device. Additionally, have a look at the lspci verbose output of the IOMMU device:

00:00.2 IOMMU [0806]: Advanced Micro Devices, Inc. [AMD] Device [1022:1567]
    Subsystem: Advanced Micro Devices, Inc. [AMD] Device [1022:1567]
    Control: I/O- Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0
    Interrupt: pin A routed to IRQ 0
    Capabilities: [40] Secure device <?>
    Capabilities: [64] MSI: Enable+ Count=1/4 Maskable- 64bit+
        Address: 00000000fee0f00c  Data: 4161
    Capabilities: [74] HyperTransport: MSI Mapping Enable+ Fixed+

MSI is enabled (Capabilities: [64] MSI: Enable+) and INTx is disabled (DisINTx+ in Control); INTx- in Status only indicates that no legacy interrupt is pending.
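The same reading of the lspci output can be automated. The sketch below is only an illustration: it parses the three relevant lines quoted above and reports the `+`/`-` state of a named flag. The helper name `flag` and the reading of the flags (DisINTx in Control is the interrupt-disable bit, INTx in Status is the pending bit) are assumptions of this sketch, not something the thread states.

```python
# Sketch only: read '+'/'-' flags out of "lspci -vv" style text.
# The sample lines are copied from the output quoted above; flag() is a
# hypothetical helper, not part of pciutils.
import re

LSPCI = """\
    Control: I/O- Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Capabilities: [64] MSI: Enable+ Count=1/4 Maskable- 64bit+
    Capabilities: [74] HyperTransport: MSI Mapping Enable+ Fixed+
"""

def flag(text, marker, name):
    """Return True for 'name+', False for 'name-', None if not found.

    Only lines containing `marker` are searched. The lookbehind keeps
    'INTx' from matching inside 'DisINTx'.
    """
    pattern = re.compile(r"(?<![A-Za-z])" + re.escape(name) + r"([+-])")
    for line in text.splitlines():
        if marker in line:
            match = pattern.search(line)
            if match:
                return match.group(1) == "+"
    return None

msi_enabled = flag(LSPCI, "] MSI:", "Enable")       # MSI capability line only
intx_disabled = flag(LSPCI, "Control:", "DisINTx")  # interrupt-disable bit
intx_pending = flag(LSPCI, "Status:", "INTx")       # legacy interrupt pending
print(msi_enabled, intx_disabled, intx_pending)
```

Matching on `] MSI:` rather than on a fixed capability offset keeps the sketch from picking up the HyperTransport "MSI Mapping" line.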

thillux commented 5 years ago

Ok, thanks a lot! :thumbsup:

miczyg1 commented 5 years ago

@thillux I agree this warning is confusing; we will try to send a patch to the Linux kernel then. I think the kernel should not look for legacy interrupt routing, but use MSI instead.

miczyg1 commented 4 years ago

A small update: ACPI tables, as well as PCI registers of IOMMU, should point to the MSI number which is used by IOMMU to signal interrupts. However, for an unknown reason, it is set to 0 by AGESA (AMD proprietary processor initialization code blob) which may result in such behavior. Unfortunately, it cannot be changed just like that, because the IOMMU PCI configuration is being locked by AGESA. I will try to tweak some bits to see whether I can do something about it.

thillux commented 4 years ago

Great, just let me know if I should test some changes.

miczyg1 commented 4 years ago

@thillux thank you for the support. I may provide a binary for testing which might help.

BTW: Have you encountered similar issues like this? https://github.com/pcengines/coreboot/issues/285

thillux commented 4 years ago

I answered on pcengines/apu2-documentation#240. If I remember correctly, this message originates from interrupt remapping not being possible with the legacy IRQs of the WLE200NX (no MSI kernel module parameter was used). The IRQs then trigger memory accesses to unmapped areas (from an IOMMU perspective).