Closed zetalog closed 1 month ago
Two-stage address translation is always active in the IOMMU and there is no option to disable it. However, either stage can be effectively disabled by programming the mode for that stage as Bare. If the VS-stage is not Bare for a transaction, then the transaction carries a VA and not a GPA. MSI detection is performed on a GPA and not a VA, so the VS-stage address translation must be performed even if the VA is identity mapped in the VS-stage (which the IOMMU cannot guess). So all of this is as per the specification. Please take it up on LKML if there are questions about the Linux kernel.
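The ordering described above can be sketched in C as follows. This is only a toy model under stated assumptions: the struct, the function names, and the offset-based stand-in for a VS-stage page-table walk are all hypothetical, and the pattern/mask comparison is modelled on the MSI address match in the specification rather than copied from any implementation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy device context. All names here are hypothetical; they are not the
 * specification's register layout and not taken from any driver. */
struct toy_dc {
    bool     vs_bare;          /* VS-stage mode is Bare: IOVA is already a GPA */
    uint64_t vs_offset;        /* stand-in for a VS-stage walk: GPA = VA + offset */
    uint64_t msi_addr_pattern; /* compared against the GPA page number */
    uint64_t msi_addr_mask;    /* page-number bits set here are ignored */
};

enum toy_next_step { TOY_MSI_PT, TOY_G_STAGE };

/* Step 1: VS-stage. A real IOMMU walks page tables; the offset is a toy. */
static uint64_t toy_vs_stage(const struct toy_dc *dc, uint64_t iova)
{
    return dc->vs_bare ? iova : iova + dc->vs_offset;
}

/* Step 2: MSI detection on the GPA page number (GPA >> 12). */
static bool toy_is_msi_gpa(const struct toy_dc *dc, uint64_t gpa)
{
    return (((gpa >> 12) ^ dc->msi_addr_pattern) & ~dc->msi_addr_mask) == 0;
}

/* Step 3: choose the next translation step from the GPA, never from the VA. */
static enum toy_next_step toy_route(const struct toy_dc *dc, uint64_t iova)
{
    uint64_t gpa = toy_vs_stage(dc, iova);
    return toy_is_msi_gpa(dc, gpa) ? TOY_MSI_PT : TOY_G_STAGE;
}
```

With a non-identity VS-stage mapping, a VA that looks nothing like an MSI address can land on the MSI window, and a VA that looks like one can land elsewhere, which is why the check cannot be made on the VA.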
The usage model is not related to 2-stage address translation. The problem is seen in a supervisor Linux kernel with a 6.6-based IOMMU driver patchset applied. This version of the IOMMU patchset lacks the above-mentioned MSI table identity mapping: ... The same Linux kernel runs fine on QEMU.
This looks weird: in an S1-only environment without the MSI PT feature, only S1 PT translation should be required for MSI writes, whereas with the MSI PT feature both S1 PT translation and MSI PT translation are required for MSI writes.
If software has configured the IOMMU to do VS-stage address translations then the IOMMU does VS-stage address translations. If software has further configured MSI page tables to translate MSIs in Basic or MRIF mode then the IOMMU translates further using the MSI page table.
Yes, OSen like Linux only provide MSI-specific operations in the vfiommu framework, and the feature looks tightly related to S2. But let me try to express the data flow, and please correct me if it is wrong.
My understanding is:
Either way, the MSI PT acts like an ITS (ARM world feature) filtering MSI writes from the DMA transactions. Among the above scenarios, which one is the better practice in OSen?
The QEMU model allows both usage models; the RIVOS model only allows the 2nd usage model. Is there any concern about restricting the MSI PT to being used only in the vfiommu framework?
NOTE that in an SoC design, device addresses, including MSI write addresses, should be placed in the higher physical address space, while the DMA-remapped zone should be placed in the lower virtual address space. With that layout, such detection should be safe, with no conflict between DMA transactions and MSI writes.
Please see IOMMU specification section "Process to translate IOVA".
If iosatp.MODE is Bare then the IOVA is the same as the GPA; otherwise the IOVA is translated by walking the VS-stage page tables. I briefly looked through the QEMU patches and see that somewhere along the line en_s was dropped from the patch. This change makes it non-compliant with the RISC-V IOMMU specification: when VS-stage address translation is enabled, whether the IOVA is identity mapped cannot be inferred, i.e. the IOVA may not be the same as the GPA when iosatp.MODE != Bare.
/* Early check for MSI address match when IOVA == GPA */
- if (!en_s && (iotlb->perm & IOMMU_WO) &&
+ if ((iotlb->perm & IOMMU_WO) &&
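To make the effect of dropping the guard concrete, here is a minimal sketch of the two forms of the early check. It assumes that en_s means "first-stage (VS-stage) translation is enabled", as the comment in the diff suggests; the function names and the single-page MSI window are hypothetical and are not QEMU code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical MSI window: one 4 KiB page at page number 0x28000. */
#define TOY_MSI_PPN 0x28000ULL

static bool toy_addr_matches_msi(uint64_t addr)
{
    return (addr >> 12) == TOY_MSI_PPN;
}

/* Guarded form: the shortcut is taken only when the first stage is off,
 * so the IOVA is known to be the GPA. */
static bool toy_early_msi_guarded(bool en_s, bool is_write, uint64_t iova)
{
    return !en_s && is_write && toy_addr_matches_msi(iova);
}

/* Unguarded form: treats the IOVA as a GPA even when the first stage is on. */
static bool toy_early_msi_unguarded(bool en_s, bool is_write, uint64_t iova)
{
    (void)en_s; /* ignored on purpose, mirroring the '+' line of the diff */
    return is_write && toy_addr_matches_msi(iova);
}
```

For a write whose IOVA happens to equal the MSI window while the first stage is enabled, the guarded form declines the shortcut and leaves the decision until after the VS-stage walk, while the unguarded form classifies the access as an MSI without knowing the GPA.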
The GPA corresponding to the virtual IMSIC is mapped into the guest. It is also mapped into the virtual address space of the guest OS since that mapping is required for the OS to do IPIs.
The QEMU model allows both usage models, RIVOS model only allows the 2nd usage model.
What is the RIVOS model?
Please ask if there are further questions/comments.
In iommu_translate.c, the reference model is implemented in this way:
That means that, for an S1-only translation, an address is translated before it is checked for being an MSI.
While in the specification, the MSI configuration resides in the DC, which is independent of S1. We can also see a different model in QEMU: https://gitlab.com/danielhb/qemu/-/blob/riscv_iommu_v5_rc1/hw/riscv/riscv-iommu.c?ref_type=heads which detects the MSI before performing S1:
The wrong reference model requires a special quirk to be introduced in the recent Linux kernel IOMMU driver, which has to create a VA=PA mapping for the MSI table. This is now known to be the significant difference between Linux 6.6 and Linux 6.10.