riscv-non-isa / riscv-iommu

RISC-V IOMMU Specification
https://jira.riscv.org/browse/RVG-55
Creative Commons Attribution 4.0 International
85 stars 17 forks source link

Missing DMA_MASK in IOMMU specification leaves application problems in the real operating system #391

Closed zetalog closed 2 months ago

zetalog commented 2 months ago

In the modern operating systems, DMA addresses are not just physical addresses or virtual addresses, please refer to: https://github.com/torvalds/linux/blob/master/Documentation/core-api/dma-api-howto.rst Operating systems treat DMA addresses as "bus addresses", which are likely referring to a subset of a physical address space or a subset of a virtual address space. In the systems like Linux, there is a DMA_MASK attributes system widely applied or device specific. When a virtual/physical address is to be converted into a DMA address, operating system simply masks off the the most significant bits of the DMA address, that means, the most significant bits of the DMA address are always 0s.

And there are many ecosystem DMA master IPs which support limited address width, say, 16-bits, 32-bits, or 48-bits. Such IPs do not know whether the programmed DMA addresses are virtual addresses or physical addresses. The integration guide of such silicon won't introduce a design forcing such IPs to be integrated with most significant bits wired to 1s (ex, same as bit 48).

However in IOMMU specification, it requires fault to be generated for virtual addresses whose most significant bits are not compliant to the RISC-V virtual address requirement (e.x., bit 63-48 should be same as bit 48 for Sv48, etc.). When an input address is reported as fault, iotval/iotval2 in the fault queue should be filled in with full 64-bit input address, while actually, software and hardware are only capable of recording masked DMA bus addresses.

This issue causes serious application problems in the operating system. When an IOMMU implementation is implemented to be compliant to the specification w/o DMA_MASK awareness, fault could be required to be reported for non-compliant virtual addresses whose most significant bits are not equal to bit 48. Then we can see wrong fault to be generated in Linux preventing remapped DMA transfers to be submitted from such DMA masters whose DMA_MASK is not 64-bit and wires most significant bits of its output DMA addresses to 0s. A stupid workaround could be to wire those bits to bit 48, then another compliance problem can be seen in the IOMMU fault queue, which could contain a fault report with most significant bits wired to 1s in iotval/iotval2 while iotval/iotval2 should be physical addresses whose most significant bits should be 0s.

IMHO, IOMMU specification should introduce DMA_MASK capability indicating supported input address of the IOMMU device, and a similar DMA_MASK field in the device context indicating supported DMA address width of the DMA master device. Most significant bits beyond min(capabilities.DMA_MASK, DC.DMA_MASK) are allowed to be all zeros (probably should all zeros or all ones to allow design variations). And the virtual address sanity check should be bounded into the affected DMA_MASK range, for the address bits beyond the affected DMA_MASK range, all checks should be ignored or we can just apply a simple check validating if all such bits equal to bit[DMA_MASK] like all other DMA remapping silicon design do. In fault queue, masked DMA addresses should also allowed to be recorded into iotval/iotval2 fields.

ved-rivos commented 2 months ago

The IOMMU provides the address that was submitted for address translation in the fault record. Software may apply masks as appropriate to the reported addresses .

The integration guide of such silicon won't introduce a design forcing such IPs to be integrated with most significant bits wired to 1s (ex, same as bit 48).

The addresses presented to the IOMMU are addressses generated by the IP. If the IP cannot generate an address beyond say 16-bits then the IOMMU would not see any addresses requested for translation bits beyond what the IP can generate. There is no requirement to wire most significant bits to all 1s etc.

zetalog commented 2 months ago

The addresses presented to the IOMMU are addressses generated by the IP. If the IP cannot generate an address beyond say 16-bits then the IOMMU would not see any addresses requested for translation bits beyond what the IP can generate. There is no requirement to wire most significant bits to all 1s etc.

Then we may introduce a configurable "DMA_MASK" option into the reference model, ignoring significant bits checking if it resides beyond the configured DMA_MASK so that we can also use it with real SoC verification where such DMA masters will be integrated.

ved-rivos commented 2 months ago

Then we may introduce a configurable "DMA_MASK" option into the reference model, ignoring significant bits checking if it resides beyond the configured DMA_MASK.

That would be incorrect thing to do. The IOMMU should prevent misbehaving, misprogrammed and/or malicious devices from accessing memory that they are not authorized to access. Masking the address will in best case hide the misprogramming and in worst case hide the malicious behavior or misbehavior.

zetalog commented 2 months ago

IOMMU silicon developed by forcing the following code observes wrong fault in Linux, preventing such DMA masters to be used:

    // 1. Let a be satp.ppn × PAGESIZE, and let i = LEVELS − 1. PAGESIZE is 2^12. (For Sv32,
    //    LEVELS=2, For Sv39 LEVELS=3, For Sv48 LEVELS=4, For Sv57 LEVELS=5.) The satp register
    //    must be active, i.e., the effective privilege mode must be S-mode or U-mode.
    if ( iosatp.MODE == IOSATP_Sv32 && SXL == 1 ) {
       ...
        // When `SXL` is 1, the following rules apply:
        // * If the S/VS-stage page table is not `Bare` then a page fault corresponding to
        //   the original access type occurs if the `IOVA` has bits set beyond bit 31.
        mask = (1UL << (64 - 32)) - 1;
        masked_upper_bits = (iova >> 32) & mask;
    }
    if ( iosatp.MODE == IOSATP_Sv39 && SXL == 0 ) {
        ...
        mask = (1UL << (64 - 38)) - 1;
        masked_upper_bits = (iova >> 38) & mask;
    }
    if ( iosatp.MODE == IOSATP_Sv48 && SXL == 0 ) {
        ...
        mask = (1UL << (64 - 47)) - 1;
        masked_upper_bits = (iova >> 47) & mask;
    }
    if ( iosatp.MODE == IOSATP_Sv57 && SXL == 0 ) {
        ...
        mask = (1UL << (64 - 56)) - 1;
        masked_upper_bits = (iova >> 56) & mask;
    }
    // Instruction fetch addresses and load and store effective addresses,
    // which are 64 bits, must have bits 63:<VASIZE> all equal to bit
    // (VASIZE-1), or else a page-fault exception will occur - for SXL=0
    // Do the address is canonical check - for SXL=0
    // For SXL = 1 check bits 63:32 are all 0
    if ( (masked_upper_bits != 0 && masked_upper_bits != mask && SXL == 0) ||
         (masked_upper_bits != 0 && SXL == 1) ) goto page_fault;

IMO, we should alter this piece a code a bit to allow an SoC specific DMA_MASK to be configured to prevent wrong fault to be generated for the DMA masters whose address width is less than 39, or less than 48.

This is useful when a verification TB is created with IOMMU + DMAC (address width is less than 39/48) + reference model.

ved-rivos commented 2 months ago

IMO, we should alter this piece a code a bit to allow an SoC specific DMA_MASK to be configured to prevent wrong fault to be generated for the DMA masters whose address width is less than 39, or less than 48.

Consider that a DMA master can generate only generate addresses less than 48 bit, say 16-bit wide addresses. Such a DMA master can only generate addresses 0x000000000000 through 0x00000000FFFF legally. When Sv48 is in use, all of these addresses are legal addresses. The range of legal addresses for Sv48 are 0x0000'0000'0000'0000 to 0x0000'7FFF'FFFF'FFFF and 0xFFFF'8000'0000'0000 to 0xFFFF'FFFF'FFFF'FFFF. Could you please give an example of such an address?

zetalog commented 2 months ago

Real issue is seen against a DMAC in an SoC design, whose master can only support maximum 44-bits, DMAC designer then wires higher bits of the DMA addresses to 0s.

When we use such DMAC in Linux kernel with IOMMU remapping enabled, following kernel panic can be seen due to IOMMU VA checking logic, preventing a valid DMA transfer to be performed:

image

zetalog commented 2 months ago

OK, I see. You mean 0x7fff_ffff_x000 shouldn't be a valid address for such kind of DMA device due to its driver doesn't invoke dma_set_mask(44), as such, there is no worry about seeing an input address between 0xFFFF'8000'0000'0000 and 0xFFFF'FFFF'FFFF'FFFF. Then as a conclusion, when such an test bench is created, input address stimuli should be controlled in the TB side rather than in the IOMMU reference model.

zetalog commented 2 months ago

One more concern is related to the IOMMU device capability. What if the IOMMU only accepts 44-bits input address, and the DMA masters are configured to use maximum 44-bits input address as well. In such a case, shall we introduce such kind of capability in the IOMMU reference model?

ved-rivos commented 2 months ago

In the example you posted, software probably created DMA mappings for:

It then programmed the device to copy from one page to another. The device can only generate 44 bit address but it has been asked to DMA to a wider address. The device ends up dropping those upper bits and instead DMAs to the following addresses:

And this leads to a fault - as expected since the DMA is to a address. From the print it seems like its a G-stage fault?

This clearly is a programming mistake. A device that can generate only 44-bit addresses should not be given an address wider than 2^44. Further in this case its unclear what the mask in the IOMMU can do - the address has already been masked by the device.

One more concern is related to the IOMMU device capability. What if the IOMMU only accepts 44-bits input address, and the DMA masters are configured to use maximum 44-bits input address as well.

The IOMMU either accepts 64-bit addresses (RV64) or accepts 32-bit addresses (RV32). There is no specification for a IOMMU with 44-bit input address.

zetalog commented 2 months ago

OK, this makes root cause clearer. Thanks and best regards.