riscv-non-isa / riscv-iommu

RISC-V IOMMU Specification
https://jira.riscv.org/browse/RVG-55
Creative Commons Attribution 4.0 International

Are IOMMU reads of the DDT/PDT cache coherent? #354

Closed andreiw closed 2 weeks ago

andreiw commented 3 weeks ago

Reading sections 2.8 and 2.9, I understand the role of the IOATC. What isn't clear is how the IOATC is refilled after processing the invalidation commands.

Is the IOMMU cache coherent with respect to the harts? For example, does a clean operation (CMO) need to be performed by a hart after manipulating the DDT and before issuing IODIR.INVAL_DDT?

Or is it assumed that this isn't needed because the IOMMU accesses can hit system level cache?

ved-rivos commented 3 weeks ago

The IOMMU is required to support the placement of in-memory data structures in main memory. So use of a clean operation is not required. The IOTINVAL and IODIR commands are sufficient to ensure that previous stores made by a RISC-V hart are observed before subsequent implicit accesses by the IOMMU to the corresponding data structures.

andreiw commented 3 weeks ago

So the IOMMU's view of memory is guaranteed to be always consistent with the hart/system caches and it is never a valid implementation to have an IOMMU that is not? I feel like this is something worth pointing out.

If someone puts the data structures in RAM using NC or IO mappings, I'm assuming system software would be expected to ensure the CPU cache is invalidated for the affected ranges?

ved-rivos commented 3 weeks ago

Section 6.3 and its subsections specify this: This section provides guidelines to software on the invalidation commands to send to the IOMMU through the CQ when modifying the IOMMU in-memory data structures. Software must perform the invalidation after the update is globally visible. The ordering on stores provided by FENCE instructions and the acquire/release bits on atomic instructions also orders the data structure updates associated with those stores as observed by IOMMU.

When the data structures are mapped as NC, the same requirement to make the update globally observed applies.

And per section 3.1.1, IOTINVAL.VMA ensures that previous stores made to the first-stage page tables by the harts are observed by the IOMMU before all subsequent implicit reads from the IOMMU to the corresponding first-stage page tables.
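The ordering described above (update the structure, make the update globally visible, then queue the invalidation) can be sketched in C. This is only an illustrative host-side simulation, not driver code: `struct sim_iommu`, `publish()`, `update_ddt_and_invalidate()` and the queue sizes are hypothetical names invented for the example, the C11 release fence is a portable stand-in for the FENCE a RISC-V hart would execute, and the IODIR.INVAL_DDT field encodings are assumptions that should be checked against the command-format tables in the specification.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define DDT_ENTRIES 4u   /* hypothetical size of the simulated DDT */
#define CQ_ENTRIES  8u   /* hypothetical command queue size */

/* Assumed encodings; verify against the specification before relying on them. */
#define CMD_OPCODE_IODIR   2u
#define CMD_FUNC_INVAL_DDT 0u
#define CMD_IODIR_DV       (1ull << 33)
#define CMD_IODIR_DID(d)   ((uint64_t)(d) << 40)

struct iommu_cmd { uint64_t dword0, dword1; };   /* 16-byte command */

struct sim_iommu {
    uint64_t ddt[DDT_ENTRIES];          /* stand-in for device contexts */
    struct iommu_cmd cq[CQ_ENTRIES];    /* in-memory command queue */
    _Atomic uint32_t cqt;               /* stand-in for the CQ tail register */
};

/* Portable stand-in for the store-ordering FENCE a hart would execute. */
static void publish(void)
{
    atomic_thread_fence(memory_order_release);
}

/* Returns the new tail index. Queue-full handling is omitted for brevity. */
uint32_t update_ddt_and_invalidate(struct sim_iommu *m, uint32_t did,
                                   uint64_t new_dc)
{
    uint32_t tail = atomic_load_explicit(&m->cqt, memory_order_relaxed);

    assert(did < DDT_ENTRIES);

    m->ddt[did] = new_dc;   /* 1. update the in-memory data structure */
    publish();              /* 2. order the update before the command */

    /* 3. write the invalidation command into the queue slot at the tail */
    m->cq[tail].dword0 = CMD_OPCODE_IODIR | (CMD_FUNC_INVAL_DDT << 7) |
                         CMD_IODIR_DV | CMD_IODIR_DID(did);
    m->cq[tail].dword1 = 0;
    publish();              /* 4. order the command before the tail update */

    /* 5. advance the tail so the command becomes eligible for processing */
    tail = (tail + 1u) % CQ_ENTRIES;
    atomic_store_explicit(&m->cqt, tail, memory_order_relaxed);
    return tail;
}
```

No cache clean or flush appears anywhere in the sequence; the only requirement modelled is that the stores are ordered before the command is made visible to the IOMMU.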

The I/O regions are also coherent. However, an IOMMU is not required to support all width and burst size restrictions that an IO region may impose. The host bridge is required to enforce the PMA when the IOMMU makes an access to an IO region to reach its in-memory data structures, as specified in section 1.3 for placement and data flows.

For IO regions and for NC, a cache flush is not required. A cache flush would have been required if the IOMMU violated a PMA, i.e. if, for an otherwise hardware-enforced coherent region, the IOMMU made the access bypassing the hardware coherency controller. This, however, is not a choice of the IOMMU at all. The IOMMU specification requires the PMA to be enforced by the host bridge for all implicit accesses made by the IOMMU to its in-memory data structures.

ved-rivos commented 3 weeks ago

Perhaps this question arises from other architectures that have allowed the IOMMU to violate the PMA, for instance SMMU_IDR0.COHACC. The RISC-V IOMMU specification does not provide the option of violating the coherence PMA.

ved-rivos commented 2 weeks ago

Hope that addressed the question.