D-bit description is wrong in the last version of specification

zetalog commented 9 months ago

2.4. IOMMU updating of PTE accessed (A) and dirty (D) updates

When capabilities.AMO_HWAD is 1, the IOMMU supports updating the A and D bits in PTEs atomically. When updating of A and D bits in second-stage PTEs is enabled (DC.tc.GADE=1) and/or updating of A and D bits in first-stage PTEs is enabled (DC.tc.SADE=1) the following rules apply:

The A and/or D bit updates by the IOMMU must follow the rules specified by the Privileged specification for validity, permission checking, and atomicity.
The PTE update must be globally visible before a memory access using the translated address provided by the IOMMU becomes globally visible. Specifically, when a translated address is provided to a device in an ATS Translation completion, the PTE update must be globally visible before a memory access from the device using the translated address becomes globally visible.

The A and D bits are never cleared by the IOMMU. If the supervisor software does not rely on accessed and/or dirty bits, e.g. if it does not swap memory pages to secondary storage or if the pages are being used to map I/O space, it should set them to 1 in the PTE to improve performance.

According to this description, D-bit handling should follow the privileged specification, which seems wrong because the IOMMU plays a different role than the CPU in A/D handling.

The A/D bits are actually managed by software: they indicate which dirty disk caches (buffers, in OS terms) should be written back to disk when necessary. The IOMMU comes into play only after that determination has been made. Since the A/D-handling scenario is totally different for the CPU and for DMA, care should be taken in handling the D-bit.

zetalog commented 9 months ago

Software could implement the D-bit by marking a page as RO; when a write then triggers a fault, software marks the page as dirty. The RISC-V privileged spec accelerates this process in hardware by introducing the D-bit: while it is not set, the page is effectively RO, and a write either faults (Svade) or automatically updates the D-bit to 1 (Svadu). Thus, when AMO_HWAD is not implemented (supposing this means Svade), implementations may follow the IOMMU specification and raise a fault when the D-bit is 0, which is wrong.
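To make the two behaviors concrete, here is a minimal C sketch of the final A/D check of a page walk. It is a simplified model written for this discussion, not text from either specification: `check_ad`, `hw_update` and `enum walk_result` are hypothetical names, only the V, W, A and D bit positions come from the privileged spec, and the other permission checks (R, X, U) are left out.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* RISC-V PTE flag bits, as laid out in the privileged spec. */
#define PTE_V (1ULL << 0)
#define PTE_W (1ULL << 2)
#define PTE_A (1ULL << 6)
#define PTE_D (1ULL << 7)

enum walk_result { WALK_OK, WALK_FAULT };

/*
 * Hypothetical model of the last step of a page walk.
 * hw_update == false models the Svade-style behavior (fault when A/D
 * need setting); hw_update == true models the Svadu-style behavior
 * (set the bits atomically and continue).
 */
static enum walk_result check_ad(_Atomic uint64_t *ptep, bool is_write,
                                 bool hw_update)
{
    uint64_t old = atomic_load(ptep);

    for (;;) {
        /* Validity and write permission are checked before A/D. */
        if (!(old & PTE_V) || (is_write && !(old & PTE_W)))
            return WALK_FAULT;

        /* A is needed for every access, D only for writes. */
        uint64_t need = PTE_A | (is_write ? PTE_D : 0);
        if ((old & need) == need)
            return WALK_OK;

        if (!hw_update)
            return WALK_FAULT;

        /* On failure the CAS reloads 'old', so every check is redone. */
        if (atomic_compare_exchange_weak(ptep, &old, old | need))
            return WALK_OK;
    }
}
```

With `hw_update` false, a write through a PTE that has W=1 and D=0 returns `WALK_FAULT` and leaves the PTE unchanged, which is exactly the case being questioned here.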

For managed DMA transactions (where the OS should have already determined the A/D-bit values before triggering the DMA transaction), what is software supposed to do when the IOMMU raises such a fault?

Set D=1: this is obviously wrong, since it would trigger a write-back of disk caches that are clean (they were about to be transferred from disk back to memory).
Ignore the spurious IRQ and do nothing: this is also wrong, as the IOMMU hardware will trigger the fault again and again and the DMA transfer can never complete.

IMO, the only correct way is not to trigger this kind of fault when D=0 and AMO_HWAD=0. The behavior is thus different from what the privileged spec defines.

Similarly, the behavior of the other A/D bits should also be spelled out again, since the A/D-handling scenario is totally different for the CPU (accelerating OS dirty-page write-back) and for DMA (writing back and warming up disk caches).

zetalog commented 9 months ago

Linking to the original discussion: https://github.com/riscv-non-isa/riscv-iommu/issues/173

ved-rivos commented 9 months ago

When a buffer is mapped for DMA for the device to write, the buffer should be considered dirty, as it will be written to at some point in the future, until the driver releases it. Such buffers should be considered dirty buffers as they are written to by the device. When that write occurs depends on the device and its driver. An IOMMU translation fault is fatal. If a write encounters a fault, the write transaction is discarded and a fault is reported to the IOMMU driver. If a read encounters a fault, the device is provided a UR/CA response and the fault is reported to the IOMMU driver. When reads get a completer abort or an unsupported request response, the device will likely invoke its driver for error handling. For DMA faults there is no recovery/retry: the writes are posted and discarded. The only recourse after causing such a fault is to reset and reinitialize the device back to an operational state.
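One way to read this in code: a driver that treats device-writable buffers as dirty can create the mapping with A and D already set, so a D=0 write fault never arises. The following is only a sketch under assumptions. `make_dma_pte` and `enum dma_dir` are hypothetical names (the direction values merely resemble the usual DMA direction terminology), the PTE layout is the privileged spec's Sv39-style format with the PPN starting at bit 10, and no real IOMMU driver is being quoted.

```c
#include <stdint.h>

/* RISC-V PTE flag bits, as laid out in the privileged spec. */
#define PTE_V (1ULL << 0)
#define PTE_R (1ULL << 1)
#define PTE_W (1ULL << 2)
#define PTE_A (1ULL << 6)
#define PTE_D (1ULL << 7)

#define PAGE_SHIFT    12 /* 4 KiB pages */
#define PTE_PPN_SHIFT 10 /* PPN field starts at bit 10 */

/* Hypothetical direction enum, named from the CPU's point of view. */
enum dma_dir { DMA_TO_DEVICE, DMA_FROM_DEVICE, DMA_BIDIRECTIONAL };

/*
 * Build a leaf PTE for a DMA buffer. A is always pre-set. D is pre-set
 * whenever the device may write, because such a buffer is treated as
 * dirty from the moment it is mapped. R is always set, since the
 * W=1/R=0 encoding is reserved in the RISC-V PTE format.
 */
static uint64_t make_dma_pte(uint64_t phys, enum dma_dir dir)
{
    uint64_t pte = (phys >> PAGE_SHIFT) << PTE_PPN_SHIFT;

    pte |= PTE_V | PTE_R | PTE_A;
    if (dir != DMA_TO_DEVICE)
        pte |= PTE_W | PTE_D; /* device will write into the buffer */

    return pte;
}
```

This also matches the advice in the quoted section 2.4 that software which does not rely on the A/D bits should set them to 1 in the PTE.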

riscv-non-isa / riscv-iommu

D-bit description is wrong in the last version of specification #282