riscv-non-isa / riscv-iommu

RISC-V IOMMU Specification
https://jira.riscv.org/browse/RVG-55
Creative Commons Attribution 4.0 International

Should the IOMMU stall after the IOMMU fault queue overflows? #438

Open xuchengbei opened 3 days ago

xuchengbei commented 3 days ago

I am a little confused about the behavior of the IOMMU when the fault queue is full:

Thanks for your reply in advance.

ved-rivos commented 3 days ago

Stalling is not an option for the IOMMU. With protocols like PCIe in particular, stalling will lead to deadlocks. For instance, consider the following scenario:

  1. Software issues a load from a device MMIO register.
  2. At the same time, the device issues a DMA write.
  3. The device sends the completion for the load.
  4. The DMA write faults and is stalled. Since a completion cannot pass an earlier write, the completion is also stalled.
  5. The IOMMU sends an interrupt to the hart to report the fault, but software cannot take this interrupt because its load from step 1 is still waiting for the completion. The result is a deadlock: software cannot fix the fault and resume the write until the load completes, and the load cannot complete until the stalled write is resumed.
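
The circular dependency in this scenario can be sketched as a toy wait-for graph. This is only an illustration: the node names are made up for this example, and the graph simplifies the real PCIe ordering and interrupt delivery paths down to "who is waiting on whom".

```python
# Toy wait-for graph for the scenario above; node names are illustrative only.
WAITS_FOR = {
    "cpu_load": "load_completion",    # step 1: the hart blocks until the completion arrives
    "load_completion": "dma_write",   # step 4: the completion cannot pass the earlier write
    "dma_write": "fault_handler",     # step 4: a stalled write resumes only after software fixes the fault
    "fault_handler": "cpu_load",      # step 5: the handler cannot run while the hart waits on the load
}

# Without stalling, the faulting write is terminated with an error response,
# so it no longer waits on software.
NO_STALL = {k: v for k, v in WAITS_FOR.items() if k != "dma_write"}


def find_cycle(graph, start):
    """Follow wait-for edges from start; return the cycle as a list, or None."""
    path = []
    seen = {}
    node = start
    while node is not None and node not in seen:
        seen[node] = len(path)
        path.append(node)
        node = graph.get(node)
    if node is None:
        return None
    return path[seen[node]:] + [node]


print(find_cycle(WAITS_FOR, "cpu_load"))  # a cycle through all four nodes
print(find_cycle(NO_STALL, "cpu_load"))   # None: the chain ends at the write
```

Dropping the edge out of `dma_write` is what "no stalling" amounts to in this model: the write completes with an error, the load completion follows it, and the hart is free to take the fault interrupt.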

A memory read or write from a device may hit a fault condition. When this happens, two things are triggered:

  1. An error response is sent back to the device. This varies depending on the IO protocol. For PCIe, this could be a UR or CA response; for AXI, a SLVERR response.
  2. A fault record is generated and an interrupt is fired off to the IOMMU driver.
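
As a rough sketch of these two effects, the model below returns an error response to the device and appends a fault record for the driver. The `FaultRecord` fields, the `handle_fault` name, and the choice of always picking the first listed response are assumptions made for illustration; they are not the record layout or the selection rule of any real IOMMU.

```python
from dataclasses import dataclass

# Error responses named above, keyed by IO protocol.
ERROR_RESPONSES = {"pcie": ("UR", "CA"), "axi": ("SLVERR",)}


@dataclass
class FaultRecord:
    # Hypothetical fields; a real fault record carries more detail.
    device_id: int
    address: int
    is_write: bool


def handle_fault(protocol, device_id, address, is_write, fault_log):
    """Model both effects of a fault: an error response and a logged record."""
    response = ERROR_RESPONSES[protocol][0]  # simplification: pick the first listed response
    fault_log.append(FaultRecord(device_id, address, is_write))
    interrupt_pending = True  # the driver is notified after the fact
    return response, interrupt_pending


log = []
print(handle_fault("axi", 3, 0x1000, True, log))
print(log)
```

The device sees the error response regardless of what the driver later does with the logged record.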

Device-specific behavior in response to this error response varies. A NIC might drop the packet, while an NVMe controller might mark the command as failed. In more severe cases, like a DMA read failure of a command descriptor, the device might become non-functional.

The fault report sent to the IOMMU driver is essentially a post-mortem; the faulting transaction itself can't be undone. The device has already been notified via the error response.

To support page faulting, PCI-SIG defines ATS and the Page Request Interface (PRI). ATS does not require a suspend mode of operation: the translation fault is signaled back to the device, and the device can then request that the page be made resident using a Page Request. No fault records are created for such recoverable faults, so devices intending to use recoverable faults should use protocols like ATS and PRI.
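
The recoverable flow can be sketched as a toy model, shown below. All names here (`resident_pages`, `translation_request`, `page_request`, `device_dma`) are hypothetical, and the model ignores message formats, page request groups, and error cases; it only shows that the device retries instead of being stalled, and that nothing is written to a fault log.

```python
resident_pages = {0x1000}  # hypothetical set of resident page addresses
fault_records = []         # stays empty: recoverable faults are not logged here


def translation_request(page):
    """ATS-style request: report success or failure to the device, never stall."""
    return page in resident_pages


def page_request(page):
    """PRI-style request: software makes the page resident, then responds."""
    resident_pages.add(page)
    return "success"


def device_dma(page):
    """Device-side flow: translate, request the page on a miss, then retry."""
    steps = []
    if not translation_request(page):
        steps.append("translation miss")
        steps.append("page request: " + page_request(page))
    if translation_request(page):
        steps.append("dma issued")
    return steps


print(device_dma(0x2000))
print(fault_records)
```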

For devices that are not ATS/PRI capable, faults generally point to issues in the device driver or the device itself. Normally, the device driver should use the OS DMA APIs to ensure that memory addresses submitted to the device are both resident (pinned) and DMA-accessible. A fault triggered by absent pages in the page tables or insufficient permissions likely indicates a bug in this process. Similarly, if the device is buggy or misbehaving, unauthorized memory access attempts could also trigger faults. The IOMMU fault report serves as a diagnostic tool for identifying such issues.

Software is expected to provision a fault queue large enough that it does not overflow. A fault queue overflow does not stop the error response from being provided to the device. If the fault queue does overflow, software may need to assume that any device in the scope of that IOMMU has encountered a fault, and may need to take more drastic recovery actions, such as issuing a function-level reset to bring devices back to a functional state.
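
A minimal sketch of this overflow behavior, from the software side, might look like the following. The ring buffer with one unused slot, the `overflow` flag, and the `service_faults` helper are modeling assumptions for illustration, not the register interface defined by the specification.

```python
class FaultQueue:
    """Fixed-size ring buffer; one slot is left unused to tell full from empty."""

    def __init__(self, num_entries):
        self.entries = [None] * num_entries
        self.head = 0          # advanced by software as it consumes records
        self.tail = 0          # advanced by the IOMMU model as it produces records
        self.overflow = False  # latched when a record had to be dropped

    def is_full(self):
        return (self.tail + 1) % len(self.entries) == self.head

    def push(self, record):
        """Producer side: drop the record and latch overflow if there is no room."""
        if self.is_full():
            self.overflow = True
            return False
        self.entries[self.tail] = record
        self.tail = (self.tail + 1) % len(self.entries)
        return True

    def drain(self):
        """Consumer side: return all queued records and advance head."""
        records = []
        while self.head != self.tail:
            records.append(self.entries[self.head])
            self.head = (self.head + 1) % len(self.entries)
        return records


def service_faults(queue, devices_in_scope):
    """Return the devices software should treat as possibly faulted."""
    records = queue.drain()
    if queue.overflow:
        queue.overflow = False
        # Records were lost, so any device behind this IOMMU may have faulted.
        return set(devices_in_scope)
    return {r["device_id"] for r in records}


q = FaultQueue(4)  # holds at most 3 records in this model
for dev in range(5):
    q.push({"device_id": dev})
print(service_faults(q, [0, 1, 2, 3, 4, 7]))
```

In this model the producer never blocks: a dropped record only costs software the ability to tell which device faulted, which is why the recovery scope widens to every device behind the IOMMU.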