Should the IOMMU stall after the IOMMU fault queue overflows?

I am a little confused about the behavior of the IOMMU when the fault queue is full:

Should it stall to avoid losing new faults? In other words, should it stall all devices' memory access requests so that no new fault records are generated?
For the first question, I believe the answer is no. But suppose there are two devices, A and B, whose device contexts or IO page tables are misconfigured, so that every memory access they make generates a fault record. How can the IOMMU hardware or the IOMMU driver handle the case where device A generates enough fault records to overflow the fault queue, and device B then starts accessing memory and all of its faults are discarded? Is there a better way to handle this than simply discarding device B's fault records?

Thanks in advance for your reply.

Stalling is not an option for the IOMMU. With protocols like PCIe in particular, stalling can lead to deadlocks. For instance, consider the following scenario:

1. Software performs a load from a device MMIO register.
2. At the same time, the device performs a DMA write.
3. The device sends the completion for the load.
4. The DMA write takes a fault and is stalled. Because PCIe ordering rules do not allow a completion to pass a posted write, the completion is stalled too.
5. The IOMMU sends an interrupt to the hart to report the fault, but software cannot take this interrupt because its load from step 1 is still waiting for the completion. The result is a deadlock: software cannot fix the fault and resume the write until the load completes, and the load cannot complete until the stalled write is resumed.
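The circular dependency in the scenario above can be made explicit as a wait-for graph. The sketch below is illustrative only: the node names and the `find_cycle` helper are hypothetical, and each edge simply restates one of the dependencies in the scenario. Removing the stall (so the faulting write retires with an error response instead of waiting on software) breaks the cycle.

```python
# Hypothetical wait-for graph for the MMIO-load / stalled-DMA-write scenario.
# An edge "A -> B" means "A cannot make progress until B does".
WAITS_FOR = {
    "mmio_load": "completion",     # the hart's load waits for its completion
    "completion": "dma_write",     # PCIe ordering: the completion may not pass the posted write
    "dma_write": "fault_handler",  # the stalled write waits for software to fix the fault
    "fault_handler": "mmio_load",  # the hart cannot take the interrupt until its load retires
}


def find_cycle(graph, start):
    """Follow wait-for edges from start; return the cycle as a list, or None."""
    path, index = [], {}
    node = start
    while node is not None and node not in index:
        index[node] = len(path)
        path.append(node)
        node = graph.get(node)
    return None if node is None else path[index[node]:]


# With stalling, the chain of waits closes on itself: a deadlock.
print(find_cycle(WAITS_FOR, "mmio_load"))
# ['mmio_load', 'completion', 'dma_write', 'fault_handler']

# Without stalling, the faulting write retires with an error response,
# so it no longer waits on software and the cycle is broken.
NO_STALL = {k: v for k, v in WAITS_FOR.items() if k != "dma_write"}
print(find_cycle(NO_STALL, "mmio_load"))  # None
```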

A memory read or write from a device may hit a fault condition. When this happens, two things occur:

1. An error response is sent back to the device. Its form depends on the IO protocol: for PCIe, this could be an Unsupported Request (UR) or Completer Abort (CA) response; for AXI, a SLVERR response.
2. A fault record is generated and an interrupt is raised to the IOMMU driver.
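The two effects above are independent: the error response goes back to the device whether or not the fault record fits in the queue. The following minimal Python model illustrates this under stated assumptions. The fault queue is modeled as a ring buffer with head/tail indices and one slot kept empty, plus a sticky overflow flag; `FaultQueue`, `FaultRecord`, `on_translation_fault`, and the `"error_response"` string are hypothetical names, not the RISC-V IOMMU programming interface.

```python
from dataclasses import dataclass, field


@dataclass
class FaultRecord:
    device_id: int
    iova: int
    cause: str


@dataclass
class FaultQueue:
    """Hypothetical ring buffer; one slot stays empty so 'full' and 'empty' differ."""
    size: int
    head: int = 0           # next record the driver will read
    tail: int = 0           # next slot the IOMMU will write
    overflow: bool = False  # sticky: stays set until the driver clears it
    slots: list = field(default_factory=list)

    def __post_init__(self):
        self.slots = [None] * self.size

    def is_full(self):
        return (self.tail + 1) % self.size == self.head

    def push(self, rec):
        if self.is_full():
            self.overflow = True  # the record is dropped; only the fact of loss survives
            return False
        self.slots[self.tail] = rec
        self.tail = (self.tail + 1) % self.size
        return True

    def pop(self):
        if self.head == self.tail:
            return None
        rec = self.slots[self.head]
        self.head = (self.head + 1) % self.size
        return rec


def on_translation_fault(fq, device_id, iova, cause):
    """Both effects of a fault. The error response does not depend on queue space;
    a real IOMMU would also raise an interrupt to the driver here."""
    queued = fq.push(FaultRecord(device_id, iova, cause))
    return "error_response", queued


# Device A floods a 4-slot queue (3 usable slots), then device B faults.
fq = FaultQueue(size=4)
results = [on_translation_fault(fq, 0xA, 0x1000 * i, "write") for i in range(3)]
results.append(on_translation_fault(fq, 0xB, 0x2000, "read"))
print(results[-1])  # ('error_response', False): B still gets its error response
print(fq.overflow)  # True
```

This reproduces the situation described in the question: device B's record is lost, but B is still told that its access failed.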

How a device reacts to the error response is device-specific. A NIC might drop the packet, while an NVMe controller might mark the command as failed. In more severe cases, such as a failed DMA read of a command descriptor, the device might become non-functional.

The fault report sent to the IOMMU driver is essentially a post-mortem; the faulting transaction itself can't be undone. The device has already been notified via the error response.

To support page faulting, the PCI-SIG defines the Address Translation Services (ATS) and Page Request Interface (PRI) mechanisms. ATS does not require a suspend mode of operation: the translation fault is signaled back to the device, and the device can then ask for the page to be made resident using a page request. No fault records are created for such recoverable faults, so devices that intend to rely on recoverable faults should use protocols like ATS and PRI.
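The recoverable path described above can be sketched as a device-side retry: translate, issue a page request on a miss, then translate again. This is a deliberately simplified model, assuming a dictionary page table and a callback that stands in for the OS making the page resident; `Iommu`, `ats_translate`, `page_request`, and `device_access` are hypothetical names, and the real ATS/PRI message formats are not modeled. The point it shows is that no fault record is created along the way.

```python
class Iommu:
    """Hypothetical model: only what is needed to show the ATS/PRI path."""

    def __init__(self):
        self.page_table = {}     # IOVA page -> physical page
        self.fault_records = []  # the ATS/PRI path below never appends here

    def ats_translate(self, iova_page):
        # A miss is reported back to the device (modeled as None),
        # not logged as a fault record.
        return self.page_table.get(iova_page)

    def page_request(self, iova_page, make_resident):
        # PRI: software makes the page resident, then responds to the request.
        self.page_table[iova_page] = make_resident(iova_page)
        return "success"


def device_access(iommu, iova_page, make_resident):
    """Device-side retry: translate, request the page on a miss, retranslate."""
    pa = iommu.ats_translate(iova_page)
    if pa is None:
        if iommu.page_request(iova_page, make_resident) != "success":
            return None
        pa = iommu.ats_translate(iova_page)
    return pa


iommu = Iommu()
pa = device_access(iommu, 0x42, lambda page: page + 0x1000)  # hypothetical OS pager
print(hex(pa), iommu.fault_records)  # 0x1042 []
```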

For devices that are not ATS/PRI capable, faults generally point to bugs in the device driver or in the device itself. Normally, the device driver should use the OS DMA APIs to ensure that memory addresses submitted to the device are both resident (pinned) and DMA-accessible. A fault caused by a missing page-table entry or insufficient permissions most likely indicates a bug in this process. Likewise, a buggy or misbehaving device that attempts unauthorized memory accesses will also trigger faults. The IOMMU fault report serves as a diagnostic tool for identifying such issues.

Software is expected to provision a fault queue large enough that it does not overflow. A fault queue overflow also does not prevent the error response from being returned to the device. If the fault queue does overflow, software may need to assume that every device within the scope of that IOMMU has encountered a fault, and may need to take more drastic recovery actions, such as issuing a Function Level Reset (FLR) to return devices to a functional state.
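Applied to the two-device scenario from the question, the recovery policy above might look like the following driver-side sketch. It assumes the driver has already drained the queue into `(device_id, iova)` pairs and read the sticky overflow flag; `service_fault_queue`, `log`, and `reset_function` are hypothetical hooks, and the policy (report recorded faults, reset everything in scope on overflow) is one reasonable reading of the answer rather than a prescribed algorithm.

```python
def service_fault_queue(records, overflowed, devices_in_scope, log, reset_function):
    """Hypothetical driver handler.

    records: (device_id, iova) pairs drained from the fault queue.
    overflowed: the sticky overflow flag read from the IOMMU.
    Returns the set of devices that were reset.
    """
    # Recorded faults are diagnostic: report them.
    for dev, iova in records:
        log(f"device {dev:#x}: DMA fault at IOVA {iova:#x}")

    if not overflowed:
        return set()

    # Some records were lost, and they could belong to any device behind this
    # IOMMU, so every device in scope is treated as having faulted.
    # (A real driver would also clear the overflow flag.)
    suspects = set(devices_in_scope)
    for dev in sorted(suspects):
        reset_function(dev)  # e.g. issue a Function Level Reset
    return suspects


# The question's scenario: device A filled the queue, device B's faults were dropped.
logs, resets = [], []
records = [(0xA, 0x1000 * i) for i in range(3)]
service_fault_queue(records, True, [0xA, 0xB], logs.append, resets.append)
print([hex(d) for d in resets])  # ['0xa', '0xb']: B is reset although none of its faults were recorded
```

Resetting every device in scope is coarse, but once records have been lost the driver cannot tell which devices were affected, so this is the conservative choice.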

riscv-non-isa / riscv-iommu, issue #438