Closed xuchengbei closed 1 month ago
Stalling is not an option for the IOMMU. In particular, with protocols like PCIe, stalling can lead to deadlocks. For instance, consider the following scenario:
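A memory read or write from a device may hit a fault condition. When this happens, two things are triggered: an error response is returned to the device for the faulting transaction, and a fault report is queued for the IOMMU driver.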
Device-specific behavior in response to this error response varies. A NIC might drop the packet, while an NVMe controller might mark the command as failed. In more severe cases, like a DMA read failure of a command descriptor, the device might become non-functional.
The fault report sent to the IOMMU driver is essentially a post-mortem; the faulting transaction itself can't be undone. The device has already been notified via the error response.
To support page faulting, the PCIe SIG defines the ATS and Page Request mechanisms. ATS does not require a suspend mode of operation, because the IOMMU can signal the translation fault back to the device, and the device can then ask for the page to be made resident using a Page Request. No fault records are created for such recoverable faults. Devices that intend to use recoverable faults should therefore use protocols like ATS and PRI.
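The recoverable path described above can be sketched as a toy model. This is only an illustration under stated assumptions: `ToyAtsIommu`, `translation_request`, `page_request`, and `device_access` are hypothetical names, not PCIe-defined messages or any real driver API, and the model ignores permissions, request grouping, and timing.

```python
class ToyAtsIommu:
    """Hypothetical model of the recoverable (ATS + Page Request) path."""

    def __init__(self):
        self.resident_pages = set()
        self.fault_records = []  # never written on the recoverable path

    def translation_request(self, page):
        """ATS-style request: tell the device whether the page is translated.

        A miss is reported back to the device; nothing is stalled or logged.
        """
        return page in self.resident_pages

    def page_request(self, page):
        """PRI-style request: software makes the page resident and responds."""
        self.resident_pages.add(page)
        return True


def device_access(iommu, page):
    """Device-side flow: translate, request the page on a miss, then retry."""
    steps = []
    if not iommu.translation_request(page):
        steps.append("translation-miss")
        if iommu.page_request(page):
            steps.append("page-request-ok")
    if iommu.translation_request(page):
        steps.append("translated")
    return steps
```

In this sketch the device, not the IOMMU, waits for the page to become resident, which is why no suspend mode is needed on the IOMMU side.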
For devices that are not ATS/PRI capable, faults generally point to issues in the device driver or the device itself. Normally, the device driver should use the OS DMA APIs to ensure that memory addresses submitted to the device are both resident (pinned) and DMA-accessible. A fault triggered by absent pages in the page tables or by insufficient permissions likely indicates a bug in this process. Similarly, if the device is buggy or misbehaving, unauthorized memory access attempts could also trigger faults. The IOMMU fault report serves as a diagnostic tool for identifying such issues.
Software is expected to provision a fault queue that is large enough to avoid fault queue overflows. A fault queue overflow does not stop the error response from being provided to the device. If the fault queue does overflow, software may need to assume that any device in the scope of that IOMMU has encountered a fault, and may need to take more drastic recovery actions, such as issuing a function-level reset to bring the devices back to a functional state.
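The overflow behavior above can be illustrated with a small simulation. It is a minimal sketch, not a model of any particular IOMMU: `ToyIommu`, `FaultRecord`, `report_fault`, `drain_fault_queue`, and the `overflow` flag are hypothetical names chosen for this example.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FaultRecord:
    device_id: int
    address: int
    cause: str


@dataclass
class ToyIommu:
    """Hypothetical IOMMU model with a bounded fault queue."""
    fq_capacity: int
    fault_queue: deque = field(default_factory=deque)
    overflow: bool = False

    def report_fault(self, device_id, address, cause):
        """Handle a faulting access without ever stalling.

        The record is queued if there is room; otherwise it is dropped and
        the overflow flag is set. The device gets the error response either way.
        """
        if len(self.fault_queue) < self.fq_capacity:
            self.fault_queue.append(FaultRecord(device_id, address, cause))
        else:
            self.overflow = True
        return "error-response"


def drain_fault_queue(iommu, devices_in_scope):
    """Driver side: return (diagnosed, suspects) as sets of device IDs.

    `diagnosed` holds devices with a surviving fault record. After an
    overflow, every device in the IOMMU's scope is treated as a suspect,
    since records for any of them may have been lost.
    """
    diagnosed = set()
    while iommu.fault_queue:
        diagnosed.add(iommu.fault_queue.popleft().device_id)
    if iommu.overflow:
        iommu.overflow = False
        return diagnosed, set(devices_in_scope)
    return diagnosed, diagnosed
```

For example, with `fq_capacity=2`, if device A faults three times and device B faults once afterwards, only A's first two records survive, but the overflow flag makes the driver treat B (and every other device in scope) as a candidate for recovery, for instance a function-level reset.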
Please ask if there are further questions.
I am a little confused about the behavior of the IOMMU when the fault queue is full:
Should it stall to avoid losing new faults? In other words, should it stall memory access requests from all devices so that no new fault records are generated?
For the first question, I believe the answer is no. But let's assume there are two devices, A and B, whose device contexts or IO page tables are misconfigured, so that every memory access they make generates a fault record. How can the IOMMU hardware or the IOMMU driver handle the case in which device A generates multiple fault records and overflows the IOMMU fault queue, and device B then begins to access memory while all of its fault records are discarded? Is there a better way to handle this case than simply discarding the fault records of device B?
Thanks for your reply in advance.