riscv-non-isa / riscv-iommu

RISC-V IOMMU Specification
https://jira.riscv.org/browse/RVG-55
Creative Commons Attribution 4.0 International

Incompatible "FAULT_TYPE = UR" reasons in ATS is observed #327

Closed zetalog closed 4 months ago

zetalog commented 4 months ago

Another ATS compatibility issue is observed for the DTI-ATS "FAULT_TYPE" field:

DTI-ATS fault type

FAULT_TYPE:

  00 InvalidTranslation
  01 CompleterAbort
  10 UnsupportedRequest
  11 Reserved

...

When the value of this field is CompleterAbort, this field indicates that there was an error during the translation process. The DTI master returns a Translation Completion message with the status value as CompleterAbort (CA).

When the value of this field is UnsupportedRequest, this field indicates that ATS is disabled for this or all StreamIDs. The DTI master returns a Translation Completion message with a status value as UnsupportedRequest (UR).
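The encoding quoted above can be sketched as a small decoder. This is only an illustration of the 2-bit field as listed in the excerpt; the names `FaultType` and `decode_fault_type` are hypothetical and do not come from the DTI specification or any real library.

```python
from enum import IntEnum


class FaultType(IntEnum):
    """DTI-ATS FAULT_TYPE values, as listed in the excerpt above."""
    INVALID_TRANSLATION = 0b00
    COMPLETER_ABORT = 0b01      # Translation Completion status CA
    UNSUPPORTED_REQUEST = 0b10  # Translation Completion status UR
    # 0b11 is Reserved, so it is intentionally not a member.


def decode_fault_type(field: int) -> FaultType:
    """Decode a 2-bit FAULT_TYPE field; raise ValueError on the reserved value."""
    if not 0 <= field <= 0b11:
        raise ValueError(f"FAULT_TYPE is a 2-bit field, got {field}")
    # IntEnum lookup raises ValueError for 0b11 because it has no member.
    return FaultType(field)
```

The numeric values 0, 1 and 2 used later in this thread correspond to `INVALID_TRANSLATION`, `COMPLETER_ABORT` and `UNSUPPORTED_REQUEST` respectively.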

SMMUv3 F_BAD_ATS_TREQ

DTI-ATS follows the ARM SMMUv3 specification. For the UR FAULT_TYPE, the specification relates it to F_BAD_ATS_TREQ, which is described as:

Reported in response to an ATS Translation Request in any of the following conditions:

RISC-V Unsupported Request (UR) causes

The incompatibility is observed with a DTI-ATS checker developed against the ARM DTI-ATS behavior for the FAULT_TYPE field, because the RISC-V IOMMU specification lists the following causes for a UR result:

If there is a permanent error or if ATS transactions are disabled then an Unsupported Request (UR) response is generated. The following cause codes belong to this category:

  • All inbound transactions disallowed (cause = 256)
  • DDT entry load access fault (cause = 257)
  • DDT entry not valid (cause = 258)
  • DDT entry misconfigured (cause = 259)
  • Transaction type disallowed (cause = 260)
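As a sketch of how a checker written to the RISC-V IOMMU specification might encode this category, the cause codes above can be kept in a lookup table. The names `UR_CAUSES` and `riscv_ats_response_is_ur` are hypothetical; only the five cause codes and their descriptions come from the quoted text, and causes outside this list are simply reported as "not in the UR category" without saying which response they produce.

```python
# Cause codes that the quoted RISC-V IOMMU text places in the UR category.
UR_CAUSES = {
    256: "All inbound transactions disallowed",
    257: "DDT entry load access fault",
    258: "DDT entry not valid",
    259: "DDT entry misconfigured",
    260: "Transaction type disallowed",
}


def riscv_ats_response_is_ur(cause: int) -> bool:
    """Return True if the cause code is in the quoted UR category."""
    return cause in UR_CAUSES
```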

Incompatibility observed

In DTI-ATS checker, 257/258/259 fault reasons will lead to CA rather than UR. The rule is:

  1. IOMMU.mode=off, ddt.v=1 and EN_ATS=0 result in FAULT_TYPE=2;
  2. Implicit S2 fault results in FAULT_TYPE=0;
  3. all others result in FAULT_TYPE=1.
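The three rules above can be sketched as a small decision function. This is an illustration, not the actual checker: the function name and both parameters are hypothetical, the condition in rule 1 is passed in as a precomputed flag because its exact evaluation is not spelled out here, and the priority order (rule 1, then rule 2, then rule 3) is assumed from the order of the list.

```python
def checker_fault_type(ats_disabled: bool, implicit_s2_fault: bool) -> int:
    """Return the FAULT_TYPE the ARM-style DTI-ATS checker expects.

    ats_disabled:      rule 1 condition holds (assumed to be evaluated by the caller).
    implicit_s2_fault: rule 2 condition holds.
    """
    if ats_disabled:
        return 2  # UnsupportedRequest (UR)
    if implicit_s2_fault:
        return 0  # InvalidTranslation
    return 1      # CompleterAbort (CA) for every other fault


# A DDT fault (cause 257/258/259) with ATS enabled matches neither of the
# first two rules, so this checker expects CA rather than UR.
print(checker_fault_type(ats_disabled=False, implicit_s2_fault=False))
```

Under these assumptions the DDT faults fall through to rule 3, which is where the checker diverges from the UR response listed in the RISC-V IOMMU specification.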

Section 2.1.3 of the RISC-V IOMMU specification also relates UR to EN_ATS=0, which is DTI-ATS compliant, so this can be interpreted as a specification ambiguity. The ambiguity leads to the following confusing hardware behaviors, which affect the software programming model:

  1. a DDT access fault or a misconfigured DDT entry results in FAULT_TYPE=2 when the protocol is ATS, but FAULT_TYPE=1 when the protocol is non-ATS;
  2. a DDT access fault or a misconfigured DDT entry may result in FAULT_TYPE=2, while a PDT access fault or a misconfigured PDT entry always results in FAULT_TYPE=1;
  3. ddt.V=0 may result in FAULT_TYPE=2, while pdt.V=0 always results in FAULT_TYPE=0.

As such, IMO, we may want to resolve this specification ambiguity, for the following reasons:

  1. to be compatible with the de facto standard behavior that has been adopted by ecosystem PCIe IPs;
  2. to have a unified programming model for handling ddt.V=0, DDT access, and DDT misconfigured faults;
  3. to enable a more interesting programming model: software can temporarily switch V between 1 and 0 to lock a device configuration and flush the transactions related to it, while those transactions can still be retried by the PCIe master side.

ved-rivos commented 4 months ago

In DTI-ATS checker, 257/258/259 fault reasons will lead to CA rather than UR. The rule is:

A checker written to this IOMMU specification should look for a UR response. Using a checker written for a non-RISC-V IOMMU (ARM, IBM, Intel, AMD, etc.) with a RISC-V IOMMU will likely lead to unexpected results.

zetalog commented 4 months ago

OK, so if you are sure the IOMMU fault type definitions are stable and won't affect the applicable ecosystem, we'll follow. Thanks for the response.