riscv-non-isa / riscv-iommu

RISC-V IOMMU Specification

Fix support for mixed endianness and for RV32 page tables #52

Closed. jhauser-us closed this issue 1 year ago.

jhauser-us commented 1 year ago

The RISC-V Privileged ISA allows a hypervisor or a host OS to have one endianness (little-endian or big-endian) while each guest VM may have the same or the opposite endianness. The current IOMMU spec doesn't support this freedom, because it requires all page tables (G-stage, S-stage, VS-stage) to have the same endianness, configured in fctl.

The Privileged ISA also allows a hypervisor or a host OS to be RV32, with RV32 guest VMs, or to be RV64 with guest VMs of potentially both persuasions, RV32 and RV64. While the current IOMMU spec apparently intends to allow support for RV32 page tables (Sv32 and Sv32x4 formats), it does so by expecting software to set the MODE field in an iohgatp, iosatp, or iovsatp to the proper value for Sv32 or Sv32x4. (Search for "Sv32", for example, in Section 2.4.1, "Process to locate the Device-context", or in Section 2.4.2, "Process to locate the Process-context".) However, RV64 CSRs satp and hgatp don't have a defined value of MODE that selects Sv32 or Sv32x4; these formats can be enabled only when XLEN = 32. Consequently, by oversight, the current IOMMU spec appears to support only the RV64 page-table formats, not RV32's Sv32 and Sv32x4.

In addition, there are plans for RISC-V to optionally support the abilities of Svpbmt in Sv32 and Sv32x4 page tables, but that will require that a new RV32 page-table mode be enabled somewhere in a hart, again exceeding what can be configured by RV64's MODE field for page tables.

Attempting to take everything into account, I recommend defining these additional 1-bit fields in the tc doubleword of a device context structure:

GPTXL   G-stage page table XLEN
SPTXL   S/VS-stage page table XLEN
GPTBE   G-stage page table endianness
SPTBE   S/VS-stage page table endianness
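
For concreteness, a sketch of these fields as C bit masks over the tc doubleword; the bit positions below are invented for illustration, as the proposal does not assign any:

```c
#include <stdint.h>

/* Hypothetical positions only; the spec assigns none of these bits. */
#define DC_TC_GPTXL  (UINT64_C(1) << 28)  /* G-stage XLEN: 0 = RV64, 1 = RV32 */
#define DC_TC_SPTXL  (UINT64_C(1) << 29)  /* S/VS-stage XLEN: 0 = RV64, 1 = RV32 */
#define DC_TC_GPTBE  (UINT64_C(1) << 30)  /* G-stage tables: 0 = LE, 1 = BE */
#define DC_TC_SPTBE  (UINT64_C(1) << 31)  /* S/VS-stage tables: 0 = LE, 1 = BE */
```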

I suggest bits GPTXL and SPTXL be encoded so 0 = RV64 and 1 = RV32. While this is different from the RISC-V ISA's usual 2-bit encoding for XLEN, I expect it will be more acceptable to treat RV64 as the IOMMU's default choice.

When GPTXL is configured for RV32, the 4-bit MODE field of an iohgatp is different from the usual 1-bit MODE field of CSR hgatp. My suggestion is to encode this 4-bit MODE so 0 = Bare, as expected, and 8 = Sv32x4. The encoding 9 can be explicitly reserved for a PBMT-enhanced Sv32x4 format, should that become standardized in the future.

Likewise for SPTXL: when configured for RV32, the 4-bit MODE field of an iosatp or iovsatp can be encoded so 0 = Bare and 8 = Sv32, with 9 reserved for a possible PBMT-enhanced Sv32.
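
A sketch of the two suggested MODE encodings as C enums; the enum names are invented, and the values are only the ones proposed above:

```c
/* Hypothetical encodings for the 4-bit MODE field when the corresponding
 * XL bit selects RV32; nothing here is defined by the current spec. */
enum iohgatp_mode_rv32 {           /* MODE when GPTXL selects RV32 */
    IOHGATP_RV32_BARE   = 0,
    IOHGATP_RV32_SV32X4 = 8,
    /* 9 reserved for a possible PBMT-enhanced Sv32x4 */
};

enum iosatp_mode_rv32 {            /* MODE when SPTXL selects RV32 */
    IOSATP_RV32_BARE = 0,
    IOSATP_RV32_SV32 = 8,
    /* 9 reserved for a possible PBMT-enhanced Sv32 */
};
```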

Other than this change in the interpretation of fsc.MODE for RV32 page tables, I believe no modifications are required for PDTs.

As usual, software must configure the combination of the GPTXL/SPTXL bits and corresponding MODE fields to choose only page-table formats that the IOMMU implementation supports.

The endianness bits, GPTBE and SPTBE, should be encoded following the usual RISC-V conventions: 0 = little-endian, 1 = big-endian. If an IOMMU supports only one endianness, then software must always set these bits to correspond with the IOMMU's capabilities.

ved-rivos commented 1 year ago

Dynamic and mixed endianness were discussed and not pursued because they place a heavy burden on the IO subsystem. It's not just the page tables that would need to support mixed endianness, but also the data being transferred by DMA itself. The choice was made to support a static, platform-wide selection through fctl, not a per-device choice. The expectation is that VMs configured with an endianness different from the hypervisor's will not need direct device assignment, since the hypervisor will have to marshal IO data for such VMs itself. Such VMs would therefore use paravirtualized IO devices rather than passthrough devices.

The XLEN of the CPU may not be a factor for IO width: DMA may occur in sizes quite different from the hart's XLEN, and the IOMMU has no XLEN of its own. Further, the IOMMU itself is not part of the datapath for device memory accesses to and from host memory. The intent was to support the Sv32 and Sv32x4 page-table formats, but not to precisely mirror the RV32/RV64 CSRs on the hart, because this field is in memory, is always 8 bytes wide, and fields like GSCID and PSCID do not have variable positions (e.g., GSCID is located at bits 59:44 and does not move). Hence the choice to keep the value 1 to denote Sv32 and Sv32x4. Since the vsatp for RV32 has only a single-bit MODE field, it would likely not be extended further. For example, the proposed Svpbmt extension for Sv32 under review uses henvcfg and menvcfg to reduce the addressable memory to 1G/4G and make space for PBMT bits, but does not make the selection on a per-page-table basis.

jhauser-us commented 1 year ago

Dynamic and mixed endianness were discussed and not pursued because they place a heavy burden on the IO subsystem. It's not just the page tables that would need to support mixed endianness, but also the data being transferred by DMA itself. [...]

That's incorrect. It sounds like there's a misunderstanding about the RISC-V memory model as it relates to endianness. RISC-V specifies byte-invariance for the entire address space. The memory system is absolutely never responsible for swapping bytes to satisfy any endianness configuration. DMA has no involvement with handling endianness.

The expectation is that VMs configured with an endianness different from the hypervisor's will not need direct device assignment, since the hypervisor will have to marshal IO data for such VMs itself. Such VMs would therefore use paravirtualized IO devices rather than passthrough devices.

That's not how it should work. A hypervisor should not normally be rearranging bytes for a guest that's configured with a different endianness. Instead, device drivers at every level should see the hardware with the endianness it presents, for real, and the software is responsible for swapping bytes as needed.

In particular, PCIe is specified thoroughly as little-endian, and devices attached to PCIe usually all have little-endian registers too, because they're all designed to work with little-endian x86 and ARM. Unless somebody has gone to an awful lot of trouble to create alternate versions of all this hardware, big-endian RISC-V software will see all these components as little-endian, and it must swap bytes itself as needed. I cannot stress this enough: It is not the memory system's job to try to convert endianness between I/O devices and a RISC-V hart. Nor is it a hypervisor's job to do so transparently for a guest VM.

Because RISC-V chose byte invariance (the same as ARM adopted starting with v6), direct device assignment is in fact possible without concern for a guest OS's configured endianness. That avoids a huge performance cost, versus what you've been assuming.

The purpose of the proposed GPTBE and SPTBE bits is only to adapt to different endianness of the page tables stored in memory. They're not intended to affect anything else.

The XLEN of the CPU may not be a factor for IO width: DMA may occur in sizes quite different from the hart's XLEN, and the IOMMU has no XLEN of its own. Further, the IOMMU itself is not part of the datapath for device memory accesses to and from host memory.

This is totally irrelevant to GPTXL and SPTXL, which only concern the page tables.

Hence the choice to keep the value 1 to denote Sv32 and Sv32x4. [...]

Are you saying that a 4-bit MODE value of 1 is supposed to choose Sv32 or Sv32x4? First off, the current IOMMU document does not say that at all. Second, if that's the plan, it impacts the RISC-V Privileged ISA because it makes MODE = 1 unusable for any other purpose in RV64 hgatp and satp. So you can't do that without getting approval for its impact on the privileged architecture.

For example, the proposed Svpbmt extension for Sv32 under review uses henvcfg and menvcfg to reduce the addressable memory to 1G/4G and make space for PBMT bits, but does not make the selection on a per-page-table basis.

You're right, I overlooked that every fsc in a PDT leaf entry has a MODE field. I still think it could be done that way. But if you think it's objectionable to configure PBMT mode separately for each page table, then you're on notice that you'll probably need a way to indicate the PBMT option for RV32 page tables in each device context.

ved-rivos commented 1 year ago

So I agree that the RISC-V memory model supports byte-invariance. There are, however, likely still issues. For example, PCIe requires the RC AtomicOp completer to be aware of the endian format. The data payload in AtomicOp requests and AtomicOp completions must be formatted such that the first byte of data following the TLP header is the least significant byte of the first data value, and subsequent bytes of data are strictly increasing in significance. With a compare-and-swap request, the second data value must follow the first one and must be in the same format. If the root complex is, for example, doing an 8-byte FetchAdd to address 100h with the target memory in little-endian format, the first byte following the header is added to the byte at location 100h. However, if the target memory is in big-endian format, the first byte following the header is added to the byte at location 107h. PCIe suggests that root complexes serving little-endian processors may support only the little-endian format, and those serving platforms with big-endian processors may support only the big-endian format. For a bi-endian processor, it does say a root complex may support a configurable endian format. With mixed endianness, the root complex would need to be endian-aware per transaction, depending on the memory it accesses, or else it would not be able to interoperate with software that uses AMOs.
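
For illustration, a sketch in C of the byte placement being described; the function name and interface are invented, not PCIe's:

```c
#include <stdint.h>

/* Apply an 8-byte FetchAdd payload to target memory at 'mem'.  Per PCIe,
 * payload[0] is the least significant byte of the addend, and subsequent
 * bytes strictly increase in significance. */
void fetch_add64(uint8_t mem[8], const uint8_t payload[8],
                 int target_big_endian)
{
    uint64_t addend = 0, value = 0;

    /* Assemble the addend: payload bytes in increasing significance. */
    for (int i = 0; i < 8; i++)
        addend |= (uint64_t)payload[i] << (8 * i);

    /* Read the target honoring its endianness: with the target at 100h,
     * the LSB lives at 100h if little-endian, at 107h if big-endian. */
    for (int i = 0; i < 8; i++) {
        int b = target_big_endian ? 7 - i : i;
        value |= (uint64_t)mem[b] << (8 * i);
    }

    value += addend;

    /* Write the result back in the same byte order. */
    for (int i = 0; i < 8; i++) {
        int b = target_big_endian ? 7 - i : i;
        mem[b] = (uint8_t)(value >> (8 * i));
    }
}
```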

So you rightly point out that the specification is missing the table of mode encodings, and that if RV64 later defines the value 1, it will conflict with our present choice of using encoding 1 to represent Sv32. We can avoid that by using the encoding value 15 for Sv32 and Sv32x4, as that value is designated for custom use by the privileged specification. It need not be designated as such for the IOMMU: a custom implementation that needs a second custom encoding can use a bit of tc to enable a custom interpretation or extension of the mode field.

Agreed, PBMT support for RV32 will add additional bits to tc.

ved-rivos commented 1 year ago

Suggesting the following to address the issue:

  1. Mixed-endian support for IO needs more time to work through the use cases and the support needed in the IO subsystem, for example for PCIe Atomics, Shared Virtual Memory based accelerators, and CXL.cache, among others. Support for mixed endianness should be deferred to the next release of the IOMMU specification.
  2. Add the missing table of mode encodings for fsc and iohgatp. The encodings will use the value 1 to represent a page table of Sv32/Sv32x4 format. If a future standard extension defines the mode value 1 for RV64, then the IOMMU will need to define an equivalent encoding for it.
  3. Svpbmt for RV32 is undergoing definition. When Svpbmt for RV32 is ratified, it will need to be incorporated in the next release of the IOMMU specification.

ved-rivos commented 1 year ago

PR #61 to add mode encoding tables for iosatp/iovsatp and iohgatp fields.

jhauser-us commented 1 year ago

So I agree that the RISC-V memory model supports byte-invariance. There are, however, likely still issues. For example, PCIe requires the RC AtomicOp completer to be aware of the endian format. [...]

You raise an important complication I wasn't previously aware of. So...

Support for mixed endianness should be deferred to the next release of the IOMMU specification.

I spoke earlier to some members of the Architecture Review Committee, and that seems the most likely course for now.

Add the missing table of mode encodings for fsc and iohgatp. The encodings will use the value 1 to represent a page table of Sv32/Sv32x4 format.

I don't know that the ARC is going to agree to that.

I think it's more likely the ARC will request that RV32 support also be deferred to a later version, so that haggling about this doesn't delay the IOMMU for little-endian RV64.

ved-rivos commented 1 year ago

I will split this issue into two so we can keep the mixed-endian issue open. On encoding, would the use of 15 as the encoding for Sv32/Sv32x4 be more acceptable? 15 is not usable for a standard extension in satp/vsatp.

jhauser-us commented 1 year ago

On encoding, would the use of 15 as the encoding for Sv32/Sv32x4 be more acceptable?

I'll ask at the next ARC meeting.

jhauser-us commented 1 year ago

So I agree that the RISC-V memory model supports byte-invariance. There are, however, likely still issues. For example, PCIe requires the RC AtomicOp completer to be aware of the endian format. [...]

Preliminary investigation by Andrew and me leads us to believe this is a non-problem for RISC-V. A search for the word endian in my older PCIe 3.0 standard shows that it is concerned about the endianness of memory only in regard to the Fetch and Add AtomicOp. It also says:

There is no PCI Express requirement that an RC AtomicOp Completer support the host processor's "native" [endian] format (if there is one), nor is there necessarily significant benefit to doing so. For example, some processors can use load-link/store-conditional or similar instruction sequences to do atomic operations in non-native endian formats and thus not need the RC AtomicOp Completer to support alternative endian formats.

The RISC-V ISA has its own "load-link/store-conditional" instructions, LR and SC, in extension A (Atomic instructions), and we believe it is reasonable to assume that RISC-V cores will implement LR and SC in machines with PCIe where this could possibly be an issue. Notably, the RISC-V Platform specification for Linux-class systems mandates LR and SC (because extension A is mandatory in the RVA profile the platform spec requires).

What this means is that the PCIe hardware can have a fixed endianness (probably little-endian), or can be configured with a single fixed endianness, big- or little-endian, and RISC-V software of the other endianness can use LR, REV8 (byte swap), and SC to perform atomic-add operations that interleave correctly with Fetch and Add AtomicOps done by PCIe devices, not unlike how the same software will use REV8 to convert endianness when interacting with PCIe devices in all other respects.
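
A minimal sketch in C of such a sequence, assuming a little-endian RV64 hart with the Zbb extension (for REV8); the function name is invented. Note that REV8 inside the loop makes this an unconstrained LR/SC sequence under the ISA's forward-progress rules, so eventual success is platform-dependent:

```c
#include <stdint.h>

/* Atomic add on a big-endian doubleword in memory, done by little-endian
 * RV64 software, so it interleaves correctly with FetchAdd AtomicOps
 * issued against the same location by a PCIe device. */
static inline uint64_t atomic_add_be64(volatile uint64_t *p, uint64_t inc)
{
    uint64_t old_le, tmp, fail;
    __asm__ volatile(
        "1: lr.d  %0, (%3)\n"      /* load reserved; value is big-endian   */
        "   rev8  %0, %0\n"        /* byte-swap to the hart's endianness   */
        "   add   %1, %0, %4\n"    /* compute the new value                */
        "   rev8  %1, %1\n"        /* byte-swap back to big-endian         */
        "   sc.d  %2, %1, (%3)\n"  /* store conditional; %2 != 0 on failure */
        "   bnez  %2, 1b\n"        /* retry if the reservation was lost    */
        : "=&r"(old_le), "=&r"(tmp), "=&r"(fail)
        : "r"(p), "r"(inc)
        : "memory");
    return old_le;  /* previous value, already in native byte order */
}
```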

Do you see anything on this topic we may have missed?

ved-rivos commented 1 year ago

So, with certain assumptions, namely that software does not use AMOs and that the IO agents can participate in the LR/SC scheme, this could be made to work. Please also consider the usages of shared virtual memory, where accelerators and the host can share pointers and collaboratively compute on data. I personally do not have experience with building mixed-endian IO systems, but I will accept the conclusion that we do not need to study this any further. I withdraw my concern and will generate a PR for this.

One question about the suggested change: GPTBE, G-stage page table endianness.

The fctl.BE can govern the G-stage page tables along with the endianness of the in-memory data structures and in-memory queues. A per-device selection does not seem necessary. Please tell me if this is wrong.

ved-rivos commented 1 year ago

Perhaps for nested virtualization one may hold G-stage page tables with different endianness per device.

Proposal below:

The IOMMU supports 3 endianness controls.

  1. fctl.BE - governs endianness of global data structures
  2. DC.tc.SBE - This control applies to memory management data structures managed by S-mode software such as the hypervisor. When capabilities.BE is 0, SBE must match fctl.BE. When capabilities.BE is 1, SBE may be set to 0 or 1. Memory accesses to data structures governed by SBE are little-endian if SBE is 0 and big-endian otherwise.
  3. DC.tc.VSBE - This control applies to memory management data structures managed by VS-mode software such as the guest OS. When capabilities.BE is 0, VSBE must match fctl.BE. When capabilities.BE is 1, VSBE may be set to 0 or 1. Memory accesses to data structures governed by VSBE are little-endian if VSBE is 0 and big-endian otherwise.
Endianness of implicit accesses to memory management data structures such as page tables is governed as follows:

Data Structure               Governed by
Command Queue                fctl.BE
Fault Queue                  fctl.BE
Page Request Queue           fctl.BE
Device Directory Table       fctl.BE
PDT when G-stage not active  DC.tc.SBE
S-stage page table           DC.tc.SBE
G-stage page table           DC.tc.SBE
MSI page table               DC.tc.SBE
PDT when G-stage active      DC.tc.VSBE
VS-stage page table          DC.tc.VSBE

Device Context configuration checks are extended to detect misconfiguration if:

The process to translate MSIs is updated as follows to use DC.tc.SBE instead of fctl.BE:

If bit 2 of A is 1, the MSI is in big-endian byte order. The IOMMU is capable of big-endian access to memory if the BE bit in the capabilities register is 1. When the IOMMU is capable of big-endian operation, DC.tc.SBE holds the configuration bit that may be set to 1 to enable big-endian access to memory. If the IOMMU is not capable of, or has not been configured for, big-endian access to memory, then stop and report "Transaction type disallowed" (cause = 260).
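
A sketch in C of the check above; the parameter names are illustrative stand-ins for the spec's capabilities.BE and DC.tc.SBE fields:

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns 0 on success, or the fault cause to report. */
static int check_msi_pte_endianness(uint64_t msi_pte_A,
                                    bool capabilities_BE, bool dc_tc_SBE)
{
    if (msi_pte_A & (1u << 2)) {     /* bit 2 of A set: big-endian MSI PTE */
        if (!(capabilities_BE && dc_tc_SBE))
            return 260;              /* "Transaction type disallowed" */
    }
    return 0;
}
```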

Add a note in the hardware guidelines: to support mixed-endian data structure access, the IO bridge:

  1. Must implement byte-invariant addressing, such that a byte access to a given address accesses the same memory location in both little-endian and big-endian modes of operation.
  2. Must support the use of an LR/SC sequence by software for atomic transformation of data shared with an IO agent that does not use the same endianness for memory access as the software.

Add notes for software:

  1. The GSCID field of iohgatp identifies an address space. Configuring an identical GSCID in two DCs but with different SBE values may lead to the IOMMU interpreting a G-stage PTE in big-endian form or in little-endian form. No other behaviors are expected.
  2. Software must use an appropriate software sequence to swap bytes as necessary to create a mutually agreed data representation when sharing data with an IO agent that does not use the same endianness for memory access as that used by software.
  3. Software must use an LR/SC sequence with the instructions that perform the byte swapping when the data shared with the IO agent must be accessed atomically, if the IO agent does not use the same endianness for memory access as that used by software.

jhauser-us commented 1 year ago

Perhaps for nested virtualization one may hold G-stage page tables with different endianness per device.

Actually, I would say no; it's reasonable to expect that nested virtualization will require shadow page tables for G-stage translation, and the software managing them can do the necessary endianness conversion.

I think we can simplify to somewhere between your first suggestion and your latest one. I agree that a PDT would ideally be the same endianness as the S/VS-stage page tables. But I think it'll be fine for the G-stage page tables to all be governed by fctl.BE.

(The only motivation I can see for more configurable G-stage endianness would be if control over the IOMMU was being split between multiple OSes with different endianness, running on different RISC-V harts in the system. But I'm unable to convince myself that that's a realistic scenario with this IOMMU design. It would be possible with a more complex version of the IOMMU, if anybody ever wanted it.)

So I propose that a DC have a single SBE bit, which controls the endianness of the PDT, if there is one, and all S/VS-stage page tables. (Basically, everything pointed to by the fsc doubleword.)

That eliminates the need for the extra note about GSCID and endianness. Your other two notes look fine to me.

I need to review the details about MSI and little-endian, but I'll do that separately.

Earlier you wrote:

So, with certain assumptions ... that the IO agents can participate in the LR/SC scheme, this could be made to work

For the record, I/O agents don't need to know or do anything special. If I/O initiates an AtomicOp, the normal memory system should cancel the reservation for any concurrent LR + SC sequence to avoid error. This behavior should already be built into the system, and happens automatically. But software does need to know to swap endianness, and to use LR + SC when a byte-swapping read-modify-write must be atomic.

ved-rivos commented 1 year ago

Actually, I would say no; it's reasonable to expect that nested virtualization will require shadow page tables for G-stage translation, and the software managing them can do the necessary endianness conversion.

I was envisioning a system with a security hypervisor where a user may also install a hypervisor. The security hypervisor uses PMPs to isolate memory reserved for use by the security hypervisor.

The security hypervisor executes the user-installed hypervisor in a VM without a G-stage page table active, as it employs PMPs for protection and no page-based address translation is needed. Shadowing of the user hypervisor's page tables is also not needed.

The security hypervisor may create secure VMs, in the reserved memory, that are isolated with G-stage page tables. The security hypervisor may reserve some devices for use by the secure VMs.

The security hypervisor must mediate access to the IOMMU from the user-installed hypervisor.

Such a system may require a mix of endianness for G-stage PTs created by the user hypervisor and the security hypervisor.

jhauser-us commented 1 year ago

I was envisioning a system with a security hypervisor where a user may also install a hypervisor. The security hypervisor uses PMPs to isolate memory reserved for use by the security hypervisor.

Control over a hart's PMP table is available only to machine level (M mode), so the "security hypervisor" you describe must run at that level. Machine level also controls the endianness of S mode where an OS runs, so it always knows the endianness of the OS's page tables. While it is possible for M-level software to partly virtualize the IOMMU for S-level, I believe this will require some shadowing of DDTs. To minimize overhead, I suggest M-level software should set fctl.BE to match the S-level endianness, and have its own shadow DDTs (and G-stage page tables there, if any) use this endianness, even if different from its own. If I understand what you're describing, that byte-swapping should be insignificant extra overhead on top of the rest of it.

But if the security hypervisor runs at HS-level and wants to support nested hypervisors, then it will have no real choice but to implement shadow G-stage page tables, as I suggested before. PMP isn't available to it. Depending on the circumstances, it could choose to set fctl.BE to the endianness of the VM instead of its own endianness.

ved-rivos commented 1 year ago

It is understood that PMPs are an M-mode resource.

The system I was describing was where the security hypervisor runs at HS level since it does not want to significantly expand M-mode and also wishes to use extensions available to HS mode such as the virtual memory system.

The user hypervisor, e.g., Linux with a KVM driver, may or may not run its own guests. For the user hypervisor and apps hosted directly by the hypervisor kernel, this system has no nested translation overheads. For the guests that the user hypervisor hosts, there are no overheads of page-table shadowing incurred either. Since, for a VM, the majority of the overheads arise from memory virtualization, the user partition in this system, though it is a VM, incurs no additional overheads compared to a system without the security hypervisor.

For the security hypervisor to use PMP for memory protection, the M-mode has to provide some services. The traps to the security hypervisor will need to go through the M-mode monitor that will reflect them to the security hypervisor.

The mediation of the IOMMU would also be through trap-and-emulate with the traps reflected to the security hypervisor by the M-mode or through an enlightened para-virtualization interface between the user hypervisor and the security hypervisor. In either case, the M-mode does not implement the virtual IOMMU model but relays the traps.

The user hypervisor may sometimes have the same endianness as the security hypervisor and at other times a different one. The security hypervisor may not wish to change its native endianness dynamically. Changing fctl.BE requires the IOMMU to be disabled and re-enabled, and doing that may affect the IO traffic to the secure guests, as those may already be running by the time the user selects the user hypervisor to launch and its endianness becomes known.

But if the security hypervisor runs at HS-level and wants to support nested hypervisors, then it will have no real choice but to implement shadow G-stage page tables, as I suggested before.

I hope I am not missing something that prevents this from being accomplished.

jhauser-us commented 1 year ago

For the security hypervisor to use PMP for memory protection, the M-mode has to provide some services. The traps to the security hypervisor will need to go through the M-mode monitor that will reflect them to the security hypervisor.

We should not design the current version of the IOMMU with this assumption. It's possible, but not likely to ever be supported.

The user hypervisor may sometimes have the same endianness as the security hypervisor and at other times a different one. The security hypervisor may not wish to change its native endianness dynamically. Changing fctl.BE requires the IOMMU to be disabled and re-enabled,

I agree this would not be good if the "security hypervisor" is creating more than one VM with different endianness. However, since we should assume the security hypervisor is running at HS level, we should expect it to do shadow G-stage page tables, as I said. Bringing this back down to earth, that means what I said before: we only need one bit in the DC, SBE, which affects the PDT, if there is one, and all S/VS-stage page tables (everything pointed to by the DC's fsc). The endianness of G-stage page tables can be controlled by fctl.BE, same as the DDT.

ved-rivos commented 1 year ago

This was the only case I could think of that would merit a separate endianness control, as originally proposed for G-stage page tables. I agree this is somewhat niche and could be dealt with in a platform-specific manner if needed for an SoC.

The updated proposal is as follows: rename fctl.BE to fctl.SBE. The IOMMU supports 2 endianness controls.

  1. fctl.SBE - governs endianness of access to in-memory data structures and in-memory queues managed by S/HS-mode software.
  2. DC.tc.VSBE - governs endianness of access to memory management data structures managed by VS-mode software such as the guest OS. When capabilities.BE is 0, DC.tc.VSBE must match fctl.SBE. When capabilities.BE is 1, DC.tc.VSBE may be set to 0 or 1. Memory accesses to data structures governed by DC.tc.VSBE are little-endian if DC.tc.VSBE is 0 and big-endian otherwise.
Endianness of implicit accesses to data structures is governed as follows:

Data Structure               Governed by
Command Queue                fctl.SBE
Fault Queue                  fctl.SBE
Page Request Queue           fctl.SBE
Device Directory Table       fctl.SBE
PDT when G-stage not active  fctl.SBE
S-stage page table           fctl.SBE
G-stage page table           fctl.SBE
MSI page table               fctl.SBE
PDT when G-stage active      DC.tc.VSBE
VS-stage page table          DC.tc.VSBE

Device Context configuration checks are extended to detect misconfiguration if:

Add a note in the hardware guidelines: to support mixed-endian data structure access, the IO bridge must implement byte-invariant addressing, such that a byte access to a given address accesses the same memory location in both little-endian and big-endian modes of operation.

Add notes for software:

  1. The PSCID field of the first-stage context, along with the GSCID, identifies an address space. Configuring identical GSCID and PSCID values in two DCs but with different VSBE may lead to the IOMMU interpreting a VS-stage PTE in big-endian form or in little-endian form. No other behaviors are expected.
  2. Software must use an appropriate software sequence to swap bytes as necessary to create a mutually agreed data representation when sharing data with an IO agent that does not use the same endianness for memory access as that used by software.
  3. Software must use an LR/SC sequence with the instructions that perform the byte swapping when the data shared with the IO agent must be accessed atomically, if the IO agent does not use the same endianness for memory access as that used by software.

ved-rivos commented 1 year ago

Two pull requests to address the issue: Part 1 - #67, Part 2 - #68.

jhauser-us commented 1 year ago

The updated proposal is as follows: rename fctl.BE to fctl.SBE. The IOMMU supports 2 endianness controls.

  1. fctl.SBE - governs endianness of access to in-memory data structures and in-memory queues managed by S/HS-mode software.
  2. DC.tc.VSBE - governs endianness of access to memory management data structures managed by VS-mode software [...]

Although your new way is okay, you've made it both more complicated and less flexible than what I suggested. My way is:

Data Structure          Governed by
Command Queue           fctl.BE
Fault Queue             fctl.BE
Page Request Queue      fctl.BE
Device Directory Table  fctl.BE
G-stage page table      fctl.BE
MSI page table          fctl.BE
PDT                     DC.tc.SBE
S/VS-stage page table   DC.tc.SBE

Please notice that my table accomplishes everything we need with fewer rows.
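
For illustration, a small C sketch of the selection rule this table expresses; the enum and function names are invented, not from the spec:

```c
#include <stdbool.h>

enum ds { CMD_QUEUE, FAULT_QUEUE, PAGE_REQ_QUEUE, DDT,
          G_STAGE_PT, MSI_PT, PDT, S_VS_STAGE_PT };

/* Returns true if implicit accesses to the given structure are big-endian. */
static bool access_is_big_endian(enum ds s, bool fctl_BE, bool dc_tc_SBE)
{
    switch (s) {
    case PDT:
    case S_VS_STAGE_PT:
        return dc_tc_SBE;   /* per-device: everything reached via fsc */
    default:
        return fctl_BE;     /* global: queues, DDT, G-stage and MSI tables */
    }
}
```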

ved-rivos commented 1 year ago

That will work. The PRs are updated: Part 1 - #67, Part 2 - #68.

jhauser-us commented 1 year ago

I see your pull request has a single SBE bit in the fctl register. That won't be accepted. The endianness of each virtual machine may be different, so the SBE bit must be in the device context, not global. I've consulted with the Architecture Review Committee, and they will insist on that.

ved-rivos commented 1 year ago

Are you sure you are reading PR #68? It's possible GitHub is playing tricks. This is what I see; could you please tell me what you see. [screenshot]

ved-rivos commented 1 year ago

I re-read the PR and I think I had an error in parts of it. Thanks for spotting that.

jhauser-us commented 1 year ago

Yeah, that's right; I saw the error in the table, which you've now fixed. I'll try to review the rest of #68 soon.