Should sstatus.mxr affect vs_stage implicit g_stage address translation?

GuoShibo-cn commented 1 year ago

According to Hypervisor Extension, Version 1.0 clarification about two-stage address translation:

Setting MXR at HS-level makes G-stage execute-only pages readable, Does it also include vs-stage implicit g-stage translation? It feels kind of weird to set page table entry stored memory region execute only.

gfavor commented 1 year ago

Yes. See https://github.com/riscv/riscv-isa-manual/issues/819#issuecomment-1036476652

Wangrodman commented 5 months ago

@gfavor As you mentioned, sstatus.MXR should take effect during the VS-stage implicit g_stage. Does this mean that the TLB should store the VS-stage implicit g_stage PTE's Xs and Rs and check such bits against sstatus.MXR during TLB access? Or, TLB not store the VS-stage implicit g_stage PTE's Xs and Rs, and when sstatus.MXR changes, the property VS-stage invalid trigger a subsequent page table walk to recheck permissions during the VS-stage implicit g_stage?

gfavor commented 4 months ago

There are various implementation approaches one can take (with differing cost, complexity, and performance). The requirement simply is that software must never observe any behavior that isn't allowed by the arch spec (and only behaviors spec'ed and allowed by the arch spec).

Wangrodman commented 4 months ago

There are various implementation approaches one can take (with differing cost, complexity, and performance). The requirement simply is that software must never observe any behavior that isn't allowed by the arch spec (and only behaviors spec'ed and allowed by the arch spec).

I think the impact of MXR on the VS-stage and G-stage is clearly defined in the spec, but I still need to clarify the VS-stage implicit g_stage part. In fact, I am more concerned about the SW MXR usage specifications to ensure consistent behavior across different HW implementations.

If a HW implementation stores the VS-stage into the TLB, and then sstatus.MXR changes from 1 to 0, this indirectly causes the read permission of the VS-stage implicit g_stage to become invisible (if the PTE's X=1, R=0). Obviously, a guest page fault should subsequently be observed before accessing the VS-stage, but it is not.

However, @aswaterman mentioned in #819 (comment) that the hypervisor quickly enables and then disables sstatus.MXR. Does this mean that my case above is not a valid usage because, ultimately, all PTEs in the VS-stage implicit g_stage will have R=1?

aswaterman commented 4 months ago

This might be a roundabout answer to your question, but the spec does not require any SFENCE/HFENCE to be executed for MXR changes to take effect. MXR changes take effect immediately. So, if you have a TLB design where the current setting of MXR is somehow cached, that implies that those entries need be flushed on MXR changes. (To me, that suggests this hypothetical design point is a questionable one, since we don’t want MXR changes to be costly.)

Wangrodman commented 4 months ago

This might be a roundabout answer to your question, but the spec does not require any SFENCE/HFENCE to be executed for MXR changes to take effect. MXR changes take effect immediately. So, if you have a TLB design where the current setting of MXR is somehow cached, that implies that those entries need be flushed on MXR changes. (To me, that suggests this hypothetical design point is a questionable one, since we don’t want MXR changes to be costly.)

Thank you for your response. As you mentioned, executing SFENCE/HFENCE for an MXR change is indeed strange and does not conform to the specification. From another perspective, if the HW implementation only allows sstatus.MXR to affect the VS-stage and G-stage, excluding the influence of the VS-stage implicit G-stage, would this be compatible with the specification?

aswaterman commented 4 months ago

Changing either MXR bit takes effect immediately; any HW implementation that delays the effect of an MXR change (e.g. via caching it without flushing a cache) is incorrect.

Wangrodman commented 4 months ago

Changing either MXR bit takes effect immediately; any HW implementation that delays the effect of an MXR change (e.g. via caching it without flushing a cache) is incorrect.

Okay, could you add a description that includes the effect of sstatus.MXR in this section?

aswaterman commented 4 months ago

Absolutely not. All such changes take effect immediately. If I added such a note, it would call other such changes into question.

dramforever commented 4 months ago

(To me, that suggests this hypothetical design point is a questionable one, since we don’t want MXR changes to be costly.)

I think this raises another question. This statement implies that a reasonable implementation would not cache MXR. However, it seems that making sstatus.MXR take effect immediately requires it to be cached in VS-stage PTEs. Otherwise, if a VS-stage PTE was read when sstatus.MXR = 1 and from memory where the G-stage page table has set X=1 and R=0, when later setting sstatus.MXR to 0, such a PTE would be invalid and not invalidating it would contradict the requirement for MXR to "take effect immediately".

Does the specification require such a PTE to be invalidated on setting sstatus.MXR from 1 to 0, or is it allowed to stay cached?

cc: @cyyself @EastonMan we had a discussion about this earlier and were unable to come to a conclusion what is the required behavior

elithist commented 4 months ago

I have similar concerns to the others if HS sstatus.MXR is intended to affect memory access made to support VS-stage address translation (such as to read/write a VS-level page table).

The implication is that a TLB entry for a 2-stage translation has to indirectly cache the effects of HS sstatus.MXR, at least without resorting to non-trivial workarounds.

This, taken with @aswaterman 's comment that an implementation that caches the effect of MXR is questionable, makes me wonder if @gfavor might have misinterpreted the original question?

My interpretation of the original question was whether HS sstatus.MXR affects memory access made to support VS-stage address translation (i.e. implicit loads to translate the guest-physical address of the page table structure to its physical address), and not whether HS sstatus.MXR affects the permissions of the instruction fetch or load/store at the G-stage translation.

In other words, should the text in the following section be clarified with something like: “For a memory access made to support VS-stage address translation (such as to read/write a VS-level page table), permissions and the need to set A and/or D bits at the G-stage level are checked as though for an implicit load or store, not for the original access type. This permission check is not affected by HS-level sstatus.MXR”

Thanks! Hopefully I'm not flogging a dead horse here.

aswaterman commented 4 months ago

I don't think there's a contradiction here. PTEs themselves are allowed to be incoherently cached, and that implicitly includes any effects MXR might've had on the implicit memory accesses involved in reading that PTE. (My comment about implementations that cache MXR being questionable was meant to refer to MXR's effect on explicit memory accesses.)

elithist commented 4 months ago

@aswaterman in the case the hypervisor has mapped the guest-physical to physical page tables with {R=0, X=1} (ignoring for now if that's even a sensible thing to do) then I struggle to see how you could cache a 2-stage translation effectively without also indirectly caching the effects of the HS-level MXR.

If we take SV39 as an example, the first stage VS-level translation would translate a virtual address to the guest-physical address, but in order to do so it might have to read up to 3 Page Table Entries.

Accesses to each of these Page Table Entries may require a G-stage translation, and while these translations themselves can be cached without also caching the effects of MXR, the issue as I see it comes because we actually need to use (and apply MXR) to complete the VS-level translation.

This means that if the VS-level translation was done with HS-level MX=1, and if the page tables are mapped with {R=0, X=1}, then the VS-level translation might succeed where it otherwise wouldn't where MXR=0.

You could of course tag the 2-stage TLB with the MXR setting, but doing so reduces the reach of your TLB and necessitates more table walks.

Is my understanding is that such a scenario where the hypervisor maps the page tables pointed to by hgatp with {R=0, X=1} is contrived and not something a hypervisor would ever do? If so, are we not mandating extra implementation complexity to effectively cache 2-stage TLBs by mandating that implicit loads in support of G-stage translation observe the effects of HS MXR?

Or have I totally missed the mark, and there is at least a reasonable use case for a hypervisor mapping the guest-physical page tables with {R=0, X=1}? (Very possible given my limited experience with hypervisors). If so, I'd appreciate an example use case for this!

Thanks!

elithist commented 4 months ago

I'll write up a concrete example to clarify my wordy ramblings, but I need to drive a couple hours right now so it'll be a bit later today.

elithist commented 4 months ago

Hopefully this noddy diagram helps somewhat; it's an example of a VS 2-stage translation, using giga-pages to condense the number of steps.

There are 3 points where the PTE permissions are checked against, marked in red, green and blue. The green and blue checks are clear, both in the specs and to my understanding.

However it's the red check that I believe this thread is concerned about. In this example, if the translation for VA->GPA->PA is started for a load when HS MXR=1, then all the checks should pass since all the PTEs with X permissions would be modified. However, if the translation VA->GPA->PA is started for a load when HS MXR=0, then the translation should fail the permission checks at the red checks, since the implicit load to read the L2 PTE should fail, as the Page Table is mapped with R=0 permissions.

This means that caching even the VA->GPA translation implicitly needs to cache the current HS MXR context; valid if MXR=1, fault if MXR=0. Otherwise, if we had cached the TLB translation for VA->GPA->PA when MXR=1, then used it when MXR=0, then this would be a violation of the ISA.

If, however, the red checks did not consider HS MXR, then we can trivially cache the attributes of both the VA->GPA PTE and the GPA->PA, and use that cached attributes to check against the current MXR settings.

dramforever commented 4 months ago

that implicitly includes any effects MXR might've had on the implicit memory accesses involved in reading that PTE

My understanding is that caching a PTE that has been read while (HS level) sstatus.MXR = 1 until after sstatus.MXR has been cleared to 0 is allowed by the ISA. There's no requirement for an implementation to notice that if a translation were to start later it would have not succeeded. For me personally it helped to understand that conceptually, the translation caches stores (leaf and non-leaf) PTEs, not the permissions.

aswaterman commented 4 months ago

I share your conceptual model that it is the PTEs that are cached. But things like PMP, hstatus.MXR, etc. affect the ability of the page-table walker to access those PTEs. So the effects of those settings are cached insofar as the PTE accesses succeed or fail. The actual settings of PMP, hstatus.MXR, etc. are of course not cached.

elithist commented 4 months ago

I agree that the the actual PMP/HS MXR itself isn't cached in the TLBs. Am I to understand then that in the example I've given, if a valid translation was cached when HS MXR=1, then subsequent accesses with HS MXR=0 are allowed to succeed? The previous posts from yourself @aswaterman indicates to me that this isn't permitted as the effects of HS MXR must be immediate.

In the case of a change in PMP, the ISA specifically calls out that the effects of PMP may be cached, and require a synchronisation through SFENCE.VMA.

I see no such provision for HS MXR here.

terranfund commented 4 months ago

Allow me to present my understanding from this thread and the Privileged Architecture 20240212, 1.13. Paragraph 2 is the concerned point called 'implicit-G address translation', used a lot in this thread, that is subject to G-stage address translation. That means, from paragraph 1, setting MXR at HS-level overrides such 'implicit-G address translation' execute-only permission. According to paragraph 3 and 4, change to the sstatus field MXR takes effect immediately on called 'implicit-G address translation' execute-only permission. Are the above descriptions right?

gfavor commented 4 months ago

As noted by the preceding post, there is a blanket statement that MXR changes take effect immediately. Which implies that any caching of two-stage translations needs to include info that enables taking MXR into account at the last minute, i.e. when a cached translation is in the process of being used to translate a VA. Baking into a cached translation an "old" version of MXR, and then later using that translation under a new and different MXR value, would violate the "immediate effect" requirement.

ingallsj commented 4 months ago

Good question. Concretely, the open-source Rocket-chip Page Table Walker (PTW) does not check MXR for vs-stage implicit g-stage translation: https://github.com/chipsalliance/rocket-chip/blob/dbcb06afe1c76d1129cb6d264949322a34c37185/src/main/scala/rocket/PTW.scala#L697

Either the ISA Manual or Rocket-chip ought to be updated.

aswaterman commented 4 months ago

I think the better fix is to state that MXR affects only explicit memory accesses. This matches the only intended use of the feature (allowing the OS to load instructions from RWX=001 pages).

Placing page tables in execute-only regions isn’t a logical thing to do, and we shouldn’t go out of our way to facilitate it. (And it does require going out of one’s way if the hstatus.MXR changes take effect immediately for implicit accesses: to avoid full TLB flushes on hstatus.MXR changes, you’d have to tag entries based upon whether they are sensitive to that field and selectively flush only those. That’s unnecessary complexity.)

elithist commented 4 months ago

thanks all. @aswaterman should i create a PR for this?

aswaterman commented 4 months ago

@elithist Go for it. I will discuss the matter at the next Architecture Review Committee meeting and might tweak your proposal based on that.

aswaterman commented 4 months ago

BTW, the supervisor-mode chapter appears to say the right thing already by using the word "loads" rather than "reads". Perhaps the hypervisor chapter just needs to change "readable" to "accessible to loads" (in two places: the Two-Stage Address Translation section and the Hypervisor Virtual-Machine Load and Store Instructions section).

Wangrodman commented 4 months ago

I suggest using the term "intermediate G-stage" in the ISA to refer to what we are currently discussing as the VS-stage implicit G-stage.

elithist commented 4 months ago

thanks @aswaterman, i see you've already made the changes

riscv / riscv-isa-manual

Should sstatus.mxr affect vs_stage implicit g_stage address translation? #1061