Closed scottj97 closed 2 years ago
This might be the first case in the ISA where a load can trigger a completely separate exception cause than a store (beyond simply load-X-fault versus store-X-fault), so this is the first time this implementation detail has become architecturally visible.
On second thought, this might be visible even without hypervisor.
A load to this page would take an access fault. A store to this page would take a page fault.
Which one should an AMO take?
For both cases one can think in terms of there being an inherent natural order of checks. First a VA is translated by the MMU. Either that results in a Page Fault or in a valid PA to pass on to PMAs and PMPs to check before they then pass the PA along to do an actual memory access. Hence Page Faults are generally detected and reported before Access Faults.
Although, to be more accurate, each PA access during a tablewalk also gets checked by PMAs and PMPs. So Access Faults can be detected during a tablewalk. In general, there is a long sequence of actions to perform a translation, and Page Faults and Access Faults are detected and reported in that order.
Similarly, in the case of two-stage translation, first a VA is translated by VS-stage. Either that results in a Page Fault or in a GPA to pass on to G-stage to translate. Then that results either in a Guest Page Fault or in a PA to pass on to PMAs and PMPs to check before they then pass the PA along to do an actual memory access. Hence Page Faults are generally detected and reported before Guest Page Faults.
Although again, to be more accurate, each GPA access during a VS-stage tablewalk also gets translated by G-stage. So Guest Page Faults - and Access Faults - can be detected during a VS-stage tablewalk. In general, there is a very long sequence of actions to perform a two-stage translation, and Page Faults, Guest Page Faults, and Access Faults are detected and reported in that order.
There is no question what exception to take on either a load or a store. The question is specifically about AMOs, which do both a load and a store. When a store exception would be detected at an earlier point in the process than a load exception, there is a mismatch between Spike (which first tablewalks to check only load permissions, then tablewalks again to check only store permissions) versus an implementation that checks only store permissions (which the spec seems to allow, or possibly require).
This is a Spike bug if it's raising any sort of load fault on AMOs; only store faults should ever be raised on AMOs.
We attempt to catch them and re-issue them as store faults here: https://github.com/riscv-software-src/riscv-isa-sim/blob/a9b3babb8f27c063213d6189f4862269756a1a0c/riscv/mmu.h#L191 -- are we missing some case?
The load page fault is correctly converted to a store page fault by that code. The question is store page fault versus store guest-page fault.
Yeah, sorry, I think I'm on the same page now. In the situation you mentioned:
The correct thing to do is for the AMO to raise the page fault, because the ISA doesn't have any provision for AMOs being split into load and store phases. An AMO should raise the same exception a store would have raised (assuming, of course, that the underlying PMA supports both stores and AMOs for the given address and alignment).
Moving on to solving the problem: we could add a new method along the lines of mmu_t::check_store_permissions(reg_t addr, int size)
, which will check to see if a store to address addr of size size would succeed, and raises the appropriate store exception if not. Then, the AMO methods in mmu_t
could invoke this method before performing the load. This might allow us to get rid of that try/catch block, too. Or, have you got a cleaner solution in mind?
It looks like this will require quite a large change. For example the MMIO interface here does not have any API for checking permissions without actually doing the store.
As far as the nuts and bolts go, we could extend the MMIO API to say that zero-byte stores are perm checks, which would probably reduce the surface area of changes we’d need to make.
Is this problem proving problematic for your verification use case? If not, we could just leave it as an erratum for now.
It is a real problem for me, but not a priority.
OK. I’ll leave thus open until one of us has time to revisit.
@aswaterman you said "which will check to see if a store to address addr of size size would succeed, and raises the appropriate store exception if not". You also said "we could extend the MMIO API to say that zero-byte stores are perm checks".
The MMIO device might need to know the actual size in order to decide whether it's legal.
I'll find some other way (if and when I get around to fixing this).
Resolved by #976
Consider a two-stage (hypervisor) translation where:
A load to this page will take a guest-page fault. A store to this page will take a page fault.
Which one should an AMO take?
The spec says:
And accompanying commentary:
This suggests that checking the page for store permissions (only) is sufficient, and therefore the AMO should take a page fault.
Spike, however, is taking a guest-page fault, because it executes AMOs as a load followed by a store.
Is that okay?