riscv-software-src / riscv-isa-sim

Spike, a RISC-V ISA Simulator
Other
2.43k stars 855 forks source link

AMO with store page fault and load guest-page fault #873

Closed scottj97 closed 2 years ago

scottj97 commented 2 years ago

Consider a two-stage (hypervisor) translation where:

A load to this page will take a guest-page fault. A store to this page will take a page fault.

Which one should an AMO take?

The spec says:

Attempting to execute a store, store-conditional, or AMO instruction whose effective address lies within a page without write permissions raises a store page-fault exception.

And accompanying commentary:

AMOs never raise load page-fault exceptions. Since any unreadable page is also unwritable, attempting to perform an AMO on an unreadable page always raises a store page-fault exception.

This suggests that checking the page for store permissions (only) is sufficient, and therefore the AMO should take a page fault.

Spike, however, is taking a guest-page fault, because it executes AMOs as a load followed by a store.

Is that okay?

scottj97 commented 2 years ago

This might be the first case in the ISA where a load can trigger a completely separate exception cause than a store (beyond simply load-X-fault versus store-X-fault), so this is the first time this implementation detail has become architecturally visible.

scottj97 commented 2 years ago

On second thought, this might be visible even without hypervisor.

A load to this page would take an access fault. A store to this page would take a page fault.

Which one should an AMO take?

gfavor commented 2 years ago

For both cases one can think in terms of there being an inherent natural order of checks. First a VA is translated by the MMU. Either that results in a Page Fault or in a valid PA to pass on to PMAs and PMPs to check before they then pass the PA along to do an actual memory access. Hence Page Faults are generally detected and reported before Access Faults.

Although, to be more accurate, each PA access during a tablewalk also gets checked by PMAs and PMPs. So Access Faults can be detected during a tablewalk. In general, there is a long sequence of actions to perform a translation, and Page Faults and Access Faults are detected and reported in that order.

Similarly, in the case of two-stage translation, first a VA is translated by VS-stage. Either that results in a Page Fault or in a GPA to pass on to G-stage to translate. Then that results either in a Guest Page Fault or in a PA to pass on to PMAs and PMPs to check before they then pass the PA along to do an actual memory access. Hence Page Faults are generally detected and reported before Guest Page Faults.

Although again, to be more accurate, each GPA access during a VS-stage tablewalk also gets translated by G-stage. So Guest Page Faults - and Access Faults - can be detected during a VS-stage tablewalk. In general, there is a very long sequence of actions to perform a two-stage translation, and Page Faults, Guest Page Faults, and Access Faults are detected and reported in that order.

scottj97 commented 2 years ago

There is no question what exception to take on either a load or a store. The question is specifically about AMOs, which do both a load and a store. When a store exception would be detected at an earlier point in the process than a load exception, there is a mismatch between Spike (which first tablewalks to check only load permissions, then tablewalks again to check only store permissions) versus an implementation that checks only store permissions (which the spec seems to allow, or possibly require).

aswaterman commented 2 years ago

This is a Spike bug if it's raising any sort of load fault on AMOs; only store faults should ever be raised on AMOs.

We attempt to catch them and re-issue them as store faults here: https://github.com/riscv-software-src/riscv-isa-sim/blob/a9b3babb8f27c063213d6189f4862269756a1a0c/riscv/mmu.h#L191 -- are we missing some case?

scottj97 commented 2 years ago

The load page fault is correctly converted to a store page fault by that code. The question is store page fault versus store guest-page fault.

aswaterman commented 2 years ago

Yeah, sorry, I think I'm on the same page now. In the situation you mentioned:

The correct thing to do is for the AMO to raise the page fault, because the ISA doesn't have any provision for AMOs being split into load and store phases. An AMO should raise the same exception a store would have raised (assuming, of course, that the underlying PMA supports both stores and AMOs for the given address and alignment).

Moving on to solving the problem: we could add a new method along the lines of mmu_t::check_store_permissions(reg_t addr, int size), which will check to see if a store to address addr of size size would succeed, and raises the appropriate store exception if not. Then, the AMO methods in mmu_t could invoke this method before performing the load. This might allow us to get rid of that try/catch block, too. Or, have you got a cleaner solution in mind?

scottj97 commented 2 years ago

It looks like this will require quite a large change. For example the MMIO interface here does not have any API for checking permissions without actually doing the store.

aswaterman commented 2 years ago

As far as the nuts and bolts go, we could extend the MMIO API to say that zero-byte stores are perm checks, which would probably reduce the surface area of changes we’d need to make.

Is this problem proving problematic for your verification use case? If not, we could just leave it as an erratum for now.

scottj97 commented 2 years ago

It is a real problem for me, but not a priority.

aswaterman commented 2 years ago

OK. I’ll leave thus open until one of us has time to revisit.

scottj97 commented 2 years ago

@aswaterman you said "which will check to see if a store to address addr of size size would succeed, and raises the appropriate store exception if not". You also said "we could extend the MMIO API to say that zero-byte stores are perm checks".

The MMIO device might need to know the actual size in order to decide whether it's legal.

I'll find some other way (if and when I get around to fixing this).

aswaterman commented 2 years ago

Resolved by #976