riscv / riscv-profiles

RISC-V Architecture Profiles
Creative Commons Attribution 4.0 International

Relax stval written with instruction for illegal instructions #10

Closed adurbin-rivos closed 2 years ago

adurbin-rivos commented 2 years ago

"For illegal-instruction exceptions, stval must be written with the faulting instruction." We feel that this is burdensome to implement because of data dependent illegal instruction exceptions. One needs to carry the bytes through the pipeline (or some other mechanism to meet this requirement). For example, AIA spec has data dependent illegal instruction exceptions based on miselect values:

https://github.com/riscv/riscv-aia/blob/main/doc/src/CSRs.tex

When miselect is a number in a reserved range (currently 0x00–0x2F, 0x40–0x6F, or a number above 0xFF not designated for custom use), attempts to access mireg raise an illegal instruction exception.
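As a rough illustration of why this exception is data-dependent, the quoted rule can be sketched as a predicate over the current miselect value. This is only a sketch: the function names and the `is_custom` hook are hypothetical, and the quoted text does not say which values above 0xFF are designated for custom use, so that part is left to the caller.

```python
def miselect_is_reserved(value, is_custom=lambda v: False):
    """Return True if `value` is in a range the quoted text calls reserved.

    `is_custom` is a hypothetical predicate for values above 0xFF that a
    platform designates for custom use; the quoted text does not list them.
    """
    if 0x00 <= value <= 0x2F or 0x40 <= value <= 0x6F:
        return True
    return value > 0xFF and not is_custom(value)


def access_mireg(miselect):
    """Model an mireg access: the outcome depends on the current miselect.

    "not-reserved" only means the quoted rule does not fire; a real
    implementation may still trap for other reasons.
    """
    if miselect_is_reserved(miselect):
        return "illegal-instruction"
    return "not-reserved"
```

The same mireg instruction encoding gives a different outcome depending on what was previously written to miselect, so the decision cannot be made from the instruction bits alone.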

aswaterman commented 2 years ago

It’s certainly true that data-dependent exceptions make this feature more onerous to implement in some microarchitectures. Whether the onus is justified depends on the frequency of emulation traps and on whether emulation traps from X-only PMA/PMP regions need be supported (since in that case it isn’t possible to load the instruction from memory). My opinion is that the cost is justified. (This one might need to go to a vote.)

adurbin-rivos commented 2 years ago

I should have noted I was commenting specifically about A profiles, if it wasn't clear.

Within a given privilege mode of a particular profile wouldn't we expect the frequency of emulation traps to be low (or exceptional) if software is adhering to a given profile extension support? I think that's one of the main intents of the profile -- target instructions within a mandatory set of extensions. Or did I misunderstand your thinking, Andrew?

There is a performance call out for misaligned loads/stores: "Even when supported, misaligned loads and stores might execute extremely slowly. Standard software distributions should assume their existence only for correctness, not for performance." I know this isn't related to illegal instruction exceptions specifically, but the intent of that statement is rooted in expectations and functionality -- not optimizing for performance.

aswaterman commented 2 years ago

I generally agree we should operate under the assumption that emulation traps are uncommon, since systems for which that isn't the case won't be competitive. But don't forget that missing HW features aren't the only reason they'll occur: nested virtualization is an A-profile use case where we'll feel the need to accelerate these traps.

There's also still the functional matter of X-only PMA/PMP preventing emulation unless the instruction is delivered to *tval. Reasonable minds might say "just don't do that", but this concern continues to raise my hackles.
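The functional concern can be sketched with a small model of an emulation handler. This is an illustration under stated assumptions, not any real firmware: `faulting_instruction`, `load_instruction`, and the `readable` callback are hypothetical names, and memory is modeled as a plain dictionary. It assumes stval holds either the faulting instruction bits or zero.

```python
class AccessFault(Exception):
    """Raised by the model when a data load lacks read permission."""


def load_instruction(memory, readable, pc):
    """Model a data load of the instruction at `pc` (hypothetical helper)."""
    if not readable(pc):
        raise AccessFault(hex(pc))
    return memory[pc]


def faulting_instruction(stval, memory, readable, sepc):
    """Return the instruction bits an emulation handler would decode.

    Prefer the bits the hardware wrote to stval; fall back to a data
    load from sepc when stval is zero.
    """
    if stval != 0:
        return stval
    return load_instruction(memory, readable, sepc)
```

In this model, when stval is zero and the code region is execute-only (`readable` returns False), the fallback load raises `AccessFault` and the handler has no way to obtain the instruction.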

ved-rivos commented 2 years ago

So the X-only PMP/PMA concern for the virtualization use case should be addressed by the use of HLVX.HU/WU, which perform a load but check for execute permission instead of read permission.

gfavor commented 2 years ago

The HLVX solution doesn't work for all the RVA-compliant designs that don't implement or enable the H extension. One can imagine there will be lots of such RVA-compliant designs.

More fundamentally, even when HLVX is available to use, it does not address the X-only PMP/PMA concern. X-only PMP/PMA will still cause an Access Fault on an HLVX.
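A simplified model of the point being made here: HLVX substitutes execute permission for read permission during address translation, but the physical access is still a read as far as PMP is concerned. The function below is a hypothetical sketch that collapses two-stage translation into a single flag and ignores fault priority details.

```python
def hlvx_outcome(pte_executable, pmp_readable):
    """Model the two checks an HLVX load must pass, in simplified form.

    pte_executable: execute permission from address translation
                    (HLVX checks this in place of read permission).
    pmp_readable:   read permission from PMP for the physical address.
    """
    if not pte_executable:
        return "page-fault"
    if not pmp_readable:
        return "access-fault"
    return "ok"
```

For an X-only PMP region, `hlvx_outcome(True, False)` gives `"access-fault"` in this model, which is the case described above.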

ved-rivos commented 2 years ago

It is interesting that X-only PMP/PMA has a special case for HLVX, since I understood the HLVX access to be a "code fetch" as far as the MMU is concerned, with the data written to a register instead of being decoded. I will try to study the reason why X-only PMP/PMA disallows HLVX.

ved-rivos commented 2 years ago

Would it be the case that machine mode would not lock something into the PMP that requires emulation on that platform? Or is the reasoning that machine mode filling something into the PMP as execute-only would be agnostic to the platform, and so would require the lower privilege levels to emulate it? I assume such execute-only code could never be executed by M-mode itself, since that could not be emulated?

gfavor commented 2 years ago

Btw, I'm curious about the need to "carry the bytes through the pipeline"? Speaking from our high-performance o-o-o design, we never encountered such a need. With 10's to 100's of instructions in flight, passing instruction bytes "through the pipeline" would be quite ugly (to say the least). We had no issue in avoiding that. I wouldn't even say it was a matter of engineering ingenuity. It's also worth noting that there aren't true data-dependent exceptions that can only be detected during actual execution of an instruction (i.e. doing register-based computations).

gfavor commented 2 years ago

To Ved's last post ... M-mode is only concerned about protecting its memory-mapped resources from access by lower privilege modes. At least in this regard M-mode doesn't care what kind of bare-metal/RTOS/OS/hypervisor environment is running at lower modes and may very well not be aware of those specifics. In other words, M-mode configures PMPs based on its protection needs and on what parts of the physical address space it is going to "hand over" to lower modes to be accessible by them, but otherwise can be agnostic to what exactly is running below.

In this case of stval, the Supervisor's need to read instruction bytes is completely transparent to M-mode. Conversely, if M-mode had to set up PMPs to allow R permission for this Supervisor-level purpose of reading instruction bytes, then it would have to do that for all of the address space that it allows access to by lower modes (or at least for all regions that lower modes might fetch code from).

Lastly, note that M-mode has its own mechanism to read from X-only PMP areas in the case when IT needs to do so. But that mechanism is not available to S/HS-mode.

ved-rivos commented 2 years ago

I hope the M-mode that provides an X-only region to lower privilege also disallows single-stepping that code, as preventing reads seems to be the motivation for disallowing HLVX, but single-stepping allows inferring instructions without reading them. With AIA there are true data-dependent faults. For example, when the siselect CSR is written with an illegal value, the fault is not on the siselect write but later, when the sireg CSR is used to access what was selected by siselect.

I don't think instruction bytes need to be passed through the pipeline, but there needs to be a way to recover the instruction bytes when an instruction faults at execution, like the CSR access to *sireg.
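The timing described in this comment can be sketched with a toy model: the select write never traps, and the trap is only raised on the later sireg access, at which point the faulting encoding has to be available for stval. Everything here is hypothetical: the class names, the `RESERVED` range (reusing just the first range quoted earlier for miselect, purely for illustration), the initial select value, and the placeholder encoding.

```python
PLACEHOLDER_ENCODING = 0x12345673  # arbitrary stand-in, not a real encoding


class IllegalInstruction(Exception):
    """Model of an illegal-instruction trap carrying the stval value."""

    def __init__(self, encoding):
        super().__init__(hex(encoding))
        self.stval = encoding  # what a trap handler would see in stval


class IndirectCsrModel:
    RESERVED = range(0x00, 0x30)  # illustrative assumption only

    def __init__(self):
        self.siselect = 0x70  # hypothetical non-reserved starting value

    def write_siselect(self, value):
        # No trap here, even if the value selects something reserved.
        self.siselect = value

    def read_sireg(self, encoding):
        # The trap is raised only now, and it needs the encoding of this
        # instruction, not of the earlier siselect write.
        if self.siselect in self.RESERVED:
            raise IllegalInstruction(encoding)
        return 0
```

In the model the encoding is simply passed in by the caller; in hardware, this is the point where the bytes would have to be carried along, looked up, or re-fetched.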

Given the arguments made for emulation of code mapped in PMP as X-only I am okay with making the *stval reporting of instruction bytes as mandatory.

Orthogonally, one may want to be sure that reporting the instruction bytes from the X-only PMP region through *stval is not a violation of the X-only goals.

gfavor commented 2 years ago

Note that debug single-stepping takes you to M-mode (or to an external debugger), and M-mode can read memory that may be X-only to lower privilege modes. (Disallowing single-stepping through lower-mode X-only regions, in some people's eyes, is probably a non-starter.)

ved-rivos commented 2 years ago

Agreed, disallowing single-stepping may not be acceptable. What I was getting at is that if the reasoning for not allowing HLVX to X-only PMP regions was to allow execution but not reading of the code bytes, then single-stepping can expose those code bytes. If the X-only region were not S/U accessible, then HLVX would have failed by itself due to lack of privilege, not because of the region being X-only.

ved-rivos commented 2 years ago

Could this be a platform requirement instead of an ISA profile requirement? It may be more prevalent to have firmware that needs emulation locked PMP to be provided as X-only code for execution by lower privileges in M-class platforms? Also, it is perhaps more prevalent to require performance-critical trap-and-emulate on M-class platforms than on A-class platforms? A-class platforms do require the H extension. So besides this case, the use of HLVX or HLVX+HU by a user-mode VMM, or the use of sstatus.SUM+sstatus.MXR, seems to suffice to obtain the instruction bytes to do the emulation. I hope I am not missing some other case?

To Greg: your point about having the instruction bytes around to report depends on the implementation. For example, an implementation may have multiple levels of instruction decoders, and so the decoder that generates the illegal-instruction exception may not have the instruction bytes around. Certain illegal-instruction exceptions may get generated late, e.g. the AIA *sireg, vstart not being 0, unsupported EEW used by vector load/store, etc. In some implementations the internal encoding of the op/ops may not be the same as the architectural encoding, and in some implementations the instruction cache may not be inclusive of all ops in the OOO part of the machine. It may be possible to reconstruct/re-fetch the bytes etc., but that adds complexity that may not be needed on A-class platforms, so could it be optional for A-class platforms?

kasanovic commented 2 years ago

Regardless of implementation style, you have to fetch the original macro-instruction bytes before executing them. I find it hard to see that keeping these around would add considerable burden to anything that has high-performance complex execution and supports precise traps and tracing etc. OTOH, if software cannot rely on this feature, it complicates/prevents software handling of these traps due to atomicity/interleaving etc.

ved-rivos commented 2 years ago

On this particular item I have already dropped my objection. Quoting from my earlier post "Given the arguments made for emulation of code mapped in PMP as X-only I am okay with making the *stval reporting of instruction bytes as mandatory."