riscv / sail-riscv

Sail RISC-V model
https://lists.riscv.org/g/tech-golden-model
Other
433 stars 159 forks source link

Handle address translation for misaligned loads and stores better #467

Open Alasdair opened 4 months ago

Alasdair commented 4 months ago

Refactor the LOAD and STORE instruction so they split misaligned accesses into multiple sub-accesses and perform address translation separately. This means we should handle the case where a misaligned access straddles a page boundary in a sensible way, even if we don't yet cover the full range of possibilities allowed for any possible RISC-V implementation.

In addition tidy up the implementation in a few ways:

I tested this using the test in https://github.com/riscv/sail-riscv/issues/49. Additionally I ran the tests in this repository on a tweaked version of model that would split apart even correctly aligned loads and stores using the misaligned logic to test that it was at least doing something sensible.

github-actions[bot] commented 4 months ago

Test Results

712 tests  ±0   712 :white_check_mark: ±0   0s :stopwatch: ±0s   6 suites ±0     0 :zzz: ±0    1 files   ±0     0 :x: ±0 

Results for commit 4bdc440d. ± Comparison against base commit 2078d872.

:recycle: This comment has been updated with latest results.

Timmmm commented 4 months ago

I'm not sure this is the right approach, for a few reasons.

  1. We actually only need to split on page boundaries in order to fix the address translation issue.
  2. It's very unlikely that a chip will actually split accesses into memory operations like this. Eventually we want it to be configurable but until then I think it should at least be a likely implementation.
  3. There's no way you are going to want to copy & paste this throughout the float and vector load/stores!
  4. I think having all these details in the execute function itself is probably not ideal from an including-code-in-the-spec point of view.

Did you see this commit? I think it's a lot nicer to read something that abstracts away the virtual address translation a bit:

function clause execute(LOAD(imm, rs1, rd, is_unsigned, width, aq, rl)) = {
  let offset : xlenbits = sign_extend(imm);
  let width_bytes = size_bytes(width);

  // This is checked during decoding.
  assert(width_bytes <= sizeof(xlen_bytes));

  /* Get the address, X(rs1) + offset.
     Some extensions perform additional checks on address validity. */
  match ext_data_get_addr(rs1, offset, Read(Data), width_bytes) {
    Ext_DataAddr_Error(e)  => { ext_handle_data_check_error(e); RETIRE_FAIL },
    Ext_DataAddr_OK(vaddr) => {
      if   check_misaligned(vaddr, width)
      then { handle_mem_exception(vaddr, E_Load_Addr_Align()); RETIRE_FAIL }
      else match vmem_read(Read(Data), vaddr, width_bytes, aq, rl, false) {
        match value {
          Ok(result)    => { X(rd) = extend_value(is_unsigned, result); RETIRE_SUCCESS },
          Err(vaddr, e) => { handle_mem_exception(vaddr, e); RETIRE_FAIL }
        }
    }
  }
}

I couldn't quite do it as cleanly for stores because of the mem_write_ea() thing unfortunately. If we didn't have to worry about that then it could be as simple as that LOAD, just with vmem_write(..., X(rs1), ...).

Alasdair commented 4 months ago

We actually only need to split on page boundaries in order to fix the address translation issue.

I think the question is: should be the semantics of this be the same as splitting misaligned accesses into separate operations? If the observable semantics is the same then the model could choose to do things in a less efficient way.

I considered trying to only split only on page boundaries, but then I thought that there are probably a bunch of other cases where misaligned access straddle other things like PMP regions, and just splitting into separate operations might be cleaner.

Alasdair commented 4 months ago

On the other points regarding abstracting this detail behind a helper function, I agree. I'll take a closer look at that commit.

We've thought a bit more in general about misaligned and virtual memory for ARM, and I think the semantics there is that you split into byte sized accesses (unless you have some feature flags etc... it always gets more complicated).

Timmmm commented 4 months ago

If the observable semantics is the same then the model could choose to do things in a less efficient way.

That's true.

I think the semantics there is that you split into byte sized accesses.

That seems sensible! I think the PMP requirement that all bytes of a memory access be in one PMP region screws this up for RISC-V.

Alasdair commented 4 months ago

The interesting case would be: Misaligned access that straddles a page boundary, where each page is in a different PMP region. What is the envelope of allowed behaviour?

Timmmm commented 4 months ago

I made a diagram for these cases. In that case you can either split into separate memory operations in which case everything will be fine (first example in the image), or you are technically allowed to keep it as one discontinuous (!) memory operation in which case it will fail (second example).

I can't imagine any real systems that would do the latter but Andrew Waterman said it is allowed. The thing that makes it a single memory operation is that it cannot be observed to be partially complete.

image

Also if you're not being pedantic a device could just have the second case fail too, because nothing can observe that you actually did one memory operation if you claim you did two. I don't think the difference is observable.

Kind of feels like the PMP spec is just a bit broken tbh. I wonder if they even thought about this stuff.

Alasdair commented 4 months ago

Thanks, that's very useful. I think as a first go it would be reasonable to support the 0, 2, and 3 cases and not support 1 for the time being.

PeterSewell commented 4 months ago

As Alasdair says, we've looked a lot at this for Arm and IBM Power. Some interesting examples are in http://www.cl.cam.ac.uk/~pes20/popl17/mixed-size.pdf in Section 2. Real implementations do a lot of observable splitting on finer granularity than pages, and also on finer granularity than cache lines.

Some of this ends up in the concurrency model and some in the Sail ISA definition. For the latter, for RISC-V, as the community is especially concerned to be able to configure the model to match specific hardware implementations, I'd suggest having configurable options for the granularity of splitting (single byte or larger aligned units), and for the ordering between accesses - architecturally, these are probably fine-grained and de-ordered. Some implementations will be nondeterministic, but some will have a deterministic order, and being able to configure the model that will make things easier to test. Within the Sail, there should be some abstraction: a function that does a memory access, that splits into multiple calls of another function that does single-copy-atomic memory accesses (probably with a translation for each).

In fact it's more complex than that, as the entire translations of multiple s-c-a accesses from a single instruction should typically be architecturally de-ordered. That needs some lightweight intra-instruction within Sail, which it doesn't yet support (no ISA description language does, AIUI) - but I think RISC-V can get a long way before we need to handle that.

I say all this without knowing much about RISC-V PMA...

Peter

On Wed, 15 May 2024 at 14:01, Alasdair Armstrong @.***> wrote:

Thanks, that's very useful. I think as a first go it would be reasonable to support the 0, 2, and 3 cases and not support 1 for the time being.

— Reply to this email directly, view it on GitHub https://github.com/riscv/sail-riscv/pull/467#issuecomment-2112456073, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABFMZZXLPJPNGENSLG4ZIVDZCNMAFAVCNFSM6AAAAABHW7D3ESVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCMJSGQ2TMMBXGM . You are receiving this because you are subscribed to this thread.Message ID: @.***>

Alasdair commented 4 months ago

Ok, I refactored the pull request so there are separate vmem_read and vmem_write_from_register functions. I also added options to the C simulator that change the order in which address translations occur for misaligned accesses and control whether misaligned accesses are split into the largest possible aligned size, or always into bytes.

allenjbaum commented 4 months ago

This is my truth table of what I think is supposed to happen for misaligned loads. Note that the RSIC-V spec does not specify the order of misaligned accesses if they are not naturally aligned, and you can get two different results if you go low to high address, vs high to low address. So the parameters are:

HW misalign hipri-misalign low-first error(s) CAUSE TVAL mem 0 1 x x misalign low unchg 0 0 1 errH,OK misalign low unchg 0 0 1 OK, errL errL low unchg 0 0 1 errH,errL errL low unchg 0 0 0 errH,OK errH hi unchg 0 0 0 OK, errL misalign hi unchg 0 0 0 errH,errL errH hi unchg 1 x x errH,OK errH hi lo 1 x x OK, errL errL low hi 1 x 1 errH,errL errL low unchg 1 x 0 errH,errL errH hi unchg And: this does assume that the parameters are static and don't change during execution (so even with out of order, the assumption is that split accesses will (appear to) go in order with each other.

On Wed, May 15, 2024 at 8:39 AM Alasdair Armstrong @.***> wrote:

Ok, I refactored the pull request so there are separate vmem_read and vmem_write_from_register functions. I also added options to the C simulator that change the order in which address translations occur for misaligned accesses and control whether misaligned accesses are split into the largest possible aligned size, or always into bytes.

— Reply to this email directly, view it on GitHub https://github.com/riscv/sail-riscv/pull/467#issuecomment-2112885013, or unsubscribe https://github.com/notifications/unsubscribe-auth/AHPXVJXJA3ZLESQRKEIIW3DZCN6UNAVCNFSM6AAAAABHW7D3ESVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCMJSHA4DKMBRGM . You are receiving this because you are subscribed to this thread.Message ID: @.***>

jrtc27 commented 4 months ago

The privileged spec is very clear that alignment faults take precedence over access faults.

Alasdair commented 4 months ago
  • do you support misaligned at all (and at which granularity - didn't include that in the truth table
  • do you split up the accesses, and if so, do you first access the lower address(es) or the higher address(es).

Ok good to know, those are essentially the two command line flags I have implemented.

allenjbaum commented 4 months ago

Not exactly: The RISC-V Instruction Set Manual: Volume II: Privileged Architecture

Table 15. Synchronous exception priority in decreasing priority order.

Priority Exc.Code Description

Highest 3 Instruction address breakpoint

12, 1

During instruction address translation: First encountered page fault or access fault

1

With physical address for instruction: Instruction access fault

2 0 8,9,11 3 3

Illegal instruction Instruction address misaligned Environment call Environment break Load/store/AMO address breakpoint

4,6

Optionally: Load/store/AMO address misaligned

13, 15, 5, 7

During address translation for an explicit memory access: First encountered page fault or access fault

5,7

With physical address for an explicit memory access: Load/store/AMO access fault

Lowest

4,6

If not higher priority: Load/store/AMO address misaligned

On Wed, May 15, 2024 at 12:34 PM Jessica Clarke @.***> wrote:

The privileged spec is very clear that alignment faults take precedence over access faults.

— Reply to this email directly, view it on GitHub https://github.com/riscv/sail-riscv/pull/467#issuecomment-2113318360, or unsubscribe https://github.com/notifications/unsubscribe-auth/AHPXVJWOVH7E5IGBK3ULTY3ZCO2D3AVCNFSM6AAAAABHW7D3ESVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCMJTGMYTQMZWGA . You are receiving this because you commented.Message ID: @.***>

jrtc27 commented 4 months ago

Not exactly: The RISC-V Instruction Set Manual: Volume II: Privileged Architecture Table 15. Synchronous exception priority in decreasing priority order. Priority Exc.Code Description Highest 3 Instruction address breakpoint 12, 1 During instruction address translation: First encountered page fault or access fault 1 With physical address for instruction: Instruction access fault 2 0 8,9,11 3 3 Illegal instruction Instruction address misaligned Environment call Environment break Load/store/AMO address breakpoint 4,6 Optionally: Load/store/AMO address misaligned 13, 15, 5, 7 During address translation for an explicit memory access: First encountered page fault or access fault 5,7 With physical address for an explicit memory access: Load/store/AMO access fault Lowest 4,6 If not higher priority: Load/store/AMO address misaligned On Wed, May 15, 2024 at 12:34 PM Jessica Clarke @.> wrote: The privileged spec is very clear that alignment faults take precedence over access faults. — Reply to this email directly, view it on GitHub <#467 (comment)>, or unsubscribe https://github.com/notifications/unsubscribe-auth/AHPXVJWOVH7E5IGBK3ULTY3ZCO2D3AVCNFSM6AAAAABHW7D3ESVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCMJTGMYTQMZWGA . You are receiving this because you commented.Message ID: @.>

That was exactly the table I was looking at, but I missed the final row and interpreted the higher-priority "Optionally" as meaning "if you don't support misaligned accesses"... unhelpful wording.

allenjbaum commented 4 months ago

Jessica is correct about a nuance I missed: the truth table works only for explicit data accesses, but not for instruction accesses (or implicit accesses related to instruction accesses)

On Wed, May 15, 2024 at 1:26 PM Allen Baum @.***> wrote:

Not exactly: The RISC-V Instruction Set Manual: Volume II: Privileged Architecture

Table 15. Synchronous exception priority in decreasing priority order.

Priority Exc.Code Description

Highest 3 Instruction address breakpoint

12, 1

During instruction address translation: First encountered page fault or access fault

1

With physical address for instruction: Instruction access fault

2 0 8,9,11 3 3

Illegal instruction Instruction address misaligned Environment call Environment break Load/store/AMO address breakpoint

4,6

Optionally: Load/store/AMO address misaligned

13, 15, 5, 7

During address translation for an explicit memory access: First encountered page fault or access fault

5,7

With physical address for an explicit memory access: Load/store/AMO access fault

Lowest

4,6

If not higher priority: Load/store/AMO address misaligned

On Wed, May 15, 2024 at 12:34 PM Jessica Clarke @.***> wrote:

The privileged spec is very clear that alignment faults take precedence over access faults.

— Reply to this email directly, view it on GitHub https://github.com/riscv/sail-riscv/pull/467#issuecomment-2113318360, or unsubscribe https://github.com/notifications/unsubscribe-auth/AHPXVJWOVH7E5IGBK3ULTY3ZCO2D3AVCNFSM6AAAAABHW7D3ESVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCMJTGMYTQMZWGA . You are receiving this because you commented.Message ID: @.***>

Alasdair commented 2 months ago

Should be rebased onto the latest master now.

Alasdair commented 1 month ago

If we want I could update this PR to handle the Zama16b extension/option we discussed today as the splitting function I add here would be the ideal place to add this option.

billmcspadden-riscv commented 1 month ago

Yes, please do. Thanks.

Bill Mc.

On Mon, Aug 5, 2024 at 9:53 AM Alasdair Armstrong @.***> wrote:

If we want I could update this PR to handle the Zama16b extension/option we discussed today as the splitting function I add here would be the ideal place to add this option.

— Reply to this email directly, view it on GitHub https://github.com/riscv/sail-riscv/pull/467#issuecomment-2269271571, or unsubscribe https://github.com/notifications/unsubscribe-auth/AXROLODNMEOI6GI2FPZ6QYTZP6GXRAVCNFSM6AAAAABHW7D3ESVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDENRZGI3TCNJXGE . You are receiving this because you are subscribed to this thread.Message ID: @.***>

-- Bill McSpadden Formal Verification Engineer RISC-V International mobile: 503-807-9309

Join me at RISC-V Summit North America http://www.riscvsummit.com/ in October!