[sw] Desired behaviour for non-32 bit/unaligned accesses to device memory

GregAC commented 4 years ago

I'm in the process of updating the documentation/fixing up our tilelink adapters for Ibex (see https://github.com/lowRISC/opentitan/issues/1867 and linked issues for the gory details) and have realised we currently see the following for accesses to device memory (i.e. reading/writing registers within an IP block)

An lb or lh that targets a register address will execute without issue. At the device a 32-bit read will occur, the result in the destination register for the load will be the result of the 32-bit read with the appropriate bytes selected out given lb vs lh and the bottom address bits.
An unaligned lw that targets a register address will result in 2 32-bit reads (the 32-bit register the unaligned address hits and the next 32-bit register), this again will execute without issue (if both registers exist). Result in the destination register for the load will be the results of the 2 32-bit reads appropriately combined given the byte address.

Instead we could have no device read occur and an error for both of these scenarios where error is defined as specified in https://docs.opentitan.org/doc/rm/register_tool/#error-responses. You will get an exception for 'non-secure' devices and random data/all 1s for 'secure' devices (unless a specific mode is activated in which case both would cause an exception).

tjaychen commented 4 years ago

@sjgitty @eunchan

eunchan commented 4 years ago

For 2, the first access shall create an error if the core accesses memory-mapped registers. *_reg_top.sv filters un-aligned access and the access asserts addrmiss. It raises an error if devmode is turned on. If not, garbage data will be returned.

GregAC commented 4 years ago

It won't though because the unaligned lw becomes two Get messages on TL-UL. Current thinking is the adapter for Ibex will always generate 32-bit accesses so to comply with the TL-UL protocol both of those Get messages will be full 32-bit reads so the device side of things has no way to know this is actually an unaligned load.

If we alter the Ibex adapter so it produces more tightly bound Get messages (so you get two 16-bit reads or one 8-bit and one 32-bit depending upon the address for an unaligned access) then it might fail, depends on whether the device side will give an error on a 8-bit/16-bit access.

tjaychen commented 4 years ago

For 1, i think that was a deliberate choice actually. @sjgitty I think Scott will remember most about the details.

For 2, it actually sounds like the ideal behavior you want is more host side memory attribute no? So that an unaligned access made to device space immediately errors back? I don't know if it's a good idea to rely on forming the transaction such that device side will error to it. If we make some change in the future, that model may just break. Did I understand you correctly though? Is that what you meant?

GregAC commented 4 years ago

Well I was posing this from a SW point of view as to what the desired behavior was, then from that implement something so it happens.

But you're right I was initially thinking we could deal with unaligned by ensuring we get an error from the device if it sees a non 32-bit read but that's brittle and doesn't behave all that well (for one thing you might get the error or the first or second transaction depending upon the address) .

One of the reasons I highlighted 1. is you may get the potentially surprising behaviour that an 'sb' or 'sh' to some device address fails (but you can write the same bytes with an 'sw') whilst an 'lb' or 'lh' doesn't fail. Though on more careful reading of the register tool documentation (https://docs.opentitan.org/doc/rm/register_tool/#error-responses) this has been explicitly agreed as acceptable.

So as with another tilelink issue I raised it may just be some reconfiguration of the documentation is called for to make these behaviours more obvious (I think the register tool documentation needs some splitting up, some of it is useful to people who actually want to do things with register tool other bits are important to someone writing software who doesn't want to touch the existing device setup).

cdgori commented 4 years ago

From the discussion this morning, I believe that I heard:

-For I-fetch we have 16-bit aligned code (due to compressed instructions) -For data load/store, generated code should be 32-bit aligned These are two different TL ports, with different (?) PMP applied.

Mostly the concern is around the unaligned load/store, I think (? - this assumes that regions are marked as code or data and cannot cross over, otherwise we have to consider both).

I suggested perhaps making any unaligned access an exception, and Ziv rightfully said maybe that is too restrictive for little/no security benefit, but I don't think we identified a situation where unaligned loads (or stores) are needed yet either.

If we allow the unaligned load/store, we need an explicit set of test cases for the Ibex/PMP where the unaligned access straddles an allowed/unallowed boundary - there are at least 2 cases (allowed follows unallowed, and unallowed follows allowed).

If I understood correctly, there was a comment that the Ibex doesn't yet have a concept of "device" memory, but there was some concern about overloading the functionality of the PMP to add this capability to PMP - I agree with this sentiment in general.

I don't think we took a decision here yet.

(If my summary mischaracterized something, please correct, I tried to summarize my understanding.)

tjaychen commented 4 years ago

Well I was posing this from a SW point of view as to what the desired behavior was, then from that implement something so it happens.

But you're right I was initially thinking we could deal with unaligned by ensuring we get an error from the device if it sees a non 32-bit read but that's brittle and doesn't behave all that well (for one thing you might get the error or the first or second transaction depending upon the address) .

One of the reasons I highlighted 1. is you may get the potentially surprising behaviour that an 'sb' or 'sh' to some device address fails (but you can write the same bytes with an 'sw') whilst an 'lb' or 'lh' doesn't fail. Though on more careful reading of the register tool documentation (https://docs.opentitan.org/doc/rm/register_tool/#error-responses) this has been explicitly agreed as acceptable.

So as with another tilelink issue I raised it may just be some reconfiguration of the documentation is called for to make these behaviours more obvious (I think the register tool documentation needs some splitting up, some of it is useful to people who actually want to do things with register tool other bits are important to someone writing software who doesn't want to touch the existing device setup).

yeah that's a fair point about the regtool doc. There is indeed lots of information in there.

tjaychen commented 4 years ago

From the discussion this morning, I believe that I heard:

-For I-fetch we have 16-bit aligned code (due to compressed instructions) -For data load/store, generated code should be 32-bit aligned These are two different TL ports, with different (?) PMP applied.

I believe this is true. https://github.com/lowRISC/opentitan/blob/master/hw/vendor/lowrisc_ibex/rtl/ibex_pmp.sv#L10

and here here

https://github.com/lowRISC/opentitan/blob/master/hw/vendor/lowrisc_ibex/rtl/ibex_core.sv#L98

Mostly the concern is around the unaligned load/store, I think (? - this assumes that regions are marked as code or data and cannot cross over, otherwise we have to consider both).

I don't actually know there is anything explicitly doing this.... the non-cross-over I imagine would really be marked out via PMP. For example..flash has both data and code.

I suggested perhaps making any unaligned access an exception, and Ziv rightfully said maybe that is too restrictive for little/no security benefit, but I don't think we identified a situation where unaligned loads (or stores) are needed yet either.

I tend to fall into the camp of no unaligned also...

If we allow the unaligned load/store, we need an explicit set of test cases for the Ibex/PMP where the unaligned access straddles an allowed/unallowed boundary - there are at least 2 cases (allowed follows unallowed, and unallowed follows allowed).

@cdgori what do you think is the right behavior here? Since ibex converts them into two separation transactions, does this mean it is technically okay for each to be checked independently?

But I guess your larger point is to make sure that ibex/pmp does not check on the first or original address and ignore the generated second transaction?

If I understood correctly, there was a comment that the Ibex doesn't yet have a concept of "device" memory, but there was some concern about overloading the functionality of the PMP to add this capability to PMP - I agree with this sentiment in general.

yeah i don't think pmp is suited too well for this..since it only has access control bits and not other attributes...plus it feels like we'd use more regions.

I've not looked into the riscv pma a ton, but is that something we are considering? It would be kind of nice to carve out the peripheral / main memory space (we'd only really need to check the higher order address bits).

I don't think we took a decision here yet.

(If my summary mischaracterized something, please correct, I tried to summarize my understanding.)

sjgitty commented 4 years ago

My intuition says that the bus converter should stick to the original intent. I.e. an unaligned 32b read should request two 16b reads with the appropriate offset, thereby allowing the endpoint to make the correct designation. It needs to be guaranteed (as discussed in the security meeting) that if one request responds in error, then the whole unaligned response is marked as error, but I believe you've already confirmed that @GregAC. This might be more work on the bus converter, but allows the endpoints to make a fully informed decision.

Just my 2c

GregAC commented 4 years ago

I suggested perhaps making any unaligned access an exception, and Ziv rightfully said maybe that is too restrictive for little/no security benefit

One downside of making all unaligned access an exception is whilst it makes things simpler for OT Ibex, from an Ibex/lowRISC perspective it's yet another configuration option (as we wouldn't want to remove unaligned handling from the Ibex RTL entirely, rather give the option to do it or cause an exception).

I've not looked into the riscv pma a ton, but is that something we are considering? It would be kind of nice to carve out the peripheral / main memory space (we'd only really need to check the higher order address bits).

The RISC-V specification is intentionally vague on the PMA point. It spends some paragraphs talking about how it could exist and what it might do and some terminology to use general guidelines to follow etc but unlike PMP doesn't specify anything about how it might work.

So I think this would leave us free to implement something very simple, e.g. say the top-half of address space is 'device' space and for Ibex to treat it differently to 'normal memory' space (e.g. by triggering an exception on any unaligned access).

From an Ibex point of view we'd want to implement some kind of function that takes an address and says whether or not its device space or maybe even a module. This would allow users to Ibex to choose how they decide what is device space rather than enforcing whatever choice we make for OT on all Ibex applications.

but I believe you've already confirmed that @GregAC. This might be more work on the bus converter, but allows the endpoints to make a fully informed decision.

So the current bus converter always emits 32b accesses, for both read and write (but for write uses the mask field to limit the bytes effected, for reads, due to TL-UL requirement all mask bits must be set). This means a 32b unaligned load turns into two 32b aligned loads and a TL-UL endpoint cannot tell they're accessing less than 32b. Obviously we can change this (the bus convertor could choose an appropriate size based on the be from Ibex or Ibex could output a size as part of the request). Though I think if we want to catch unaligned access to device we're better off with a PMA based solution.

tjaychen commented 4 years ago

yeah i am in agreement with @GregAC (assuming i understood you correctly) on support of a simple PMA solution (especially if the spec is vague :) ). We can pass in a parameter to easily do the cutoff...say....everything below 0x4000_0000 is memory, and everything above is device.... (or we can be more granular if you prefer).

I think this helps to contain the behavioral issue entirely within ibex without any dependency on the downstream... and also means the tlul host adapter wouldn't need to understand that something coming through it was in fact unaligned and that it needs to remember an error on either should cause a full error (I'm guessing this is what the unaligned logic inside ibex is doing already).

Greg do you see any major downsides going with a solution like this? Obviously we'll put even more latency on the core to fabric path..but are there other glaring holes you think?

asb commented 4 years ago

I wasn't in the discussion regarding using PMPs for this, but I feel PMAs are a better option too. I can see no reason to allow misaligned accesses to device memory.

I wouldn't be opposed to disabling misaligned accesses altogether for the configuration of Ibex within OpenTitan, though do observe that the in the Arm world the M3 and M4 do allow misaligned access by default.

GregAC commented 4 years ago

I have opened an issue on the Ibex side to discuss what a PMA implementation could look like: https://github.com/lowRISC/ibex/issues/764

msfschaffner commented 2 years ago

@GregAC can this issue be updated to reflect the current state?

tjaychen commented 2 years ago

o sorry i was just refreshing my memory on the history. I'm pretty sure we did not do PMA. But looking back at this now, I also don't see why we should not put a mandate on software to access normal device registers with word accesses only.

ibex internally probably already treats the separate 32b transactions as 1, so a failure of any is a failure of the whole, I feel like this is already enough.

Because of transmission integrity we know we favor word accesses at the system already, so requiring software to align for device registers really isn't a big deal here in my opinion.

eunchan commented 2 years ago

@GregAC Byte access to the device is not allowed FYI. SW should read full permitted bytes at once. For example, if a CSR has three bytes of valid field, SW should read 3 bytes not a byte among three. SW can read full word also.

Added to this. Recently we combined qe to be a unified signal per CSR (word). So, if we decide to support byte access, should revert that change.

GregAC commented 2 years ago

So a byte access in Ibex turns into a word access on the bus. So a lb for example would execute successfully but the device would see a full word read.

Stores use the a_mask byte mask, so an sb for example becomes a PutPartialData with a single bit set in the mask. We don't intend to have explicit support for device byte writes but Ibex can effectively produce them. @eunchan do you know how the tlul fabric/devices will respond to this?

tjaychen commented 2 years ago

The reg tool behavior (and this is consistent with the design right now) is shown here

The main thing is if we issue a byte write to device CSR, the address must be word-aligned. Is that what you guys already do in ibex @GregAC ? it kind of sounds like it.

GregAC commented 2 years ago

The current behaviour is fine and is documented (though I will check if there's somewhere more prominent we could highlight this). For any loads from Ibex that are unaligned or non-word sized they will succeed, though unaligned accesses to device memory are not recommended as they result in two separate reads. lb and lh that are aligned to their access sizes will behave as expected (you'll get a single read, and the byte or half-word requested back from the register you read from). Writes not aligned to 32 bits will fail (regardless of write size). A write that is 32-bit aligned but of a size smaller than the register will also fail.

lowRISC / opentitan

[sw] Desired behaviour for non-32 bit/unaligned accesses to device memory #1868