riscv / riscv-isa-manual

RISC-V Instruction Set Manual
https://riscv.org/
Creative Commons Attribution 4.0 International

PMP register subset flexibility #1212

Open sorear opened 9 months ago

sorear commented 9 months ago

(Looking at this again because of sPMP, which should be consistent with some of this.)

The ratified 20190608 specification was quite clear about the relationship between PMP entries and CSRs and allowed an implementation to provide, say, 8 or 12 address comparators.

Up to 16 PMP entries are supported. If any PMP entries are implemented, then all PMP CSRs must be implemented, but all PMP CSR fields are WARL and may be hardwired to zero.

The current specification says this instead:

Up to 64 PMP entries are supported. Implementations may implement zero, 16, or 64 PMP entries; the lowest-numbered PMP entries must be implemented first. All PMP CSR fields are WARL and may be read-only zero.

Changing "PMP CSRs" to "PMP entries" in the second sentence makes it much less clear that an implementation is allowed to provide 8 or 24 address comparators; "CSR is implemented" has a relevant meaning (although "CSR exists" would be a better match with the rest of the document), but "PMP entry is implemented" only appears in this section and could be interpreted in several ways. Assuming that this change is intentional, can we state that a PMP entry is considered to be "implemented" if the corresponding pmpaddrX and pmpcfgY CSRs exist, whether or not the fields are writable?

I am interpreting "CSR fields are WARL" to require the L, A, and RWX fields to be able to contain all documented values if not read-only zero, since there is no language allowing an implementation to provide subsets of values like there is for e.g. mtvec. The language for PMP address registers likewise authorizes tying certain bits to zero but not, for instance, requiring PMP addresses to be primes.

In general, the PMP grain is 2^(G+2) bytes and must be the same across all PMP regions.

I believe this should be "same across all PMP entries" since the grain affects the PMP address register in all modes, including A=OFF.
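
The privileged spec describes a probe for the grain: write zero to pmp0cfg, write all ones to pmpaddr0, read pmpaddr0 back, and take G as the index of the least-significant set bit. A minimal sketch of the arithmetic only, assuming the read-back value has already been obtained (the function name `pmp_grain_bytes` is hypothetical, not from any library):

```python
def pmp_grain_bytes(pmpaddr_readback: int) -> int:
    """Return the PMP grain in bytes implied by a pmpaddr0 read-back value.

    Assumes pmp0cfg was written with zero and pmpaddr0 with all ones
    beforehand. G is the index of the least-significant set bit, and the
    grain is 2^(G+2) bytes.
    """
    if pmpaddr_readback == 0:
        # No writable address bits at all; the probe gives no answer.
        raise ValueError("pmpaddr0 read back as zero")
    # Isolate the lowest set bit, then convert it to a bit index.
    g = (pmpaddr_readback & -pmpaddr_readback).bit_length() - 1
    return 1 << (g + 2)
```

Because the probe runs with A=OFF, it reflects the point above: the grain shows up in the address register itself, independent of any region being active.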

(Do we need the freedom for the number of writable address bits and whether L, A, or RWX are implemented to vary between PMP entries where the pmpaddrN or pmpNcfg is writable? Do we need the freedom for the set of writable pmpaddrN CSRs to be nonconsecutive?)

If no PMP entry matches an S-mode or U-mode access, but at least one PMP entry is implemented, the access fails.

Does this sentence apply if an implementation decodes all pmp* registers as read-only zero? Would that be considered implementing the PMP entries or not implementing the PMP entries?

gfavor commented 9 months ago

In the newer wording, the second sentence is more consistent with the first sentence in referring to PMP entries. The last sentence remains as is and is what allows (in both the older and newer texts) one to implement fewer than a full 16 or 64 address comparators.

Regarding "CSR fields are WARL", the more definitive arch statement is the actual definitions of each CSR, which explicitly and more precisely indicate what fields are individually and separately WARL. This, for example, makes clear that the R, W, and X bits are separate WARL fields (not one 3-bit WARL field).

Regarding PMP grain, that sentence is immediately preceded by "Although the PMP mechanism supports regions as small as four bytes, platforms may specify coarser PMP regions." In other words, the second sentence is being consistent with the first sentence in talking about "regions". Practically speaking, "region" is synonymous with "entry". This text (for better or worse) talks in the more conceptual term of regions (as expressed by PMP entries) since it is talking about the size of a region (whereas talking about the size of an entry would be less clear).

As far as possibly hardwiring all "implemented" entries, that would still count as implementing the 16 or 64 entries that one would be claiming to implement (although a user of the CPU core might be unhappy about that implementation). The more telling question is what is the difference, for example, between implementing 16 and 64 entries?

And the answer is that "implemented" CSRs can be accessed without causing an exception (and may or may not return hardwired values depending on the implementation), whereas "unimplemented" CSRs are reserved and can potentially result in an exception on an attempted access.
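
Under this reading, the consequence for the no-match rule quoted earlier can be sketched as follows. This is only an illustration of the interpretation in this comment ("implemented" means the CSRs exist, even if read-only zero); the function name and parameters are hypothetical:

```python
def no_match_access_allowed(priv: str, pmp_entries_implemented: int) -> bool:
    """Outcome of an access that matches no PMP entry.

    priv is "M", "S", or "U". pmp_entries_implemented counts entries whose
    CSRs exist (0, 16, or 64), whether or not their fields are writable.
    This counting rule is the interpretation discussed in the thread, not
    a statement from the spec.
    """
    if priv == "M":
        # An M-mode access that matches no entry succeeds.
        return True
    # S/U accesses fail on no match if at least one entry is implemented.
    return pmp_entries_implemented == 0
```

With 16 entries that are all read-only zero, every entry stays at A=OFF and can never match, so under this reading every S-mode and U-mode access would fail.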

allenjbaum commented 9 months ago

It isn't quite that simple, naturally. "L, A, and RWX fields to be able to contain all documented values if not read-only zero" is definitely not the case; e.g., R and W form a "collective" WARL field, which makes R=0, W=1 not legal, but the other 3 combinations are legal. The A field is allowed to have any set of values be legal. For example, the possible legal values might only be "OFF" and "NAPOT", or "OFF" and "TOR" ("OFF" is always legal, since it is the required reset value).
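
A minimal sketch of a legality check along these lines, assuming the base PMP layout of one pmpcfg byte (R in bit 0, W in bit 1, A in bits 4:3) and no Smepmp. The function name and the `legal_a_modes` parameter are hypothetical, and what an implementation actually stores when an illegal value is written is left open, as WARL permits:

```python
# Bit layout of one 8-bit pmpNcfg field (base PMP).
PMP_R = 0x01
PMP_W = 0x02
PMP_A_SHIFT = 3
PMP_A_MASK = 0x3

# Encodings of the A field.
A_OFF, A_TOR, A_NA4, A_NAPOT = 0, 1, 2, 3


def is_legal_pmpcfg(cfg: int,
                    legal_a_modes=frozenset({A_OFF, A_TOR, A_NA4, A_NAPOT})) -> bool:
    """Check one pmpNcfg byte against the constraints described above.

    legal_a_modes models an implementation that supports only a subset of
    address-matching modes; it is an assumption of this sketch.
    """
    r = bool(cfg & PMP_R)
    w = bool(cfg & PMP_W)
    if w and not r:
        # R=0, W=1 is the one R/W combination that is not legal.
        return False
    a = (cfg >> PMP_A_SHIFT) & PMP_A_MASK
    return a in legal_a_modes
```

For example, an implementation supporting only OFF and NAPOT would be modeled with `legal_a_modes={A_OFF, A_NAPOT}`, and a TOR configuration would then be reported as not legal.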

As for the number of entries: if any entries are implemented, then at least 16 are guaranteed to be accessible without causing an exception. If the 17th was implemented, then the 17th through 64th must be accessible without causing an exception. Any number of those could be RdOnly0 (and it is probably not uncommon for an implementation to have, say 8 writable and 8 RdOnly0).
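
The set of CSRs that would be accessible under this reading can be enumerated as a rough sketch. It assumes the standard CSR numbering (pmpcfg0 at 0x3A0, pmpaddr0 at 0x3B0) and that RV64 has only the even-numbered pmpcfg registers; the function name is hypothetical:

```python
PMPCFG_BASE = 0x3A0   # pmpcfg0
PMPADDR_BASE = 0x3B0  # pmpaddr0


def accessible_pmp_csrs(n_entries: int, xlen: int) -> dict:
    """Map CSR name to CSR address for a hart with 0, 16, or 64 PMP entries.

    Says nothing about which of these CSRs are writable; any number of
    them may be read-only zero.
    """
    if n_entries not in (0, 16, 64):
        raise ValueError("n_entries must be 0, 16, or 64")
    if xlen not in (32, 64):
        raise ValueError("xlen must be 32 or 64")
    # RV32: each pmpcfg holds 4 entries. RV64: each holds 8, and only the
    # even-numbered pmpcfg CSRs exist.
    step = 1 if xlen == 32 else 2
    csrs = {}
    for i in range(0, n_entries // 4, step):
        csrs[f"pmpcfg{i}"] = PMPCFG_BASE + i
    for i in range(n_entries):
        csrs[f"pmpaddr{i}"] = PMPADDR_BASE + i
    return csrs
```

A hart with 8 writable and 8 read-only-zero entries would still be modeled as `n_entries=16` here, since all 16 pmpaddr CSRs are accessible without an exception.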

I thought that "lower entries are implemented first" meant that any that were not RdOnly0 had to be lower numbered than the ones that were RdOnly0 - but I don't think that is supported by the wording, though I think it is the intent (and makes it nearly impossible for architectural compatibility tests, though the tests will still assume that).
