riscv / riscv-control-transfer-records

This repo contains a RISC-V ISA extension (proposal) to allow recording of control transfer history to on-chip registers, to support usages associated with profiling and debug.
https://jira.riscv.org/browse/RVG-62
Creative Commons Attribution 4.0 International
14 stars 4 forks source link

Ved feedback #7

Closed bcstrongx closed 7 months ago

bcstrongx commented 8 months ago

Via email:

I have the following feedback on the CTR specification. Later in this email I have some editorial suggestions as well.

  1. The STE bit and ETEN bit are mutually redundant for U->S/VU->VS trap. The STE bit is in conjunction with ETEN and is only used for a VU/VS->HS trap. When H extension is not implemented, the specification would require a redundant ETEN. Consider optimizing to remove ETEN and using the AND of the TE bits leading to the privilege level where the trap is delegated as the effective TE instead of the and of the ETEN and *TE.

  2. Remove the requirement that if a DEPTH value is supported then all lesser depth values must be supported. Else this will not be compatible with a read-only DEPTH encoding in vsctrcontrol.

  3. Add non-normative notes about guarding against leaking a higher privileged pc on trap return.

  4. Add rationale for reserving some DEPTH encodings.

  5. Is the CLR control required and justified? The following options were discussed on the email thread. 1) Eliminate CLR control and require software to explicitly clear when needed using CSR writes. This requires 2^(DEPTH+4) * 4 CSR writes in a loop on context switches. 2) Mandate and justify the CLR control. Provide non-normative note on how this may be implemented optimally. 3) Make CLR optional with a justification for when it should be implemented. And an implementation note.

  6. When mstateen0.CTR is 0, hstateen0.CTR should appear to be read-only and 0.

  7. Consider a FREEZE-CLR control bit to avoid the read-modify-write hazard on the FREEZE bit or provide details on how such hazards can be avoided by SW actions.

  8. Reset behavior of custom fields should be unspecified. Reconsider the reservation of 2 bits in control for custom extensions. At best they may be unused and at worst insufficient.

  9. Clarify how WRPTR is modulated on change to DEPTH field - both increasing and decreasing depth. See below for suggestions.

  10. Clarify intention for recommending clearing of inaccessible entries when reducing depth. If this is to allow powering down those entries then that should be also specified that inaccessible entries may hold a legal but unspecified value when made accessible.

  11. Implicit writes to source/target entry are specified to zero extend from XLEN to MXLEN. However explicit writes are specified to require a read-modify-write to preserve the bits beyond XLEN. What is justification for requiring a read-modify-write and not specifying a zero-extend from XLEN to MXLEN even on CSR writes.

  12. CTR source and destination registers would want to be specified similar to xEPC in holding at least one invalid address and allowing invalid addresses to be transformed to a invalid address that can be held in the register. Further add clarification about needing to hold all valid PA as well (as specified for tval).

  13. When only IALIGN=32 is supported the bit 1 of source and destination entry be read-only zero.

  14. Rationale for metadata being a 256-bit register should be included.

  15. Are cycles in ctrdata required to be obtained from the same clock as mcycle. If so that should be clarified.

  16. The Freeze section should include the vsctrcontrol description for lcofi/breakpoints that occur in V=1.

ved

Editorial suggestions:

Original: "A method to record control flow transfer history is valuable for performance profiling usages, as well as for debug." Suggestion: "A method for recording control flow transfer history is valuable not only for performance profiling but also for debugging."

Original: "Control flow transfers refer to jump instructions (including function calls and returns), taken branch instructions, traps, and trap returns." Suggestion: This sentence is clear and technically accurate.

Original: "Profiling tools like Linux perf collect control transfer history when sampling software execution, enabling users, and tools like AutoFDO, to identify hot paths for optimization." Suggestion: "Profiling tools, such as Linux perf, collect control transfer history during software execution sampling. This enables users and tools like AutoFDO to identify hot paths for optimization."

Original: "CTR defines a circular (FIFO) control transfer history buffer." Suggestion: "CTR defines a circular (FIFO) buffer for control transfer history."

Original: "The source PC, target PC, and some optional metadata is stored for each recorded transfer." Suggestion: "The source PC, target PC, and some optional metadata are stored for each recorded transfer."

Original: "Logical entry 0 is always the youngest recorded transfer, entry 1 is the next youngest, etc." Suggestion: "Logical entry 0 always corresponds to the most recent recorded transfer, followed by entry 1 as the next most recent, and so on." Original: "The machine-level extension Smctr encompasses all added CSRs and all behavior modifications for a hart, over all privilege levels." Suggestion: "The machine-level extension, Smctr, encompasses all newly added Control Status Registers (CSRs) and behavior modifications for a hart across all privilege levels."

Original: "The associated supervisor-level extension Ssctr is essentially the same as Smctr except for excluding the machine-level CSRs and behaviors that should not be directly visible to supervisor level." Suggestion: "The corresponding supervisor-level extension, Ssctr, is essentially identical to Smctr, except that it excludes machine-level CSRs and behaviors not intended to be directly accessible at the supervisor level."

Original: "Smctr is dependent on the Smcsrind extension. Ssctr is dependent on the Sscsrind extension." Suggestion: "Smctr depends on the Smcsrind extension, while Ssctr depends on the Sscsrind extension."

Consider using wavedrom to render the register layout to be consistent with other specifications and for improved readbility. Add figure number captions to the register layout and table number captions to tables. Consider highlighting CSR names and fields using backticks - vsctrcontrol. Consider using __x__ctrcontrol insted of *ctrcontrol.

Mark "0" fields in the registers as WPRI.

The STE bit and ETEN bit are mutually redundant for U->S/VU->VS trap. The STE bit is in conjuntion with ETEN is only used for a VU/VS->HS trap. When H extension is not implemented, the specification would require a redundant ETEN. Consider optimizing to remove ETEN and using the AND of the TE bits leading to the privilege level where the trap is delegated as the effective TE.

DEPTH[3:0] implies a 4 bit field. It should be 3 bit field. Can remove the [3:0] label as width of DEPTH field is provided in the register layout about. Is there intent to use DEPTH as anything besides depth for future and is why `11x are reserved? If not why not specify the depth as 2^(DEPTH+4)?

Original: "The depth of the CTR buffer dictates the number of entries to which the hardware will record transfers." Suggestion: "The depth of the CTR buffer dictates the number of entries to which the hardware records transfers."

Original: "For a depth of N, the hardware will record transfers to entries 0..N-1." Suggestion: "For a depth of N, the hardware records transfers to entries 0 through N-1."

Original: "All entry registers are read-only 0 when the selected entry is in N..255." Suggestion: "All entry registers read as '0' and are read-only when the selected entry is in the range N to 255."

Original: "The maximum DEPTH value supported is implementation-specific, but all lesser values must also be supported." Suggestion: "The maximum DEPTH value supported varies by implementation. If a certain DEPTH value is supported then all lesser values must also be supported." However, this may end up contradicting the requirement that the DEPTH is read-only reflection of the sctrcontrol.DEPTH when V=1. Suggest removing this requirement that all lesser values must be supported.

Original: "If software writes a DEPTH value above the maximum supported, DEPTH must read back the maximum supported value." Is it required to read back the maximum supported value. It would be better to allow implementation to report any legal value following WARL.

Original: "An implementation may opt to hardcode some or all of the bits in this field, based on the depth options supported." Suggestion: "An implementation may choose to hardcode some or all bits in this field, depending on the supported depth options." However, this is implied by WARL.

Custom field should not require a specified reset behavior.

Original: "All fields are optional save for M, CLR, BPFRZ, and DEPTH." Suggestion: "All fields are optional except for M, CLR, BPFRZ, and DEPTH."

Original: "All unimplemented fields are read-only 0, while other defined fields are writable, though DEPTH may have some read-only 1 bits." Suggestion: "All unimplemented fields are read-only with a default value of 0. Other defined fields are writable, although DEPTH may contain some bits that are read-only and set to 1."

Original: "S must be writable if S-mode is implemented, U must be writable if U-mode is implemented, and LCOFIFRZ must be writable if the Smcofpmf/Sscofpmf extension is implemented." Suggestion: "The S field must be writable if S-mode is implemented. Similarly, U must be writable if U-mode is implemented, and LCOFIFRZ must be writable if the Smcofpmf/Sscofpmf extension is implemented."

Original: "Software may opt to use a depth less than the maximum supported in order to reduce the latency of saving and restoring CTR state, or to emulate the maximum depth supported by other implementations, e.g., in cases of VM-migration." Suggestion: "Software may choose to use a depth less than the maximum supported to reduce the latency of saving and restoring CTR state, or to emulate the maximum depth supported by other implementations, for example, during VM migration."

Original: "When reducing CTR depth, by writing mctrcontrol.DEPTH to a smaller value, software should set mctrcontrol.CLR. This ensures that no transfer state is retained in the now-inaccessible entries above the new depth value."

What is the intent of this guideline to clear the inaccessible entries. Is it to allow implementations freedom to power down the inaccessible entries? What harm arises from not clearing the inaccessible entries?

"vsctrcontrol.DEPTH is a read-only copy of sctrcontrol.DEPTH" See comment about requiring all lesser values to be also supported. Original: "The sctrstatus register provides access to CTR status information, and is updated by the hardware when CTR is active (in an enabled privilege mode and not frozen)." Suggestion: "The sctrstatus register grants access to CTR status information and is updated by the hardware whenever CTR is active, which is when it's in an enabled privilege mode and not frozen."

WRPTR - specify behavior of legal values that WRPTR can assume when DEPTH value is changed. Suggest that bits 7:(7 - DEPTH+4) assume a value of 0 and DEPTH+3:0 assume an unspecified but legal value when DEPTH field is written. The statement "wraps on increment" should be expanded to state "wraps to assume a value of 2^(DEPTH+4) - 1.

"Logical entry 0, accessed via mireg* when miselect=0x200, is always the physical entry preceding the WRPTR entry ((WRPTR-1) % depth)." should this be CTR entry (WRPTR-1) % depth? Further since lower case depth is used to imply 2^(DEPTH+4) - it may be good to be explicit about that.

Original: "Because the sctrstatus register is updated by hardware, writes should be performed with caution." Suggestion: "Since the sctrstatus register is updated by hardware, writes to it should be performed cautiously."

Original: "If a multi-instruction read-modify-write to sctrstatus is performed while CTR is active, such that a qualified transfer, or trap that causes CTR freeze, completes between the read and the write, a hardware update could be lost." Suggestion: "If a multi-instruction read-modify-write operation is performed on sctrstatus while CTR is active, and a qualified transfer or a trap that causes CTR to freeze occurs between the read and write stages, a hardware update might be lost."

Should there be a FREEZE-CLEAR control bit?

Original: "A future extension could add support for RV32, by adding 3 new CSRs (mctrcontrolh, sctrcontrolh, and vsctrcontrolh) to provide this access." Suggestion: "A future extension could add support for RV32 by adding 3 new CSRs (mctrcontrolh, sctrcontrolh, and vsctrcontrolh) to provide this access."

Original: "The miselect index range 0x200..0x2FF is reserved for CTR entries 0..255. When miselect holds a value in this range, mireg provides access to ctrsource, mireg2 provides access to ctrtarget, and mireg[3456] provide access ctrdata." Suggestion: Consider using 0 through 255. For mireg[3456] - mireg[3/4/5/6].

Original: "ctrsource is an MXLEN-bit WARL register that must be able to hold all valid virtual addresses. It need not be able to hold an invalid address. When XLEN < MXLEN, software access via *ireg will access only the lower XLEN bits of ctrsource, and implict writes (by recorded transfers) will be zero-extended."

Suggestion: implict -> implicit (also in ctrtarget description) Does "software access" imply only read and not writes? So while implicit writes do a zero extend but explicit CSR writes will be required to preserve the MXLEN-1:XLEN bits? Why not zero extend for CSR writes to avoid needing a read-modify-write in hardware.

It may be good to treat this similar to *epc i.e. must hold all valid addresses but need not hold all invalid addresses. "It need not be capable of holding all possible invalid addresses. Prior to writing ctrsource, implementations may convert an invalid address into some other invalid address that ctrsource is capable of holding." With that this note can be dropped "A transfer from an invalid address (which could only occur on an exception) may report a valid address in ctrsource.PC." There is one case where a jump to an invalid address would record the invalid address on the trap caused by the instruction access-fault/(guest)page-fault trap.

Add a informational note that when address translation is not in effect, virtual and physical address are equal and so the set of addresses that can be held in ctrsource must include the set of physical addresses that can be used as a valid pc.

Use lower case pc with emphasis.

Add that on implementations that support only IALIGN=32, the bit 1 of ctrsource is also read only 0.

Similar comments about ctrtarget.

Since CTR is not supported for XLEN 32, the indirect alias CSRs table can be updated to remove the XLEN=32 column.

Add a non-normative note for rationale for metadata being a 256 bit register.

Original: "If the H extension is implemented and mstateen0.CTR=1, the hstateen0.CTR bit controls access to supervisor CTR state (sctrcontrol, sctrstatus, and sireg* when siselect is in 0x200..0x2FF) when V=1."

Should add a note that when mstateen0.CTR is 0, hstateen0.CTR is appears to be read-only and 0.

Original: "When mstateen0.CTR=1 and hstateen0.CTR=1, VS-mode accesses to supervisor CTR state behave as described in CSRs and Entry Registers above. When mstateen0.CTR=1 and hstateen0.CTR=0, VS- mode accesses to supervisor CTR state raise a virtual instruction exception.

With that, the qualification - "and hstateen0.CTR" is not required.

Original: "When hstateen0.CTR=0, qualified control transfers executed while V=1 will continue to implicitly update Entry Registers and sctrstatus." Suggestion: "...sctrstatus (really vsctrstatus)" Original: "Such qualified transfers update the Entry Registers at logical entry 0, such that older entries are pushed down the stack (the record previously in entry 0 is pushed to entry 1, the record previously in entry 1 is pushed to entry 2, etc)." Suggested: "Such qualified transfers update the Entry Registers at logical entry 0. As a result, older entries are pushed down the stack: the record previously in entry 0 moves to entry 1, the record in entry 1 moves to entry 2, and so on."

Section 5.1 - it may be helpful if the second paragraph was split into two paragraphs. One to discuss traps and second to discuss trap returns.

"The ctrdata register may optionally include a count of CPU cycles elapsed since the prior CTR record."

Is this intended to be identical to cycles recorded in mcycles. If so then that should be spelt out.

"An implementation must clear CCV for the next recorded transfer upon a write to [ms]ctrcontrol" On all writes?

Freeze section should include the vsctrcontrol descrition for lcofi/breakpoints that occur in V=1.

Custom extensions - could they have used a custom indirect CSR? Is there a major value in reserving 2 bits for custom extensions in ctrcontrol? It seems best to make all bits in control, sctrstatus and entries as WPRI. Custom extensions can add a bank of registers using the select values designated for custom extensions.

bcstrongx commented 7 months ago

Agreed upon changes so far:

1) Eliminate ETEN bit and alter external traps to require TE bits for target mode plus all intermediate modes to be set. 2) Switch to allowing implementations to choose a max and min depth supported, to accommodate fixed depth seen from VS-mode 3) Add non-normative note about clearing current mode bit before xRET, to ensure no privileged PC leak 4) Add non-normative note about not defining all depth encodings in order to save indirect CSR space 5) TBD 6) Make hstateen0.CTR ROZ when mstateen0.CTR is ROZ 7) TBD 8) Change requirement that custom bits have reset value 0 to have reset value associated with all custom extensions disabled. Consider increasing the number of custom bits (TBD). 9) Specify that WRPTR has an unspecified but legal value on depth change. 10) Specify that entries made available upon increasing depth hold an unspecified but legal value. 11) Replace "When XLEN < MXLEN, software access via ireg will access only the lower XLEN bits of ctrsource, and implict writes (by recorded transfers) will be zero-extended" to say that software writes also zero-extend 12) Require ctrsource/ctrtarget to hold all valid virtual or physical addresses, but not any invalid addresses 13) Do nothing 14) TBD 15) Clarify that the cycle counter uses the same clock rate as mcycle 16) Clarify that BPFRZ/LCOFIFRZ from vsctrcontrol applies when V=1, [ms]ctrcontrol when V=0

ved-rivos commented 7 months ago

Make hstateen0.CTR ROZ when mstateen0.CTR is ROZ

Should be: hstateen0.CTR appears as read-only zero when mstateen0.CTR is 0.

This follows from Smstateen: For every bit in an mstateen CSR that is zero (whether read-only zero or set to zero), the same bit appears as read-only zero in the matching hstateen and sstateen CSRs.

bcstrongx commented 7 months ago

Yes, that's what I meant to say.

bcstrongx commented 7 months ago
  1. Agreed that a FREEZE-CLR mechanism isn't warranted, but will add non-normative guidance on how to avoid race conditions on RMW to sctrstatus
bcstrongx commented 7 months ago
  1. TG agreed to make ctrdata 64 bits
bcstrongx commented 7 months ago
  1. ARC agreed with the value of CLR, but recommends a new instruction rather than a CSR command bit. Awaiting opcode info from Andrew.
bcstrongx commented 7 months ago

Additional email feedback from Ved, after resolution of the above:

  1. Section 4.1 Add to description:

    1. CTRCLEAR is not valid in U-mode or VU-mode.
    2. If mstateen0.CTR is 0, attempts to execute CTRCLEAR at privilege modes less than M raise an illegal-instruction exception.
    3. If hstateen0.CTR is 0, attempts to execute CTRCLEAR in VS-mode raise a virtual instruction exception.

      "If CTR is not active, any read of ctrsource, ctrtarget, ctrdata, or sctrstatus that follows CTRCLEAR will return the value 0" Why qualifying with "If CTR is not active"?

  2. Page 7: "When reducing CTR depth, software should execute CTRCLEAR. This ensures that newly accessinble.. " Should it when increasing depth? Reducing depth only makes entries inaccessible. Or was this intended to mean that "When later the depth is increased, the newly accessible..." - but this was already established by "newly accessible entries contain unspecified but legal values". The motivation for this recommendation is unclear.

  3. Fig 2. The field names for vsctrcontrol and sctrcontrol be made identical (VU->U, VS->S, and VSTE->STE). With that those can be omitted from Table 2 as the statement about matching definition of sctrcontrol applies.

  4. Fig 4/5. Caption should additionally say " for MXLEN=64"

Nits:

  1. MTE/STE - the ", if any...." piece can be left out of the table (similar to VSTE) - as it is explained in section 6.2.1.

  2. "on each V transition" terminology is new... Maybe "transitions between HS-mode and VS/VU-mode"

  3. sireg/vsireg -> siregx/vsiregx

  4. Sometime cross references use section number and sometimes use section heading. May be good to unify to use section number. Backticks are missed in some instances

  5. Caption of Table 4 and table 4 are on separate pages.

  6. Are any of the warning blocks pending discussion?

bcstrongx commented 7 months ago

Resolution of the above:

  1. Fixed
  2. Non-normative comment clarified to cover avoiding side or covert channels using inaccessible upper CTR entries
  3. Fixed
  4. Fixed
  5. Fixed
  6. Fixed, though use host and guest terminology
  7. Agreed to keep as-is, consistent with the S*csrind spec
  8. Fixed
  9. Fixed
  10. Removed several warning blocks, converted others to NOTEs
bcstrongx commented 7 months ago

Additional changes agreed upon via email:

  1. Change *ctrcontrol to _x_ctrcontrol
  2. Make undefined bits in control CSRs WPRI instead of 0, ala xenvcfg bits
  3. Remove requirements on WARL behavior for DEPTH field. Remove associated Discovery chapter, as WARL-based discovery is already understood.
  4. Remove RV32 access to ctrdata[63:32] (which has no defined bits) via mireg4. RV32 is unsupported.
  5. Removed warning block regarding indication in the ACPI spec regarding retention of CTR state across low-power state(s). Instead filed an issue on the FFH spec: https://github.com/riscv-non-isa/riscv-acpi-ffh/issues/9
  6. Clarified the meaning of "intervening modes" w.r.t. external trap enabling, and added a table detailing all scenarios
  7. Add requirement that "qualified transfers" must complete/retire

Other various text or clarification improvements added.