Reading through the spec doc (here), a few issues came to mind:
1) To avoid unnecessary duplication, for all events that include both RET and SPEC flavors, I'd list only the RET events, and precede the table with text that says something like:
The following non-speculative (*.RET) events also have speculative (*.SPEC) variants. The speculative versions include events that occur as a result of instructions or uops that do not retire.
2) I think we should remove example formulas for events we suspect an implementation may opt to use from the tables and event files. We can add non-normative text if we believe these formula methods should be broadcast.
3) I find the RETIRE and SPEC groups confusing, since there are RET and SPEC events in many groups. What if we moved the events in the RETIRE and SPEC groups into GEN? Or preferably GENERAL?
4) The PKI/MPKI metrics aren't clear on the order of operations. For instance, CTRL_FLOW.PKI has the following formula:
CTRL_FLOW.RET / INST.RET * 1000
It should read:
CTRL_FLOW.RET / (INST.RET * 1000)
5) I'd add the FLUSH.* events to GENERAL as well. They aren't really CTRL_FLOW events.
6) Our CACHE.* events count load/store instructions. But some instructions perform multiple accesses, and we don't have any way to count all accesses. I would recommend moving to the following:
* INST.{LOAD|STORE}.{L1D|L2|L3|...}.*.{RET|SPEC} - counts load/store instructions
* CACHE.{L1D|L2|L3}.{LOAD|STORE|PREF|SNOOP}.* - counts cache accesses (may be multiple per instruction), including speculative and implicit accesses (like prefetches, page walk loads, etc). Do we need a PGWALK access type too?
This way the CACHE events are from the perspective of the associated cache, which knows nothing about instructions or retirement, while the load/store instruction events are part of the INST events. These new INST events could go in a new LOAD_STORE group.
Actually, if we do this then, for consistency, we should probably name the CTRL_FLOW events INST.CTRL_FLOW too. So that all events that count instructions are INST.* events.
7) Shouldn't CACHE.SNOOP.REMOTE_REQ_LOCAL_HITM be per cache level?
8) I think we should include TLB.L.CODE.RET events. They are more costly to implement, but useful. Akin to Intel's FRONTEND_RETIRED.
9) Problem with the formula for TOPDOWN.BACKEND_BOUND.MEMORY_BOUND.DATA_BOUND.L2_BOUND.RATE:
The RET and SPEC events differ in description as well. As I understand with this proposal you assume that implementors will themselves adopt description for SPEC variants? Even though this adoption looks trivial I'm not sure this is a good direction. I personally like the current explicit version more. Yes it has some duplication, but it is simple, clear and has one-to-one relation with actual event files the users will be dealing with.
The correct formula is CTRL_FLOW.RET / (INST.RET / 1000) - which is identical to CTRL_FLOW.RET / INST.RET 1000. We can use 1000 CTRL_FLOW.RET / INST.RET or CTRL_FLOW.RET / (INST.RET / 1000) variants to avoid such confusions.
INST.{LOAD|STORE}.{L1D|L2|L3|...}..{RET|SPEC} - probably the SPEC variants are redundant considering that we'll have CACHE ones as well? We may have only RET ones I think. Also when counting events on cache IP block level it is probably more natural to operate with READ/WRITE terms instead of LOAD/STORE? E.g. a store instruction can cause read transaction to certain cache level.
From the perf analysis point of view knowing cache level here is not important I think. On the other side different cache levels will count this event independently so may be yes, such breakdown may be needed.
See https://github.com/riscv/riscv-performance-events/pull/12
I'm aware that some events are SPEC only, but for those that have SPEC and RET flavors, are there differences beyond the RET versions only count events incurred by uops that retire, and the SPEC versions count all? If so I agree we should keep them separate.
You're right, not sure why I was trying to multiply INST.RET * 1000. But I agree that putting 1000 in front is more clear.
The difference between INST.{LOAD|STORE} and CACHE.{LOAD|STORE} is that the former only increments once per instruction, even for a vector operation that may perform many cache accesses, while the latter counts all accesses. My instinct is that an implementation would only support the INST.{LOAD|STORE}*.RET (for purposes of precise attribution) and the CACHE.{LOAD|STORE} (speculative) events, but in the interest of simplicity I would include SPEC versions for all the RET events.
Reading through the spec doc (here), a few issues came to mind:
1) To avoid unnecessary duplication, for all events that include both RET and SPEC flavors, I'd list only the RET events, and precede the table with text that says something like:
2) I think we should remove example formulas for events we suspect an implementation may opt to use from the tables and event files. We can add non-normative text if we believe these formula methods should be broadcast.
3) I find the RETIRE and SPEC groups confusing, since there are RET and SPEC events in many groups. What if we moved the events in the RETIRE and SPEC groups into GEN? Or preferably GENERAL?
4) The PKI/MPKI metrics aren't clear on the order of operations. For instance, CTRL_FLOW.PKI has the following formula:
It should read:
5) I'd add the FLUSH.* events to GENERAL as well. They aren't really CTRL_FLOW events.
6) Our CACHE.* events count load/store instructions. But some instructions perform multiple accesses, and we don't have any way to count all accesses. I would recommend moving to the following:
7) Shouldn't CACHE.SNOOP.REMOTE_REQ_LOCAL_HITM be per cache level?
8) I think we should include TLB.L.CODE.RET events. They are more costly to implement, but useful. Akin to Intel's FRONTEND_RETIRED.
9) Problem with the formula for TOPDOWN.BACKEND_BOUND.MEMORY_BOUND.DATA_BOUND.L2_BOUND.RATE: