Open rocallahan opened 6 years ago
@rocallahan Is this change still being considered?
I have modified rr locally to use counter 0x5120c4 (BR_INST_RETIRED.NEAR_TAKEN) and the results are pretty good:
The following tests FAILED:
1434 - mmap_shared_dev_zero-32 (Failed)
1435 - mmap_shared_dev_zero-32-no-syscallbuf (Failed)
Two tests failed, but on my machine the same tests also fail with unpatched rr, so the failures are probably unrelated to the counter change.
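For anyone wondering what the raw code 0x5120c4 contains, here is a minimal sketch that splits it into fields. It assumes the code follows the Intel IA32_PERFEVTSELx register layout (event select in bits 0-7, unit mask in bits 8-15, control flags above that). The names `RawEvent` and `decode_raw_event` are made up for illustration and are not part of rr.

```cpp
#include <cstdint>

// Hypothetical helper type, not an rr structure: a subset of the fields of
// an IA32_PERFEVTSELx-style raw event code.
struct RawEvent {
  uint8_t event_select;  // bits 0-7
  uint8_t umask;         // bits 8-15
  bool usr;              // bit 16: count while in user mode
  bool os;               // bit 17: count while in kernel mode
  bool interrupt;        // bit 20: raise an interrupt on counter overflow
  bool enable;           // bit 22: counter enabled
};

// Split a raw code into the fields above (assumed layout, see lead-in).
constexpr RawEvent decode_raw_event(uint32_t code) {
  return RawEvent{
      static_cast<uint8_t>(code & 0xff),
      static_cast<uint8_t>((code >> 8) & 0xff),
      ((code >> 16) & 1) != 0,
      ((code >> 17) & 1) != 0,
      ((code >> 20) & 1) != 0,
      ((code >> 22) & 1) != 0,
  };
}
```

Under that assumption, `decode_raw_event(0x5120c4)` gives event select 0xc4 with unit mask 0x20 (BR_INST_RETIRED.NEAR_TAKEN), counted in user mode only, with the overflow interrupt and the counter enabled.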
If you pull from master those tests are probably fixed.
That does sound pretty good. Can you do several runs of the test suite to make sure? Is there any particular reason why you tried this?
I pulled from master, ran the test suite 5 times with patched rr and there were no failures this time.
The main reason I tried this is that I read issue #2263, thought about it, and came to the conclusion that counting only conditional branches is technically incorrect and that the only correct solution is to count all taken branches.
That being said, it looks like in real world programs indistinguishable states are pretty rare. I'm guessing this is the reason why this issue hasn't been fixed yet.
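To illustrate what I mean by indistinguishable states, here is a toy model. It is not rr code and not the exact case from #2263. It assumes a made-up two-instruction loop (a store followed by an unconditional jump back) that never changes registers, so an execution point is identified only by the tick count and the program counter. All names in it are hypothetical.

```cpp
#include <cstdint>

enum class Kind { Store, ConditionalBranch, Jump };

struct Insn {
  Kind kind;
  uint64_t target;  // branch target; unused for Store
};

// Registers are omitted because the toy loop never modifies them.
struct Position {
  uint64_t ticks;
  uint64_t pc;
};

constexpr bool same_position(Position a, Position b) {
  return a.ticks == b.ticks && a.pc == b.pc;
}

// Hypothetical loop:  0: store to memory   1: jmp 0
constexpr Insn kLoop[] = {{Kind::Store, 0}, {Kind::Jump, 0}};

// Execute `steps` instructions of kLoop and report the resulting position.
// count_all_taken selects the scheme: all taken branches, or conditional
// branches only.
constexpr Position run(unsigned steps, bool count_all_taken) {
  Position p{0, 0};
  for (unsigned i = 0; i < steps; ++i) {
    const Insn& insn = kLoop[p.pc];
    if (insn.kind == Kind::Store) {
      ++p.pc;
    } else {
      // Every branch in this toy program is taken.
      if (count_all_taken || insn.kind == Kind::ConditionalBranch) {
        ++p.ticks;
      }
      p.pc = insn.target;
    }
  }
  return p;
}
```

With conditional-branch counting, `run(0, false)` and `run(2, false)` are the same position even though one loop iteration (and one store) separates them. With taken-branch counting, `run(0, true)` and `run(2, true)` differ in their tick count.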
It sounds like we could take this change, at least for your microarchitecture (which one?), and it would probably make rr work a little better. We don't know yet how reliable this is across microarchitecture revs; I haven't even checked where it's available. I know when we started out with Sandy Bridge this didn't work.
Taking this change would mean that we create some cases where traces recorded on some microarchitecture(s) can't be replayed on other microarchitectures. That's unfortunate. I guess it's probably still worth doing, but we definitely would need to extend PerfCounters so that CPUs that can use both RCB and taken-branches can replay either kind of trace.
Right now the PMU_TICKS_TAKEN_BRANCHES flag is a bit of a hack; it indicates that PmuConfig::rcb_cntr_event actually means taken branches IF we subtract the value of PmuConfig::minus_ticks_cntr_event. So I think what we need to do is break out event definitions from PmuConfig, e.g.
struct EventConfig {
  unsigned primary_event;   // 0 means event not supported
  unsigned subtract_event;  // 0 means subtraction not needed
  uint32_t skid_size;
};
struct PmuConfig {
  CpuMicroarch uarch;
  const char* name;
  EventConfig retired_conditional_branches_event;
  EventConfig taken_branches_event;
  uint32_t flags;
};
something like that?
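A rough sketch of how replay could pick an event from such a config, based on the semantics recorded in the trace. This is only an illustration under assumptions. The `uarch` field is left out to keep it self-contained. The `TicksSemantics` enumerator for conditional branches, the helper names, the two sample configs, the skid sizes and the event codes 0x1111/0x2222 are all placeholders. Only 0x5120c4 comes from this thread, and 0x5101c4 is assumed to be the conditional-branch code.

```cpp
#include <cstdint>

// Assumed enumerators; only TICKS_TAKEN_BRANCHES is named in this thread.
enum TicksSemantics { TICKS_RETIRED_CONDITIONAL_BRANCHES, TICKS_TAKEN_BRANCHES };

struct EventConfig {
  unsigned primary_event;   // 0 means event not supported
  unsigned subtract_event;  // 0 means subtraction not needed
  uint32_t skid_size;
};

struct PmuConfig {  // `uarch` omitted to keep the sketch self-contained
  const char* name;
  EventConfig retired_conditional_branches_event;
  EventConfig taken_branches_event;
  uint32_t flags;
};

// Pick the event definition that matches the semantics stored in a trace.
constexpr const EventConfig& event_for(const PmuConfig& pmu, TicksSemantics s) {
  return s == TICKS_TAKEN_BRANCHES ? pmu.taken_branches_event
                                   : pmu.retired_conditional_branches_event;
}

// A CPU can replay a trace only if it supports the trace's tick semantics.
constexpr bool supports(const PmuConfig& pmu, TicksSemantics s) {
  return event_for(pmu, s).primary_event != 0;
}

// Combine raw counter values into ticks, subtracting when required.
constexpr uint64_t ticks_from_counts(const EventConfig& ev, uint64_t primary,
                                     uint64_t subtract) {
  return ev.subtract_event != 0 ? primary - subtract : primary;
}

// Placeholder configs for illustration only (skid sizes are made up).
constexpr PmuConfig kBoth = {"hypothetical-both", {0x5101c4, 0, 100}, {0x5120c4, 0, 100}, 0};
constexpr PmuConfig kDiffOnly = {"hypothetical-diff", {0, 0, 0}, {0x1111, 0x2222, 100}, 0};
```

With these placeholders, `kBoth` could replay either kind of trace, while `kDiffOnly` could only replay taken-branch traces and would compute ticks as the difference of its two counters, e.g. `ticks_from_counts(kDiffOnly.taken_branches_event, 1000, 400)` is 600.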
I think I'd accept a PR like this.
I am assuming that the existing TicksSemantics TICKS_TAKEN_BRANCHES means the same thing on AMDF15R30 as your BR_INST_RETIRED.NEAR_TAKEN ... and that it means the same thing across Intel microarchitecture revs. I hope that's true!
This would avoid issues like #2263 where we get into identical-looking program states.
We could try something like @pipcet did for AMD Bulldozer, using the difference of two performance counters to get a reliable value.
Another possibility is to use Intel PT to track control flow. We might not be able to trigger interrupts for replay but that might be OK since the counter for interrupts doesn't need to be as reliable.