Try counting all taken branches on Intel

rr-debugger / rr

Record and Replay Framework

http://rr-project.org/

Other

9.19k stars 585 forks source link

Try counting all taken branches on Intel #2284

Open rocallahan opened 6 years ago

rocallahan commented 6 years ago

This would avoid issues like #2263 where we get into identical-looking program states.

We could try something like @pipcet did for AMD Bulldozer, using the difference of two performance counters to get a reliable value.

Another possibility is to use Intel PT to track control flow. We might not be able to trigger interrupts for replay but that might be OK since the counter for interrupts doesn't need to be as reliable.

phantom10111 commented 4 years ago

@rocallahan Is this change still being considered?

I have modified rr locally to use counter 0x5120c4 (BR_INST_RETIRED.NEAR_TAKEN) and the results are pretty good:

The following tests FAILED:
    1434 - mmap_shared_dev_zero-32 (Failed)
    1435 - mmap_shared_dev_zero-32-no-syscallbuf (Failed)

2 tests have failed, but on my machine the same tests are failing even with unpatched rr, so the failures are probably unrelated to the counter change.

rocallahan commented 4 years ago

If you pull from master those tests are probably fixed.

That does sound pretty good. Can you do several runs of the test suite to make sure? Is there any particular reason why you tried this?

phantom10111 commented 4 years ago

I pulled from master, ran the test suite 5 times with patched rr and there were no failures this time.

The main reasson I tried this was because I have read issue #2263, thought about it, and came to the conclusion that counting only conditional branches is technically incorrect and the only correct solution is to count all taken branches.

That being said, it looks like in real world programs indistinguishable states are pretty rare. I'm guessing this is the reason why this issue hasn't been fixed yet.

rocallahan commented 4 years ago

It sounds like we could take this change, at least for your microarchitecture (which one?), and it would probably make rr work a little better. We don't know yet how reliable this is across microarchitecture revs; I haven't even checked where it's available. I know when we started out with Sandy Bridge this didn't work.

Taking this change would mean that we create some cases where traces recorded on some microarchitecture(s) can't be replayed on other microarchitectures. That's unfortunate. I guess it's probably still worth doing, but we definitely would need to extend PerfCounters so that CPUs that can use both RCB and taken-branches can replay either kind of trace.

rocallahan commented 4 years ago

Right now the PMU_TICKS_TAKEN_BRANCHES flag is a bit of a hack; it indicates that PmuConfig::rcb_cntr_event actually means taken branches IF we subtract the value of PmuConfig::minus_ticks_cntr_event. So I think what we need to do is break out event definitions from PmuConfig, e.g.

struct EventConfig {
  unsigned primary_event; // 0 means event not supported
  unsigned subtract_event; // 0 means subtraction not needed
  uint32_t skid_size;
};
struct PmuConfig {
  CpuMicroarch uarch;
  const char* name;
  EventConfig retired_conditional_branches_event;
  EventConfig taken_branches_event;
  uint32_t flags;
};

something like that?

I think I'd accept a PR like this.

I am assuming that the existing TicksSemantics TICKS_TAKEN_BRANCHES means the same thing on AMDF15R30 as your BR_INST_RETIRED.NEAR_TAKEN ... and that it means the same thing across Intel microarchitecture revs. I hope that's true!