joneschrisg commented 11 years ago

It appears that valgrind identifies itself as Merom, which apparently isn't supported yet. I'm testing with distro-supplied valgrind 3.7.0, so things may be different in later valgrind. If not, then we can either add Merom support to rr, or add later support for a newer architecture to valgrind.

The question of how well rr+valgrind could work is also interesting. Needs some thought. But even basic checking allowed me to diagnose #15.

joneschrisg commented 10 years ago

As of valgrind 3.9.0, the valgrind CPU still reports itself as merom. The release notes claim to support AVX2 now, which is way up in haswell land. But crucially, AVX2 is only supported for x86-64. Sigh. The other sandy bridge \ merom features may be gated on x86-64 too.

But we have another option: merom has a deterministic store-insn counter according to this paper. It would be pretty straightforward to add that support, but not sure it's worth it yet.

joneschrisg commented 10 years ago

The other sandy bridge \ merom features may be gated on x86-64 too.

Valgrind reports itself as sandy bridge on x86-64 machines that have AVX, but not on x86. Looks like x86 support is falling behind.

joneschrisg commented 10 years ago

606 is a way to get valgrind working.

joneschrisg commented 10 years ago

On second thought, rr's "syscall injection" is technically self-modifying code, which voids valgrind's warranty. But I somewhat suspect (hope) that things will just work, and indeed that'll be a good test of whether rr is cleaning up after itself properly in tracee tasks.

joneschrisg commented 10 years ago

Yet another problem is that valgrind-instrumented code longjmp's out of some signal handlers, which rr doesn't know how to deal with.

joneschrisg commented 10 years ago

1262 is YET ANOTHER annoying bug caused by uninitialized C++ object fields. We deal with enough really nasty bugs in rr that this class is just insulting! Will probably take a bit of time soon to retry valgrind'ing rr.

CC @rocallahan

rocallahan commented 10 years ago

This bug is about running valgrind tools on tracee code, right? Whereas for uninitialized fields in rr itself, you'd just need to run valgrind on rr itself, which should just work, I'd have thought...

joneschrisg commented 10 years ago

I'd like to have both. I agree it's worth treating those problems separately.

valgrind doesn't work on even just rr out of the box, which is what I care about more, because per discussion above valgrind's emulated CPU identifies itself as Merom and rr barfs. ISTR force-overriding the detected microarch, but something broke. I'd like to try again.

Now that I think about it, for the rr-only case, we can have it launch a subprocess to make a non-emulated cpuid call, and then users wouldn't even have to force their microarch. (Sorry Julian!)

joneschrisg commented 10 years ago

ISTR force-overriding the detected microarch, but something broke.

The reason is that libpfm does its own CPUIDs to decide on the encoding of the raw microarch-specific events that rr chooses based on its own CPUID. This isn't how abstractions are supposed to work. So we'll need #974 to get libpfm out of the way before we can work around the CPUID emulation.

An interesting question is what valgrind does with perf_event_open attributes. If the syscall is passed through, then clients can observe CPU emulation by passing raw events that don't match the emulated microarch. But I doubt anyone cares.

joneschrisg commented 10 years ago

After #1274, here's the status

valgrind still implements only up to Merom for 32-bit processes, as of valgrid r14237. Workaround is to run rr as rr -a 'ivy bridge' (replacing ivy bridge with your CPU's uarch)
early in startup, valgrind makes an unexpected write when rr is expecting the write(-1) to check the rcb_cntr. I'm pretty sure this is valgrind warning us about writing to fd -1!! Helpful, but annoying in this case. Run rr as rr -a 'ivy bridge' -f to ignore that unexpected write.
soon after that, rr aborts because valgrind doesn't implement kcmp. Requires a valgrind fix.

So with valgrind patched to support kcmp, I think we're good to go on the tracer-side checking, which is the more useful of the two.

joneschrisg commented 10 years ago

Filed a valgrind bug for kcmp support.

joneschrisg commented 10 years ago

The valgrind patch landed. The last to-do item is exec'ing tracees more quickly, in order to "detach" valgrind. Currently in replay, valgrind observes tid values change when the rr fork child switches from "real" execution mode to using emulated data from the trace. I think the easiest thing to do might be to add a "hidden" rr command, rr __launch /foo --args or something like that, which we can exec immediately after the fork and that prepares the child process for replay.

joneschrisg commented 9 years ago

The problem referenced above may still exist, but we have another problem to deal with: the image after valgrind starts running rr doesn't have a vdso image mapped. I assume this is because valgrind's dynamic linker resolves references to vdso symbols to valgrind helpers. To work around this, we could include code to make traced syscalls in the rr image itself, at a known offset in the binary. Then, if we don't find the vdso, we can use the rr code. But that's pretty tricky a bit a delicate, so I don't think it's worth it at this point.

Keno commented 8 years ago

Status update after #1796: valgrind rr record seems to work fine, but valgrind rr replay still has issues:

valgrind ./bin/rr replay -a
==6829== Memcheck, a memory error detector
==6829== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==6829== Using Valgrind-3.12.0.SVN and LibVEX; rerun with -h for copyright info
==6829== Command: ./bin/rr replay -a
==6829==

rr: Warning: You appear to be running in a VMWare guest with a bug
    where a conditional branch instruction between two CPUID instructions
    sometimes fails to be counted by the conditional branch performance
    counter. Work around this problem by adding
        monitor_control.disable_hvsim_clusters = true
    to your .vmx file.

--6829-- WARNING: unhandled amd64-linux syscall: 446
--6829-- You may be able to write your own handler.
--6829-- Read the file README_MISSING_SYSCALL_OR_IOCTL.
--6829-- Nevertheless we consider this a bug.  Please report
--6829-- it at http://valgrind.org/support/bug_reports.html.
[FATAL /home/keno/rr-vanilla/src/ReplaySession.cc:454:check_pending_sig() errno: SUCCESS]
 (task 6830 (rec:6820) at time 1310)
 -> Assertion `0 < t->stop_sig()' failed to hold. Replaying `SCHED': expecting tracee signal or trap, but instead at `open' (ticks: 145293624)
Launch gdb with
  gdb '-l' '-1' '-ex' 'target extended-remote :6830' /home/keno/.local/share/rr/latest-trace/mmap_hardlink_2_julia

VelorumS commented 7 years ago

Do I understand correctly that running valgrind rr replay -a does the same checks as running valgrind on the application itself?

Basically, this use case is a massive speed boost for valgrind?

rocallahan commented 7 years ago

No. If you do that, Valgrind will not check the application.

It is possible to implement binary instrumentation of the replay, but the approach has to be a bit different. We've actually implemented this in a closed-source branch. And yes, we could use this to implement a memcheck-during-replay tool. Unfortunately we have to hold onto this code until we figure out a business model to make this work sustainable.

clevcode commented 5 years ago

I'm very interested in being able to apply some type of binary instrumentation during the replay. Any plans on open-sourcing the branch with that, or anyone else interested in looking into reimplementing something like this with me?

dr-m commented 2 years ago

For what it is worth, rr works just fine with instrumented -fsanitize=… code. For example, you can continue to a heap-use-after-free report of AddressSanitizer, set a watchpoint on the ASAN shadow address, and reverse-continue to find where the memory was freed. For MemorySanitizer it is a little trickier, because the MSAN output does not mention shadow byte addresses. Origin tracking is your friend.

I believe that the combination of ASAN and MSAN is equivalent or superior to Valgrind’s default memcheck tool, and debugging multi-threaded code is much easier than with valgrind --vgdb=yes.

MemorySanitizer does involve some additional effort, because all linked code (except libc) must be compiled with clang -fsanitize=memory, or otherwise you will get reports that any memory that was initialized by uninstrumented code is uninitialized. For C++ programs, you have to build and use an instrumented libc++ instead of libstdc++.

khuey commented 2 years ago

Yeah rr + the various sanitizers is pretty powerful, and I don't think there's a ton of need to ever deal with valgrind proper.

Time to let this issue die.

rr-debugger / rr

Get rr working with valgrind (or vice versa), as far as possible #16

606 is a way to get valgrind working.

1262 is YET ANOTHER annoying bug caused by uninitialized C++ object fields. We deal with enough really nasty bugs in rr that this class is just insulting! Will probably take a bit of time soon to retry valgrind'ing rr.