Closed joneschrisg closed 2 years ago
As of valgrind 3.9.0, the valgrind CPU still reports itself as merom. The release notes claim to support AVX2 now, which is way up in haswell land. But crucially, AVX2 is only supported for x86-64. Sigh. The other sandy bridge \ merom features may be gated on x86-64 too.
But we have another option: merom has a deterministic store-insn counter according to this paper. It would be pretty straightforward to add that support, but not sure it's worth it yet.
The other sandy bridge \ merom features may be gated on x86-64 too.
Valgrind reports itself as sandy bridge on x86-64 machines that have AVX, but not on x86. Looks like x86 support is falling behind.
On second thought, rr's "syscall injection" is technically self-modifying code, which voids valgrind's warranty. But I somewhat suspect (hope) that things will just work, and indeed that'll be a good test of whether rr is cleaning up after itself properly in tracee tasks.
Yet another problem is that valgrind-instrumented code longjmp's out of some signal handlers, which rr doesn't know how to deal with.
CC @rocallahan
This bug is about running valgrind tools on tracee code, right? Whereas for uninitialized fields in rr itself, you'd just need to run valgrind on rr itself, which should just work, I'd have thought...
I'd like to have both. I agree it's worth treating those problems separately.
valgrind doesn't work on even just rr out of the box, which is what I care about more, because per discussion above valgrind's emulated CPU identifies itself as Merom and rr barfs. ISTR force-overriding the detected microarch, but something broke. I'd like to try again.
Now that I think about it, for the rr-only case, we can have it launch a subprocess to make a non-emulated cpuid call, and then users wouldn't even have to force their microarch. (Sorry Julian!)
ISTR force-overriding the detected microarch, but something broke.
The reason is that libpfm does its own CPUIDs to decide on the encoding of the raw microarch-specific events that rr chooses based on its own CPUID. This isn't how abstractions are supposed to work. So we'll need #974 to get libpfm out of the way before we can work around the CPUID emulation.
An interesting question is what valgrind does with perf_event_open attributes. If the syscall is passed through, then clients can observe CPU emulation by passing raw events that don't match the emulated microarch. But I doubt anyone cares.
After #1274, here's the status
rr -a 'ivy bridge'
(replacing ivy bridge
with your CPU's uarch)write(-1)
to check the rcb_cntr. I'm pretty sure this is valgrind warning us about writing to fd -1!! Helpful, but annoying in this case. Run rr as rr -a 'ivy bridge' -f
to ignore that unexpected write.So with valgrind patched to support kcmp, I think we're good to go on the tracer-side checking, which is the more useful of the two.
Filed a valgrind bug for kcmp support.
The valgrind patch landed. The last to-do item is exec'ing tracees more quickly, in order to "detach" valgrind. Currently in replay, valgrind observes tid values change when the rr fork child switches from "real" execution mode to using emulated data from the trace. I think the easiest thing to do might be to add a "hidden" rr command, rr __launch /foo --args
or something like that, which we can exec immediately after the fork and that prepares the child process for replay.
The problem referenced above may still exist, but we have another problem to deal with: the image after valgrind starts running rr doesn't have a vdso image mapped. I assume this is because valgrind's dynamic linker resolves references to vdso symbols to valgrind helpers. To work around this, we could include code to make traced syscalls in the rr image itself, at a known offset in the binary. Then, if we don't find the vdso, we can use the rr code. But that's pretty tricky a bit a delicate, so I don't think it's worth it at this point.
Status update after #1796:
valgrind rr record
seems to work fine, but valgrind rr replay
still has issues:
valgrind ./bin/rr replay -a
==6829== Memcheck, a memory error detector
==6829== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==6829== Using Valgrind-3.12.0.SVN and LibVEX; rerun with -h for copyright info
==6829== Command: ./bin/rr replay -a
==6829==
rr: Warning: You appear to be running in a VMWare guest with a bug
where a conditional branch instruction between two CPUID instructions
sometimes fails to be counted by the conditional branch performance
counter. Work around this problem by adding
monitor_control.disable_hvsim_clusters = true
to your .vmx file.
--6829-- WARNING: unhandled amd64-linux syscall: 446
--6829-- You may be able to write your own handler.
--6829-- Read the file README_MISSING_SYSCALL_OR_IOCTL.
--6829-- Nevertheless we consider this a bug. Please report
--6829-- it at http://valgrind.org/support/bug_reports.html.
[FATAL /home/keno/rr-vanilla/src/ReplaySession.cc:454:check_pending_sig() errno: SUCCESS]
(task 6830 (rec:6820) at time 1310)
-> Assertion `0 < t->stop_sig()' failed to hold. Replaying `SCHED': expecting tracee signal or trap, but instead at `open' (ticks: 145293624)
Launch gdb with
gdb '-l' '-1' '-ex' 'target extended-remote :6830' /home/keno/.local/share/rr/latest-trace/mmap_hardlink_2_julia
Do I understand correctly that running valgrind rr replay -a
does the same checks as running valgrind on the application itself?
Basically, this use case is a massive speed boost for valgrind?
No. If you do that, Valgrind will not check the application.
It is possible to implement binary instrumentation of the replay, but the approach has to be a bit different. We've actually implemented this in a closed-source branch. And yes, we could use this to implement a memcheck-during-replay tool. Unfortunately we have to hold onto this code until we figure out a business model to make this work sustainable.
I'm very interested in being able to apply some type of binary instrumentation during the replay. Any plans on open-sourcing the branch with that, or anyone else interested in looking into reimplementing something like this with me?
For what it is worth, rr
works just fine with instrumented -fsanitize=…
code. For example, you can continue
to a heap-use-after-free
report of AddressSanitizer, set a watchpoint on the ASAN shadow address, and reverse-continue
to find where the memory was freed. For MemorySanitizer it is a little trickier, because the MSAN output does not mention shadow byte addresses. Origin tracking is your friend.
I believe that the combination of ASAN and MSAN is equivalent or superior to Valgrind’s default memcheck
tool, and debugging multi-threaded code is much easier than with valgrind --vgdb=yes
.
MemorySanitizer does involve some additional effort, because all linked code (except libc
) must be compiled with clang -fsanitize=memory
, or otherwise you will get reports that any memory that was initialized by uninstrumented code is uninitialized. For C++ programs, you have to build and use an instrumented libc++
instead of libstdc++
.
Yeah rr + the various sanitizers is pretty powerful, and I don't think there's a ton of need to ever deal with valgrind proper.
Time to let this issue die.
It appears that valgrind identifies itself as Merom, which apparently isn't supported yet. I'm testing with distro-supplied valgrind 3.7.0, so things may be different in later valgrind. If not, then we can either add Merom support to rr, or add later support for a newer architecture to valgrind.
The question of how well rr+valgrind could work is also interesting. Needs some thought. But even basic checking allowed me to diagnose #15.