paulfloyd / freebsd_valgrind

Git repo used to Upstream the FreeBSD Port of Valgrind
GNU General Public License v2.0

valgrind [clang-11.0 (llvm-devel-11.0.d20200327) FreeBSD x86] fails in initimg: mmap(0x400000, 4096) failed in UME with error 22 #95

Closed nbriggs closed 4 years ago

nbriggs commented 4 years ago

Compiling valgrind with clang-11.0 (at least the FreeBSD version from llvm-devel-11.0.d20200327) produces a non-functional valgrind. All attempts to use it fail with:

$ valgrind /usr/bin/hash
valgrind: mmap(0x400000, 24576) failed in UME with error 22 (Invalid argument).
valgrind: this can be caused by executables with very large text, data or bss segments.
$ valgrind /usr/bin/true
valgrind: mmap(0x400000, 4096) failed in UME with error 22 (Invalid argument).
valgrind: this can be caused by executables with very large text, data or bss segments.
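
For reference, "error 22" in the messages above is a plain errno value, and the text in parentheses is its standard description. A minimal Python sketch (not part of valgrind, just a lookup through the standard library) shows the mapping:

```python
import errno
import os

# errno value reported by valgrind's UME loader in the messages above
code = 22

name = errno.errorcode[code]   # symbolic name for the numeric value
message = os.strerror(code)    # the C library's description of the error

print(name, "-", message)
```

So the failing mmap call is being rejected as EINVAL rather than, say, running out of memory, which already hints that the hint text about "very large text, data or bss segments" may not apply here.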

Running with debug/verbose flags:


$ valgrind -d -v -v /usr/bin/true
--7735:1:debuglog DebugLog system started by Stage 1, level 1 logging requested
--7735:1:launcher no tool requested, defaulting to 'memcheck'
--7735:1:launcher selected platform 'x86-freebsd'
--7735:1:launcher launching /usr/local/lib/valgrind/memcheck-x86-freebsd
--7735:1:debuglog DebugLog system started by Stage 2 (main), level 1 logging requested
--7735:1:    main Welcome to Valgrind version 3.16.0.RC1 debug logging
--7735:1:    main Checking current stack is plausible
--7735:1:    main Checking initial stack was noted
--7735:1:    main Starting the address space manager
--7735:1:    main Address space manager is running
--7735:1:    main Starting the dynamic memory manager
--7735:1:mallocfr newSuperblock at 0x4000000 (pszB 4194288)  owner VALGRIND/core
--7735:1:mallocfr deferred_reclaimSuperblock at 0x4000000 (pszB 4194288)  (prev 0x0) owner VALGRIND/core
--7735:1:    main Dynamic memory manager is running
--7735:1:    main Initialise m_debuginfo
--7735:1:    main VG_(libdir) = /usr/local/lib/valgrind
--7735:1:    main Getting launcher's name ...
--7735:1:    main ... /usr/local/bin/valgrind
--7735:1:    main Get hardware capabilities ...
--7735:1:   cache Autodetected cache info is sensible
--7735:1:   cache Cache info:
--7735:1:   cache   #levels = 2
--7735:1:   cache   #caches = 3
--7735:1:   cache      cache #0:
--7735:1:   cache         kind = unified
--7735:1:   cache         level = 2
--7735:1:   cache         size = 2097152 bytes
--7735:1:   cache         linesize = 64 bytes
--7735:1:   cache         assoc = 8
--7735:1:   cache      cache #1:
--7735:1:   cache         kind = insn
--7735:1:   cache         level = 1
--7735:1:   cache         size = 32768 bytes
--7735:1:   cache         linesize = 64 bytes
--7735:1:   cache         assoc = 8
--7735:1:   cache      cache #2:
--7735:1:   cache         kind = data
--7735:1:   cache         level = 1
--7735:1:   cache         size = 32768 bytes
--7735:1:   cache         linesize = 64 bytes
--7735:1:   cache         assoc = 8
--7735:1:    main ... arch = X86, hwcaps = x86-mmxext-sse1-sse2
--7735:1:    main Getting the working directory at startup
--7735:1:    main ... /usr/home/briggs/freebsd_valgrind
--7735:1:    main Split up command line
--7735:1:    main (early_) Process Valgrind's command line options
--7735:1:    main Create initial image
--7735:1: initimg Loading client
valgrind: mmap(0x400000, 4096) failed in UME with error 22 (Invalid argument).
valgrind: this can be caused by executables with very large text, data or bss segments.

nbriggs commented 4 years ago

I enabled aspacem tracing and ran a do-nothing test program, first under the clang-devel (11.0) build of valgrind and then under the clang 9.0 build. Oddly, the 11.0 build doesn't seem to be picking up the test program (/tmp/foo): the segment at 0x400000 belongs to valgrind's own memcheck binary (i=3781638 in the trace below), where it should be /tmp/foo (i=8312).

$ ./vg-in-place /tmp/foo
--31923:0:     ume mmap_file_fixed_client #1
--31923:0: aspacem <<< SHOW_SEGMENTS: after #1 (14 segments)
--31923:0: aspacem 1 segment names in 1 slots
--31923:0: aspacem freelist is empty
--31923:0: aspacem (0,4,7) /usr/home/briggs/freebsd_valgrind_clang11/memcheck/memcheck-x86-freebsd
--31923:0: aspacem   0: RSVN 0000000000-00003fffff 4194304 ----- SmFixed
--31923:0: aspacem   1: FILE 0000400000-0000400fff    4096 r---- d=0x06b i=3781638 o=0       (0,4)
--31923:0: aspacem   2: RSVN 0000401000-0003ffffff     59m ----- SmFixed
--31923:0: aspacem   3: ANON 0004000000-00043fffff 4194304 rwx--
--31923:0: aspacem   4:      0004400000-0037ffffff    828m
--31923:0: aspacem   5: FILE 0038000000-003814ffff 1376256 r-x-- d=0x06b i=3781638 o=4096    (0,4)
--31923:0: aspacem   6: FILE 0038150000-003820efff  782336 r---- d=0x06b i=3781638 o=1376256 (0,4)
--31923:0: aspacem   7: FILE 003820f000-003820ffff    4096 rw--- d=0x06b i=3781638 o=2154496 (0,4)
--31923:0: aspacem   8: ANON 0038210000-003974cfff     21m rw---
--31923:0: aspacem   9:      003974d000-00fbbfefff   3108m
--31923:0: aspacem  10: ANON 00fbbff000-00ffbdefff     63m -----
--31923:0: aspacem  11: ANON 00ffbdf000-00ffbfefff  131072 rwx--
--31923:0: aspacem  12: ANON 00ffbff000-00ffbfffff    4096 r-x--
--31923:0: aspacem  13: RSVN 00ffc00000-00ffffffff 4194304 ----- SmFixed
--31923:0: aspacem >>>
valgrind: mmap(0x400000, 4096) failed in UME with error 22 (Invalid argument).
valgrind: this can be caused by executables with very large text, data or bss segments.
$ 
$ cd ../freebsd_valgrind
$ ./vg-in-place /tmp/foo
--31926:0:     ume mmap_file_fixed_client #1
--31926:0: aspacem <<< SHOW_SEGMENTS: after #1 (14 segments)
--31926:0: aspacem 2 segment names in 2 slots
--31926:0: aspacem freelist is empty
--31926:0: aspacem (0,4,5) /usr/home/briggs/freebsd_valgrind/memcheck/memcheck-x86-freebsd
--31926:0: aspacem (1,72,1) /tmp/foo
--31926:0: aspacem   0: RSVN 0000000000-00003fffff 4194304 ----- SmFixed
--31926:0: aspacem   1: file 0000400000-0000400fff    4096 r---- d=0x8700ff02 i=8312    o=0       (1,72)
--31926:0: aspacem   2: RSVN 0000401000-0003ffffff     59m ----- SmFixed
--31926:0: aspacem   3: ANON 0004000000-00043fffff 4194304 rwx--
--31926:0: aspacem   4:      0004400000-0037ffefff    827m
--31926:0: aspacem   5: FILE 0037fff000-0037ffffff    4096 r---- d=0x06b i=3452444 o=0       (0,4)
--31926:0: aspacem   6: FILE 0038000000-0038150fff 1380352 r-x-- d=0x06b i=3452444 o=4096    (0,4)
--31926:0: aspacem   7: FILE 0038151000-00381b6fff  417792 r---- d=0x06b i=3452444 o=1384448 (0,4)
--31926:0: aspacem   8: ANON 00381b7000-00396f3fff     21m rw---
--31926:0: aspacem   9:      00396f4000-00fbbfefff   3109m
--31926:0: aspacem  10: ANON 00fbbff000-00ffbdefff     63m -----
--31926:0: aspacem  11: ANON 00ffbdf000-00ffbfefff  131072 rwx--
--31926:0: aspacem  12: ANON 00ffbff000-00ffbfffff    4096 r-x--
--31926:0: aspacem  13: RSVN 00ffc00000-00ffffffff 4194304 ----- SmFixed
--31926:0: aspacem >>>
--31926:0:     ume mmap_file_fixed_client #1

paulfloyd commented 4 years ago

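procstat -v is useful for comparison with the aspacemgr output. Slightly annoyingly, they don't treat the ends of the ranges the same way: aspacemgr prints the last byte of each segment (for example 0000400000-0000400fff), whereas procstat prints the address one past the end (0x400000 0x401000).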
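
To line the two outputs up, the aspacem end address just needs one added to it. The following is a small sketch in Python; the helper name `aspacem_to_procstat_range` is made up for illustration, and the regular expression assumes the 10-hex-digit address format seen in the traces in this thread.

```python
import re

# One segment line as printed by aspacem in this thread.
ASPACEM_LINE = (
    "--34114:0: aspacem   1: FILE 0000400000-0000400fff    4096 r---- "
    "d=0x06b i=3781759 o=0       (0,4)"
)

def aspacem_to_procstat_range(line):
    """Return (start, end) with an exclusive end, the way procstat -v prints it."""
    m = re.search(r"\b([0-9a-f]{10})-([0-9a-f]{10})\b", line)
    if m is None:
        raise ValueError("no address range found")
    start = int(m.group(1), 16)
    last = int(m.group(2), 16)   # aspacem shows the last byte (inclusive)
    return start, last + 1       # procstat shows one past the last byte

start, end = aspacem_to_procstat_range(ASPACEM_LINE)
print(hex(start), hex(end))
```

For the sample line this gives 0x400000 and 0x401000, which is the form procstat uses for the same mapping.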

I agree that it looks like memcheck is already mapped to the location that the guest is asking to be mapped to.

Here's the procstat output of a 32-bit executable that just calls "sleep":

guest:

15749           0x400000           0x401000 r--    1    4   3   1 CN--- vn /usr/home/paulf/scratch/vg_examples/slp32
15749           0x401000           0x402000 r-x    1    4   3   1 CN--- vn /usr/home/paulf/scratch/vg_examples/slp32
15749           0x402000           0x403000 rw-    1    0   2   0 C---- vn /usr/home/paulf/scratch/vg_examples/slp32
15749           0x403000           0x404000 r--    1    0   2   0 C---- vn /usr/home/paulf/scratch/vg_examples/slp32

host:

15749         0x37fff000         0x38000000 r--    1  436   3   0 CN--- vn /usr/home/paulf/scratch/valgrind/memcheck/memcheck-x86-freebsd
15749         0x38000000         0x38151000 r-x  321  436   3   0 CN--- vn /usr/home/paulf/scratch/valgrind/memcheck/memcheck-x86-freebsd
15749         0x38151000         0x381b7000 r--  102  436   3   0 CN--- vn /usr/home/paulf/scratch/valgrind/memcheck/memcheck-x86-freebsd

Is clang-11 adding some ELF flag that's forcing memcheck to load something at address 0x400000?

nbriggs commented 4 years ago

Thanks. I couldn't remember procstat; for some reason I had "pmap" in my head. I'll look in the morning to see if there are any odd ELF flags. In the meantime, here's what procstat looks like for it:

$ ./vg-in-place sleep 100000
--34114:0:     ume mmap_file_fixed_client #1
--34114:0: aspacem <<< SHOW_SEGMENTS: after #1 (14 segments)
--34114:0: aspacem 1 segment names in 1 slots
--34114:0: aspacem freelist is empty
--34114:0: aspacem (0,4,7) /usr/home/briggs/freebsd_valgrind_clang11/memcheck/memcheck-x86-freebsd
--34114:0: aspacem   0: RSVN 0000000000-00003fffff 4194304 ----- SmFixed
--34114:0: aspacem   1: FILE 0000400000-0000400fff    4096 r---- d=0x06b i=3781759 o=0       (0,4)
--34114:0: aspacem   2: RSVN 0000401000-0003ffffff     59m ----- SmFixed
--34114:0: aspacem   3: ANON 0004000000-00043fffff 4194304 rwx--
--34114:0: aspacem   4:      0004400000-0037ffffff    828m
--34114:0: aspacem   5: FILE 0038000000-003814ffff 1376256 r-x-- d=0x06b i=3781759 o=4096    (0,4)
--34114:0: aspacem   6: FILE 0038150000-003820efff  782336 r---- d=0x06b i=3781759 o=1376256 (0,4)
--34114:0: aspacem   7: FILE 003820f000-003820ffff    4096 rw--- d=0x06b i=3781759 o=2154496 (0,4)
--34114:0: aspacem   8: ANON 0038210000-003974cfff     21m rw---
--34114:0: aspacem   9:      003974d000-00fbbfefff   3108m
--34114:0: aspacem  10: ANON 00fbbff000-00ffbdefff     63m -----
--34114:0: aspacem  11: ANON 00ffbdf000-00ffbfefff  131072 rwx--
--34114:0: aspacem  12: ANON 00ffbff000-00ffbfffff    4096 r-x--
--34114:0: aspacem  13: RSVN 00ffc00000-00ffffffff 4194304 ----- SmFixed
--34114:0: aspacem >>>
valgrind: mmap(0x400000, 4096) failed in UME with error 22 (Invalid argument).
valgrind: this can be caused by executables with very large text, data or bss segments.

VS

$ procstat -v 34114
  PID      START        END PRT  RES PRES REF SHD FLAG  TP PATH
34114   0x400000   0x401000 r--    1 1696   4   0 CN--- vn /usr/home/briggs/freebsd_valgrind_clang11/memcheck/memcheck-x86-freebsd
34114  0x4000000  0x4400000 rwx    5    5   1   0 ----- df 
34114 0x38000000 0x38150000 r-x  336 1696   4   0 CN--- vn /usr/home/briggs/freebsd_valgrind_clang11/memcheck/memcheck-x86-freebsd
34114 0x38150000 0x3820f000 r--  191 1696   4   0 CN--- vn /usr/home/briggs/freebsd_valgrind_clang11/memcheck/memcheck-x86-freebsd
34114 0x3820f000 0x38210000 rw-    1 1696   4   0 CN--- vn /usr/home/briggs/freebsd_valgrind_clang11/memcheck/memcheck-x86-freebsd
34114 0x38210000 0x3974d000 rw-   21   21   1   0 ----- df 
34114 0xfbbff000 0xffbdf000 ---    0    0   0   0 ----- -- 
34114 0xffbdf000 0xffbff000 rwx    1    1   1   0 ---D- df 
34114 0xffbff000 0xffc00000 r-x    1    1  41   0 ----- ph 

So indeed, something has mapped a read-only page of the memcheck binary down at 0x400000, exactly where the client executable needs to go.
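
The collision can be checked mechanically. This is a minimal Python sketch, assuming the (start, end) pairs are copied from the procstat -v output above with exclusive ends; `overlapping` is a hypothetical helper, not anything in valgrind.

```python
# (start, end) pairs with exclusive ends, copied from the procstat -v output above.
MAPPINGS = [
    (0x400000, 0x401000),      # memcheck-x86-freebsd, r--
    (0x4000000, 0x4400000),
    (0x38000000, 0x38150000),  # memcheck-x86-freebsd, r-x
    (0x38150000, 0x3820F000),
    (0x3820F000, 0x38210000),
    (0x38210000, 0x3974D000),
    (0xFBBFF000, 0xFFBDF000),
    (0xFFBDF000, 0xFFBFF000),
    (0xFFBFF000, 0xFFC00000),
]

def overlapping(mappings, addr, length):
    """Return the existing mappings that intersect [addr, addr + length)."""
    req_end = addr + length
    return [(s, e) for s, e in mappings if s < req_end and addr < e]

# The request from the error message: mmap(0x400000, 4096)
hits = overlapping(MAPPINGS, 0x400000, 4096)
print([(hex(s), hex(e)) for s, e in hits])
```

The only hit is the 0x400000-0x401000 page that belongs to memcheck itself.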

To get that output I ended up throwing in a VG_(pread)(0, &x, 1, 0) in front of the error exit() to keep the process alive until I'd run procstat. I haven't yet worked out how to get access to things like sleep() from inside valgrind.

nbriggs commented 4 years ago

Clang 11 puts the PHDR and first LOAD segment down at 0x00400000, where Clang 9 puts them up at 0x37fff000. No idea why at this time.

(Clang 9 memcheck-x86-freebsd)
Elf file type is EXEC (Executable file)
Entry point 0x380d8140
There are 6 program headers, starting at offset 52

Program Headers:
  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  PHDR           0x000034 0x37fff034 0x37fff034 0x000c0 0x000c0 R   0x4
  LOAD           0x000000 0x37fff000 0x37fff000 0x000f4 0x000f4 R   0x1000
  LOAD           0x001000 0x38000000 0x38000000 0x15021b 0x15021b R E 0x1000
  LOAD           0x152000 0x38151000 0x38151000 0x65f80 0x65f80 R   0x1000
  LOAD           0x1b8000 0x381b7000 0x381b7000 0x0072c 0x153cb7c RW  0x1000
  GNU_STACK      0x000000 0x00000000 0x00000000 0x00000 0x00000 RW  0

 Section to Segment mapping:
  Segment Sections...
   00     
   01     
   02     .text 
   03     .rodata 
   04     .data .bss 
   05     

(Clang 11 memcheck-x86-freebsd)
Elf file type is EXEC (Executable file)
Entry point 0x380d7520
There are 7 program headers, starting at offset 52

Program Headers:
  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  PHDR           0x000034 0x00400034 0x00400034 0x000e0 0x000e0 R   0x4
  LOAD           0x000000 0x00400000 0x00400000 0x00114 0x00114 R   0x1000
  LOAD           0x001000 0x38000000 0x38000000 0x14f2eb 0x14f2eb R E 0x1000
  LOAD           0x1502f0 0x381502f0 0x381502f0 0xbece4 0xbece4 R   0x1000
  LOAD           0x20efd8 0x3820ffd8 0x3820ffd8 0x0072c 0x153cbd8 RW  0x1000
  GNU_EH_FRAME   0x1b61c4 0x381b61c4 0x381b61c4 0x054d4 0x054d4 R   0x4
  GNU_STACK      0x000000 0x00000000 0x00000000 0x00000 0x00000 RW  0

 Section to Segment mapping:
  Segment Sections...
   00     
   01     
   02     .text 
   03     .rodata .eh_frame_hdr .eh_frame 
   04     .data .bss 
   05     .eh_frame_hdr 
   06     
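
The difference between the two builds can be pulled out of the program header tables directly. Below is a small Python sketch that takes the PHDR and LOAD rows from the two listings above and reports the lowest LOAD virtual address for each; `lowest_load_vaddr` is an illustrative helper, and it assumes the column layout shown in those listings.

```python
CLANG9_HEADERS = """\
  PHDR           0x000034 0x37fff034 0x37fff034 0x000c0 0x000c0 R   0x4
  LOAD           0x000000 0x37fff000 0x37fff000 0x000f4 0x000f4 R   0x1000
  LOAD           0x001000 0x38000000 0x38000000 0x15021b 0x15021b R E 0x1000
  LOAD           0x152000 0x38151000 0x38151000 0x65f80 0x65f80 R   0x1000
  LOAD           0x1b8000 0x381b7000 0x381b7000 0x0072c 0x153cb7c RW  0x1000
"""

CLANG11_HEADERS = """\
  PHDR           0x000034 0x00400034 0x00400034 0x000e0 0x000e0 R   0x4
  LOAD           0x000000 0x00400000 0x00400000 0x00114 0x00114 R   0x1000
  LOAD           0x001000 0x38000000 0x38000000 0x14f2eb 0x14f2eb R E 0x1000
  LOAD           0x1502f0 0x381502f0 0x381502f0 0xbece4 0xbece4 R   0x1000
  LOAD           0x20efd8 0x3820ffd8 0x3820ffd8 0x0072c 0x153cbd8 RW  0x1000
"""

def lowest_load_vaddr(headers):
    """Return the smallest VirtAddr among the LOAD rows of a program header table."""
    vaddrs = []
    for line in headers.splitlines():
        fields = line.split()
        if fields and fields[0] == "LOAD":
            vaddrs.append(int(fields[2], 16))  # third column is VirtAddr
    return min(vaddrs)

lowest9 = lowest_load_vaddr(CLANG9_HEADERS)
lowest11 = lowest_load_vaddr(CLANG11_HEADERS)
print(hex(lowest9), hex(lowest11))
```

With the Clang 9 build every LOAD segment sits at or above 0x37fff000, while the Clang 11 build has one at 0x400000, the address the client executable wants.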

nbriggs commented 4 years ago

I've reported Bug 46110 against LLVM/lld for having left a chunk of read-only data at the default load address when the text segment is moved to an alternate address. We'll see if they agree it's a bug, or tell us what we're missing in terms of linker options to make it get out of the way.

nbriggs commented 4 years ago

Looks as though the real answer to this is: don't use -Ttext='altaddr', because its meaning changed between LLVM/lld 9 and LLVM/lld 10. The way to get the layout we want is now --image-base='altaddr', so where coregrind/link_tool_exe_freebsd.in currently does

my $cmd="$cc -static -Wl,-Ttext=$ala";

we need to end up with

my $cmd="$cc -static -Wl,--image-base=$ala";

There's also a bunch of mechanism in configure.ac that figures out whether the linker accepts -Ttext or -Ttext-segment, and that needs to be extended to check for --image-base support (in preference to either -T option, I guess).
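
The selection logic described here can be sketched as follows. This is only an illustration in Python, not the actual configure.ac or link script code: `pick_base_flag` and `link_command` are hypothetical names, the set of supported flags is assumed to come from some earlier probe of the linker, the relative order of the two -T options is an assumption, and the load address is taken from the text segment address in the readelf output for illustration.

```python
# Preference order sketched from the comment above: --image-base first,
# then the two -T spellings. The order between the -T options is an assumption.
PREFERENCE = ["--image-base", "-Ttext-segment", "-Ttext"]

def pick_base_flag(supported):
    """Return the most preferred flag that the linker was found to accept."""
    for flag in PREFERENCE:
        if flag in supported:
            return flag
    raise ValueError("linker accepts none of: " + ", ".join(PREFERENCE))

def link_command(cc, flag, load_address):
    """Build the start of the link command in the shape used by the link script."""
    return f"{cc} -static -Wl,{flag}={load_address}"

# Hypothetical probe result: a linker that accepts both -Ttext and --image-base.
flag = pick_base_flag({"-Ttext", "--image-base"})
cmd = link_command("cc", flag, "0x38000000")
print(cmd)
```

A linker that only understands -Ttext would still get the old command line, so older toolchains keep building the same way.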

nbriggs commented 4 years ago

See "Breaking changes" section in https://releases.llvm.org/10.0.0/tools/lld/docs/ReleaseNotes.html for the description.

paulfloyd commented 4 years ago

Sounds good. Would it be possible for you to make a patch?

nbriggs commented 4 years ago

I can try... the right fix (rather than my editing hack on the Makefile) needs a change in configure.ac, which is not my area of expertise. However, I looked at it this morning and I think I see how to do it there.

On May 31, 2020, at 12:54 PM, Paul Floyd notifications@github.com wrote:

Sounds good. Would it be possible for you to make a patch?


nbriggs commented 4 years ago

Further investigation reveals that not only does configure.ac need to be changed, but coregrind/link_tool_exe_freebsd.in needs to be updated to be more like (if not identical to) coregrind/link_tool_exe_linux.in, incorporating the result of configure's determination of which linker flag to use. The current FreeBSD version appears to have been copied from an earlier Linux version predating the linker flag issues. Patch coming soon.

nbriggs commented 4 years ago

Can you review this patch? Code-wise I think it's OK, but there's some commentary that I'm not so sure about. I've found that even the version 8.0 ld.lld that ships with FreeBSD 12.1 supports --image-base, so my changes will alter how valgrind gets built on current systems. I don't have a 64-bit FreeBSD to test on, so I'd appreciate your regtest there (with stock ld.lld) as well. Building valgrind and the regtest code with clang11, I'm getting about 50 more failures (91 vs 43 stderr). Most seem to be cases where valgrind decides that a global variable reference is actually a reference to a read-write segment in the mapped test executable; I have not investigated these.

new-lld-patch.txt

paulfloyd commented 4 years ago

Thanks for the patch! I'm getting (with clang-devel)

== 708 tests, 75 stderr failures, 6 stdout failures, 3 stderrB failures, 2 stdoutB failures, 0 post failures ==

compared to (with the default clang)

== 721 tests, 31 stderr failures, 5 stdout failures, 0 stderrB failures, 2 stdoutB failures, 0 post failures ==

nbriggs commented 4 years ago

Thanks for checking it out. I'm a little surprised that it's running fewer tests. I reran clang 8 (default), clang 9, and clang-devel (11) this morning after your last commit (cbec92609908c35238af49713a1e864f897d4087) and got:

regtest-clang11-image-base-2.out: == 644 tests, 88 stderr failures, 3 stdout failures, 5 stderrB failures, 3 stdoutB failures, 1 post failure ==

regtest-clang9-image-base-2.out: == 644 tests, 38 stderr failures, 2 stdout failures, 3 stderrB failures, 3 stdoutB failures, 1 post failure ==

regtest-clang8-image-base-2.out: == 644 tests, 39 stderr failures, 2 stdout failures, 3 stderrB failures, 3 stdoutB failures, 1 post failure ==
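
To compare runs like these without reading the numbers by eye, the summary lines can be parsed and differenced. This is a small Python sketch using the three summary lines quoted above; `parse_summary` is an illustrative helper and assumes the "== N tests, N kind failures, ... ==" layout shown in this thread.

```python
import re

SUMMARIES = {
    "clang11": "== 644 tests, 88 stderr failures, 3 stdout failures, 5 stderrB failures, 3 stdoutB failures, 1 post failure ==",
    "clang9": "== 644 tests, 38 stderr failures, 2 stdout failures, 3 stderrB failures, 3 stdoutB failures, 1 post failure ==",
    "clang8": "== 644 tests, 39 stderr failures, 2 stdout failures, 3 stderrB failures, 3 stdoutB failures, 1 post failure ==",
}

def parse_summary(line):
    """Map each counter in a regtest summary line to its value."""
    counts = {}
    for number, label in re.findall(r"(\d+) (tests|\w+ failures?)", line):
        counts[label.split()[0]] = int(number)  # "stderr failures" -> "stderr"
    return counts

parsed = {name: parse_summary(line) for name, line in SUMMARIES.items()}

# Extra stderr failures of the clang 11 build relative to the clang 9 build.
delta = parsed["clang11"]["stderr"] - parsed["clang9"]["stderr"]
print(delta)
```

The stderr difference between the clang 11 and clang 9 runs comes out at 50, in line with the "about 50 more failures" mentioned earlier.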

paulfloyd commented 4 years ago

Tonight I'll look at why fewer tests are being run.

paulfloyd commented 4 years ago

Patch pushed as commit 9748df5f6a0cefd5b4a4beeeffe1d3668611e561 (HEAD -> freebsd, origin/freebsd, origin/HEAD)

Will open new issues for clang10/11 related problems.