Open creikey opened 2 years ago
I get the same result. The issue seems to be that the Rip
address is different from those that appear in the objdump file. In my case I get 140737458009076
for Rip but my objdump addresses after hex conversion are between 5368713216
and 5370365077
.
Maybe there's some kind of prefixing going on in the Rip
address? I did some experiments but no luck.
What compiler are you running to compile? The objdump.exe needs to come from that compiler.
It would be nice if you could provide a simple program, maybe one of the test programs, its compile command, its object dump command and output.
I also don't have windows 11 to test on. I should get that.
I've run it on a slightly modified version of test2 so it terminates:
import os
var
i = 0
x = 0
while true:
echo "here ", i, " ", os.getCurrentProcessId(), " //", x
inc i
for j in 0 .. 100_000_000:
inc x
if i > 10:
break
I was using the nimble gcc originally instead of the one paired with my objdump, but after switching the compiler I get similar hottie output but the addresses are closer.
I've compiled using:
set CC=C:\MinGW64\bin\gcc.exe
nim c --cc:env -r --verbosity:3 --debugger:native -d:release --gc:arc .\examples\test2.nim
The addresses are in the right range now, but I'm still not getting results for the --lines and --procedures modes. I get something from --addresses though, using:
hottie -a .\examples\test2.exe
Results:
Program ended
samples time percent what
42 238.605ms 34.426% C:\Src\hottie\dump.txt:9238
28 159.070ms 22.951% C:\Src\hottie\dump.txt:9250
24 136.346ms 19.672% C:\Src\hottie\dump.txt:9249
23 130.664ms 18.852% C:\Src\hottie\dump.txt:9237
4 22.724ms 3.279%
1 5.681ms 0.820% C:\Src\hottie\dump.txt:9240
Samples per second: 176.0 totalTime: 0.693ms
And here's the objdump file: https://www.dropbox.com/s/3umdt4mro947ydz/dump.txt?dl=0
Compiler Version:
λ C:\MinGW64\bin\gcc.exe -v
Using built-in specs.
COLLECT_GCC=C:\MinGW64\bin\gcc.exe
COLLECT_LTO_WRAPPER=C:/MinGW64/bin/../libexec/gcc/x86_64-w64-mingw32/8.1.0/lto-wrapper.exe
Target: x86_64-w64-mingw32
Configured with: ../../../src/gcc-8.1.0/configure --host=x86_64-w64-mingw32 --build=x86_64-w64-mingw32 --target=x86_64-w64-mingw32 --prefix=/mingw64 --with-sysroot=/c/mingw810/x86_64-810-win32-seh-rt_v6-rev0/mingw64 --enable-shared --enable-static --disable-multilib --enable-languages=c,c++,fortran,lto --enable-libstdcxx-time=yes --enable-threads=win32 --enable-libgomp --enable-libatomic --enable-lto --enable-graphite --enable-checking=release --enable-fully-dynamic-string --enable-version-specific-runtime-libs --disable-libstdcxx-pch --disable-libstdcxx-debug --enable-bootstrap --disable-rpath --disable-win32-registry --disable-nls --disable-werror --disable-symvers --with-gnu-as --with-gnu-ld --with-arch=nocona --with-tune=core2 --with-libiconv --with-system-zlib --with-gmp=/c/mingw810/prerequisites/x86_64-w64-mingw32-static --with-mpfr=/c/mingw810/prerequisites/x86_64-w64-mingw32-static --with-mpc=/c/mingw810/prerequisites/x86_64-w64-mingw32-static --with-isl=/c/mingw810/prerequisites/x86_64-w64-mingw32-static --with-pkgversion='x86_64-win32-seh-rev0, Built by MinGW-W64 project' --with-bugurl=https://sourceforge.net/projects/mingw-w64 CFLAGS='-O2 -pipe -fno-ident -I/c/mingw810/x86_64-810-win32-seh-rt_v6-rev0/mingw64/opt/include -I/c/mingw810/prerequisites/x86_64-zlib-static/include -I/c/mingw810/prerequisites/x86_64-w64-mingw32-static/include' CXXFLAGS='-O2 -pipe -fno-ident -I/c/mingw810/x86_64-810-win32-seh-rt_v6-rev0/mingw64/opt/include -I/c/mingw810/prerequisites/x86_64-zlib-static/include -I/c/mingw810/prerequisites/x86_64-w64-mingw32-static/include' CPPFLAGS=' -I/c/mingw810/x86_64-810-win32-seh-rt_v6-rev0/mingw64/opt/include -I/c/mingw810/prerequisites/x86_64-zlib-static/include -I/c/mingw810/prerequisites/x86_64-w64-mingw32-static/include' LDFLAGS='-pipe -fno-ident -L/c/mingw810/x86_64-810-win32-seh-rt_v6-rev0/mingw64/opt/lib -L/c/mingw810/prerequisites/x86_64-zlib-static/lib -L/c/mingw810/prerequisites/x86_64-w64-mingw32-static/lib '
Thread model: win32
gcc version 8.1.0 (x86_64-win32-seh-rev0, Built by MinGW-W64 project)
Nice the addresses are correct. The objdump parser I have probably does not support the new objdump format. I have seen this happen on mac. Man I wish objdump had like a json format that never changes. I think some tweaks to the parser are required.
This might be an issue with newer versions of GCC. I'm getting that address mismatch between GetThreadContext
and the objdump
for the v11.1 gcc and objdump from .choosenim\toolchains\mingw64\bin
. I'll explore some more tomorrow.
So dumping cpuHotAddresses
after compiling test2 with GCC 11.1 I get:
λ hottie .\examples\test2.exe
Running objdump...
Program ended
samples time percent what
7685 8049.077ms 100.000%
count : address [HEX] :: BINARY
1155 : 140695318335471 [00007FF62E788FEF] :: 0000000000000000011111111111011000101110011110001000111111101111
623 : 140695318335539 [00007FF62E789033] :: 0000000000000000011111111111011000101110011110001001000000110011
1016 : 140695318335474 [00007FF62E788FF2] :: 0000000000000000011111111111011000101110011110001000111111110010
1331 : 140695318335520 [00007FF62E789020] :: 0000000000000000011111111111011000101110011110001001000000100000
1245 : 140695318335477 [00007FF62E788FF5] :: 0000000000000000011111111111011000101110011110001000111111110101
449 : 140695318335467 [00007FF62E788FEB] :: 0000000000000000011111111111011000101110011110001000111111101011
1 : 140721003754228 [00007FFC2970F2F4] :: 0000000000000000011111111111110000101001011100001111001011110100
391 : 140695318335530 [00007FF62E78902A] :: 0000000000000000011111111111011000101110011110001001000000101010
1 : 140721003755380 [00007FFC2970F774] :: 0000000000000000011111111111110000101001011100001111011101110100
394 : 140695318335524 [00007FF62E789024] :: 0000000000000000011111111111011000101110011110001001000000100100
1079 : 140695318335542 [00007FF62E789036] :: 0000000000000000011111111111011000101110011110001001000000110110
Samples per second: 954.8 totalTime: 8.049ms
And after compiling with GCC 8.1 I get:
λ hottie .\examples\test2.exe
Running objdump...
Program ended
samples time percent what
3555 3681.458ms 69.083% C:\Src\hottie\examples\test2.nim:10
1588 1644.488ms 30.859% C:\Users\chanc\.choosenim\toolchains\nim-1.6.6\lib\system\iterators_1.nim:91
3 3.107ms 0.058%
622 : 4230807 [0000000000408E97] :: 0000000000000000000000000000000000000000010000001000111010010111
1012 : 4230800 [0000000000408E90] :: 0000000000000000000000000000000000000000010000001000111010010000
630 : 4230813 [0000000000408E9D] :: 0000000000000000000000000000000000000000010000001000111010011101
898 : 4230822 [0000000000408EA6] :: 0000000000000000000000000000000000000000010000001000111010100110
690 : 4230809 [0000000000408E99] :: 0000000000000000000000000000000000000000010000001000111010011001
2 : 140721003754228 [00007FFC2970F2F4] :: 0000000000000000011111111111110000101001011100001111001011110100
642 : 4230816 [0000000000408EA0] :: 0000000000000000000000000000000000000000010000001000111010100000
1 : 140721003755380 [00007FFC2970F774] :: 0000000000000000011111111111110000101001011100001111011101110100
649 : 4230803 [0000000000408E93] :: 0000000000000000000000000000000000000000010000001000111010010011
Samples per second: 965.6 totalTime: 5.329ms
And here's my dump code:
for address in cpuHotAddresses.keys():
let count = cpuHotAddresses[address]
if count > 0:
echo count, " : ", address, " [", address.int.toHex(), "] :: ", toBin(address.int, 64)
Disclaimer: This is not my expertise, so consider me a blind man lost in a maze and interpret my feedback accordingly. : )
This may be relevant: https://stackoverflow.com/questions/61061830/i-am-wondering-if-pie-does-anything-if-the-aslr-is-turned-off-on-the-system-or
There's also an objdump option called --adjust-vma=
I'm definitely getting some kind of address randomization on builds from GCC 11.1. It's stable between runs of the executable though.
This might be a means of getting the runtime offset from the Windows API you're already using:
#include <windows.h>
#include <TlHelp32.h>
uintptr_t GetModuleBaseAddress(DWORD procId, wchar_t* modName)
{
//initialize to zero for error checking
uintptr_t modBaseAddr = 0;
//get a handle to a snapshot of all modules
HANDLE hSnap = CreateToolhelp32Snapshot(TH32CS_SNAPMODULE | TH32CS_SNAPMODULE32, procId);
//check if it's valid
if (hSnap != INVALID_HANDLE_VALUE)
{
//this struct holds the actual module information
MODULEENTRY32 modEntry;
//this is required for the function to work
modEntry.dwSize = sizeof(modEntry);
//If a module exists, get it's entry
if (Module32First(hSnap, &modEntry))
{
//loop through the modules
do
{
//compare the module name against ours
if (!_wcsicmp(modEntry.szModule, modName))
{
//copy the base address and break out of the loop
modBaseAddr = (uintptr_t)modEntry.modBaseAddr;
break;
}
//each iteration we grab the next module entry
} while (Module32Next(hSnap, &modEntry));
}
}
//free the handle
CloseHandle(hSnap);
return modBaseAddr;
}
Found on this random game exploit website: https://guidedhacking.com/threads/get-module-base-address-tutorial-dwgetmodulebaseaddress.5781/
Okay... apologies for all the comments but I think I figured out how to get back to the objdump addresses.
I'm going to be a little verbose for posterity's sake. There's a value in the Windows binary PE format (Portable Executable) called ImageBase. An easy way to dump these values is to pip install the Python pereader
module and then run:
python -c "import pereader;pe = pereader.PE(r'.\examples\test2.exe');print(pe.dump())"
ImageBase is towards the top under the # Optional Header
section. This value will correspond with the beginning of the output from objdump -dl .\examples\test2.exe
, in my case (0x140000000):
.\examples\test2.exe: file format pei-x86-64
Disassembly of section .text:
0000000140001000 <__mingw_invalidParameterHandler>:
objdump: Dwarf Error: Could not find abbrev number 1764.
140001000: 55 push %rbp
140001001: 48 89 e5 mov %rsp,%rbp
The second piece comes from using the Toolhelp32 API. We'll iterate through the modules looking for the name that matches our exe name with (C++ example in the guidedhacking.com link above):
proc getBaseAddress*(pid: int, modName: string): uint =
var h = CreateToolhelp32Snapshot(TH32CS_SNAPMODULE or TH32CS_SNAPMODULE32, DWORD(pid))
if h != INVALID_HANDLE_VALUE:
var modEntry: MODULEENTRY32
modEntry.dwSize = DWORD(sizeof(modEntry))
var valid = Module32First(h, modEntry.addr)
while valid == 1:
let modEntryName = filter(modEntry.szModule, x => x > 0).map(x => chr(x)).join("")
if modEntryName == modName:
result = cast[uint](modEntry.modBaseAddr)
break
valid = Module32Next(h, modEntry.addr)
CloseHandle(h)
Finally, we combine the two offsets to get back to our objdump addresses:
let exe_base_address = getBaseAddress(pid, "test2.exe") # address from Toolhelp32
let image_base_address = 0x140000000 # from PE header, may need to grab dynamically
let relative = (address.int64 - exe_base_address.int64) + image_base_address # address is the instruction address from GetThreadContext
I think the last piece of this for larger programs is that the entire Relocations Directory needs to be parsed from the PE header to map the runtime addresses. Large programs get chunked and moved around in memory based on the relocations.
In my case there's 31 of these (2 partial entries from peheader below):
# Base Relocations Directory -----------------------------------------------------
[IMAGE_BASE_RELOCATION]
0x1dcc00 0x0 VirtualAddress 0x195000
0x1dcc04 0x4 SizeOfBlock 0xc
0xa768 IMAGE_REL_BASED_DIR64
0x0 IMAGE_REL_BASED_ABSOLUTE
[IMAGE_BASE_RELOCATION]
0x1dcc0c 0x0 VirtualAddress 0x196000
0x1dcc10 0x4 SizeOfBlock 0x20
0xa250 IMAGE_REL_BASED_DIR64
0xa300 IMAGE_REL_BASED_DIR64
0xa320 IMAGE_REL_BASED_DIR64
The directory might also accessible from this API: https://learn.microsoft.com/en-us/windows/win32/api/dbghelp/
So parsing the base relocations doesn't seem necessary. The issue I was running into is that DLL modules within the process show up at high address values, so those just need to be ignored or I was counting them separately as "external". Otherwise, the Image Base of 0x140000000 usually shouldn't change but it can be parsed from the PE file as so:
var pe_file = open(filepath, fmRead)
let sig_offset = pe_file.read(60, uint32).int # Signature offset field is at fixed position of 0x3C bytes.
let coff_header = sig_offset + 4 # COFF Header immediately follows the 4 byte signature in Image files.
let optional_header_pos = coff_header + 20 # Last COFF field "Characteristics" is at 18 bytes with a 2 byte width.
let optional_header_size_pos = coff_header + 16 # Offset 16 bytes within the COFF header.
let signature = pe_file.read(sig_offset, array[4, char])
check_signature: ['P', 'E', '\0', '\0']
let optional_header_size = pe_file.read(optional_header_size_pos, uint16)
check_optional_header_size: optional_header_size == 0
let optional_header_magic = pe_file.read(optional_header_pos, uint16)
check_optional_header_magic: optional_header_magic in [MAGIC_PE32, MAGIC_PE32_PLUS, MAGIC_ROM]
var
image_base_pos: int
image_base: uint64
if optional_header_magic == MAGIC_PE32:
image_base_pos = optional_header_pos + 28 # ImageBase : PE32 has an extra 4-byte field (BaseOfData) over PE32+
elif optional_header_magic == MAGIC_PE32_PLUS:
image_base_pos = optional_header_pos + 24 # ImageBase : PE32 has an extra 4-byte field (BaseOfData) over PE32+
image_base = cast[uint64](pe_file.read(image_base_pos, uint32))
EDIT: There still seems to be something weird going on here with the addresses though.
Here is the output of running hottie:
On my project here: https://github.com/creikey/recurrent-evolution/commit/b3303cce553c411081f056353b4897f5d457cdca
Compiled like this:
nim -d:release --debugger:native --stacktrace:on c src\recurrent_evolution.nim && hottie src\recurrent_evolution.exe
I am on windows 11