serge1 / ELFIO

ELFIO - ELF (Executable and Linkable Format) reader and producer implemented as a header only C++ library
http://serge1.github.io/ELFIO
MIT License
720 stars 155 forks

Add support for memory-mapped ELFs #76

Closed. galjs closed this issue 2 years ago

galjs commented 2 years ago

Currently ELFIO parses the header of a memory-mapped ELF dump file successfully, but fails to parse other elements of the ELF file that are also present in memory, such as the segment headers and the .dynsym symbols.

For some of these structures, no changes to the parsing code are needed, because fields in the header still point to them in memory (the segment headers, for example). Others (symbols, for example) need different parsing logic, so I suggest adding a flag to elfio's load() function that specifies whether the ELF passed to it was dumped from memory.
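The thread does not spell out what that different parsing logic would be. One commonly used route, offered here only as an assumption about what is meant, is to skip the section headers and read the dynamic segment (PT_DYNAMIC), whose DT_SYMTAB and DT_STRTAB entries give the addresses of the dynamic symbol and string tables. The sketch below decodes an array of 64-bit little-endian `Elf64_Dyn` records. The sample values are made up for illustration, and whether the addresses found in a live process are load-relative or already relocated depends on the loader.

```python
import struct

# Dynamic tag constants from the ELF specification.
DT_NULL, DT_STRTAB, DT_SYMTAB, DT_STRSZ, DT_SYMENT = 0, 5, 6, 10, 11


def parse_dynamic(data):
    """Decode Elf64_Dyn records (d_tag, d_val) up to the DT_NULL terminator."""
    entries = {}
    for d_tag, d_val in struct.iter_unpack("<qQ", data):
        if d_tag == DT_NULL:
            break
        entries.setdefault(d_tag, d_val)  # keep the first value seen per tag
    return entries


# Hypothetical dynamic segment contents, not taken from a real process.
SAMPLE_DYNAMIC = struct.pack(
    "<qQqQqQqQ",
    DT_STRTAB, 0x1000,
    DT_SYMTAB, 0x400,
    DT_SYMENT, 24,
    DT_NULL, 0,
)

dyn = parse_dynamic(SAMPLE_DYNAMIC)
print(hex(dyn[DT_SYMTAB]), hex(dyn[DT_STRTAB]), dyn[DT_SYMENT])
```

With the symbol table address and the entry size in hand, symbols could then be read from the dump without consulting any section header.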

serge1 commented 2 years ago

I am not sure what you call “memory-mapped ELF dump file”.

An ELF file loaded into memory exposes only a limited amount of information. Even section names are unavailable, because the string tables are missing. In the common case, which includes bare-metal images, only the segment data payload is accessible. You can still use the ELFIO library to parse the original file and then dump memory by following the virtual memory locations.

galjs commented 2 years ago

@serge1 What I mean by “memory-mapped ELF dump file” is reading from the /proc/PID/mem of a process the segments that contain the loaded ELF file and writing them into a new file - a dump of the loaded ELF file.

As for the lack of information, you are right: there is less to recover from memory-mapped ELFs. But some interesting data is still there, such as the header, the segment headers and their content, the .dynsym section, etc. (Try running readelf on a dump file. You'll see that all the section data is screwed up, but also that some data is still present.)

In elfio, most of the parsing logic mixes parsing of data that is present only in the original file (such as section headers) with parsing of data that can be found in both regular files and dump files (such as segment headers). This entanglement prevents parsing all of the data that is available in dump files.

serge1 commented 2 years ago

The idea of processing dump files sounds interesting. Could you please advise which tool I can use to produce an example of such a file?

Edit: I have managed to produce the dump by using the 'gcore' utility.

galjs commented 2 years ago

Here's a short python script I found here. It dumps all the used parts of a process's memory. That is more than we need, since all we want is the mapping of the process's exe file (the first mapped file shown in `/proc/PID/maps`), so it needs a little editing to extract only the relevant parts.
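The linked script is not reproduced in this thread, so the following is only a sketch of the kind of edit described: parse `/proc/PID/maps`, keep the entries that belong to the first file-backed mapping, and copy just those ranges out of `/proc/PID/mem`. The function names and the sample text are illustrative assumptions, not part of the original script.

```python
def parse_maps_line(line):
    """Split one /proc/<pid>/maps line into (start, end, perms, offset, path)."""
    fields = line.split(None, 5)
    start, end = (int(part, 16) for part in fields[0].split("-"))
    path = fields[5].strip() if len(fields) > 5 else ""
    return start, end, fields[1], int(fields[2], 16), path


def exe_mappings(maps_text):
    """Return the entries that belong to the first file-backed mapping."""
    entries = [parse_maps_line(l) for l in maps_text.splitlines() if l.strip()]
    exe_path = next(e[4] for e in entries if e[4].startswith("/"))
    return [e for e in entries if e[4] == exe_path]


def dump_exe(pid, out_path):
    """Copy the readable exe mappings from /proc/<pid>/mem into out_path.

    Usually needs root or ptrace permission on the target process.
    """
    with open(f"/proc/{pid}/maps") as f:
        wanted = exe_mappings(f.read())
    with open(f"/proc/{pid}/mem", "rb") as mem, open(out_path, "wb") as out:
        for start, end, perms, _offset, _path in wanted:
            if "r" in perms:
                mem.seek(start)
                out.write(mem.read(end - start))


# Small illustrative sample in the /proc/<pid>/maps format.
SAMPLE = """\
561f91d70000-561f91d9f000 r--p 00000000 08:03 2883678    /usr/bin/bash
561f91d9f000-561f91e7f000 r-xp 0002f000 08:03 2883678    /usr/bin/bash
561f91ec8000-561f91ed3000 rw-p 00000000 00:00 0
561f91f6b000-561f9212c000 rw-p 00000000 00:00 0          [heap]
"""
print([hex(e[0]) for e in exe_mappings(SAMPLE)])
```

Note that `dump_exe` writes the selected ranges back to back, so any unmapped gap between two mappings is not preserved in the output file.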

serge1 commented 2 years ago

My attempt to implement this request is on the branch 'translate_offset', specifically commit b527ea9. Please take a look at the new example file "examples/proc_mem/proc_mem.cpp".

It takes the translation table found in /proc/pid/maps and uses it to access the ELF file components located in memory. For example, for the /usr/bin/bash file I got the following:

// Translation table from /proc/2919/m
561f91d70000-561f91d9f000 r--p 00000000 08:03 2883678                    /usr/bin/bash
561f91d9f000-561f91e7f000 r-xp 0002f000 08:03 2883678                    /usr/bin/bash
561f91e7f000-561f91eba000 r--p 0010f000 08:03 2883678                    /usr/bin/bash
561f91ebb000-561f91ebf000 r--p 0014a000 08:03 2883678                    /usr/bin/bash
561f91ebf000-561f91ec8000 rw-p 0014e000 08:03 2883678                    /usr/bin/bash
561f91ec8000-561f91ed3000 rw-p 00000000 00:00 0 
561f91f6b000-561f9212c000 rw-p 00000000 00:00 0                          [heap]
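To make the translation concrete, here is a minimal sketch of mapping a file offset to a virtual address with the table above. It is not the code from the 'translate_offset' branch; the function name and the tuple layout are assumptions chosen for illustration. Only the five /usr/bin/bash entries are used.

```python
# (start, end, file_offset) for the /usr/bin/bash entries in the table above.
MAPS = [
    (0x561F91D70000, 0x561F91D9F000, 0x000000),
    (0x561F91D9F000, 0x561F91E7F000, 0x02F000),
    (0x561F91E7F000, 0x561F91EBA000, 0x10F000),
    (0x561F91EBB000, 0x561F91EBF000, 0x14A000),
    (0x561F91EBF000, 0x561F91EC8000, 0x14E000),
]


def file_offset_to_vaddr(offset, maps=MAPS):
    """Return the virtual address that holds the given file offset, or None."""
    for start, end, file_off in maps:
        if file_off <= offset < file_off + (end - start):
            return start + (offset - file_off)
    return None


for off in (0, 192512, 1402824):
    print(off, "->", file_offset_to_vaddr(off))
```

The three offsets translate to 94693590761472, 94693590953984 and 94693592168392, which are the `-seek` values passed to `xxd` for `/proc/2919/mem` in this comment. The last one is 4096 bytes further from the image start than its file offset, because the table has a one-page gap between 561f91eba000 and 561f91ebb000.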

Reading afterwards from /proc/pid/mem gives:

// start of program in memory
user@user-virtual-machine:~/ELFIO$ sudo xxd -i -c 16 -l 128 -seek 94693590761472 /proc/2919/mem
unsigned char _proc_2919_mem[] = {
  0x7f, 0x45, 0x4c, 0x46, 0x02, 0x01, 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
  0x03, 0x00, 0x3e, 0x00, 0x01, 0x00, 0x00, 0x00, 0x40, 0x2f, 0x03, 0x00, 0x00, 0x00, 0x00, 0x00,
  0x40, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xc8, 0x67, 0x15, 0x00, 0x00, 0x00, 0x00, 0x00,
  0x00, 0x00, 0x00, 0x00, 0x40, 0x00, 0x38, 0x00, 0x0d, 0x00, 0x40, 0x00, 0x1e, 0x00, 0x1d, 0x00,
  0x06, 0x00, 0x00, 0x00, 0x04, 0x00, 0x00, 0x00, 0x40, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
  0x40, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x40, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
  0xd8, 0x02, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xd8, 0x02, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
  0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x03, 0x00, 0x00, 0x00, 0x04, 0x00, 0x00, 0x00
};
unsigned int _proc_2919_mem_len = 128;

vs. the content located in the original ELF file:

// start of program in ELF file
user@user-virtual-machine:~/ELFIO$ xxd -i -c 16 -l 128 -seek 0 /usr/bin/bash
unsigned char _usr_bin_bash[] = {
  0x7f, 0x45, 0x4c, 0x46, 0x02, 0x01, 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
  0x03, 0x00, 0x3e, 0x00, 0x01, 0x00, 0x00, 0x00, 0x40, 0x2f, 0x03, 0x00, 0x00, 0x00, 0x00, 0x00,
  0x40, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xc8, 0x67, 0x15, 0x00, 0x00, 0x00, 0x00, 0x00,
  0x00, 0x00, 0x00, 0x00, 0x40, 0x00, 0x38, 0x00, 0x0d, 0x00, 0x40, 0x00, 0x1e, 0x00, 0x1d, 0x00,
  0x06, 0x00, 0x00, 0x00, 0x04, 0x00, 0x00, 0x00, 0x40, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
  0x40, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x40, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
  0xd8, 0x02, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xd8, 0x02, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
  0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x03, 0x00, 0x00, 0x00, 0x04, 0x00, 0x00, 0x00
};
unsigned int _usr_bin_bash_len = 128;
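The first 64 bytes of both dumps are the ELF64 file header. As a reading aid, the sketch below decodes them with Python's `struct` module, using the standard `Elf64_Ehdr` field layout. The byte string is copied from the dumps above; the helper names are illustrative.

```python
import struct
from collections import namedtuple

# Standard little-endian Elf64_Ehdr layout (64 bytes).
EHDR_FORMAT = "<16sHHIQQQIHHHHHH"
Ehdr = namedtuple(
    "Ehdr",
    "e_ident e_type e_machine e_version e_entry e_phoff e_shoff e_flags "
    "e_ehsize e_phentsize e_phnum e_shentsize e_shnum e_shstrndx",
)

# First four rows (64 bytes) of the xxd output shown above.
HEADER = bytes.fromhex(
    "7f454c46020101000000000000000000"
    "03003e0001000000402f030000000000"
    "4000000000000000c867150000000000"
    "00000000400038000d0040001e001d00"
)

ehdr = Ehdr._make(struct.unpack(EHDR_FORMAT, HEADER))
print("type", ehdr.e_type, "machine", hex(ehdr.e_machine))
print("phoff", hex(ehdr.e_phoff), "phnum", ehdr.e_phnum)
print("shoff", hex(ehdr.e_shoff), "shnum", ehdr.e_shnum)
```

The decoded `e_shoff` is 0x1567c8, i.e. 1402824, the file offset of the section header table. `e_phoff` is 0x40, so the program headers start immediately after this header, inside the first mapped page.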

Segment data also looks in sync. For example:

// code in memory
user@user-virtual-machine:~/ELFIO$ sudo xxd -i -c 16 -l 128 -seek 94693590953984 /proc/2919/mem
unsigned char _proc_2919_mem[] = {
  0xf3, 0x0f, 0x1e, 0xfa, 0x48, 0x83, 0xec, 0x08, 0x48, 0x8b, 0x05, 0xd1, 0xfe, 0x11, 0x00, 0x48,
  0x85, 0xc0, 0x74, 0x02, 0xff, 0xd0, 0x48, 0x83, 0xc4, 0x08, 0xc3, 0x00, 0x00, 0x00, 0x00, 0x00,
  0xff, 0x35, 0xb2, 0xf6, 0x11, 0x00, 0xf2, 0xff, 0x25, 0xb3, 0xf6, 0x11, 0x00, 0x0f, 0x1f, 0x00,
  0xf3, 0x0f, 0x1e, 0xfa, 0x68, 0x00, 0x00, 0x00, 0x00, 0xf2, 0xe9, 0xe1, 0xff, 0xff, 0xff, 0x90,
  0xf3, 0x0f, 0x1e, 0xfa, 0x68, 0x01, 0x00, 0x00, 0x00, 0xf2, 0xe9, 0xd1, 0xff, 0xff, 0xff, 0x90,
  0xf3, 0x0f, 0x1e, 0xfa, 0x68, 0x02, 0x00, 0x00, 0x00, 0xf2, 0xe9, 0xc1, 0xff, 0xff, 0xff, 0x90,
  0xf3, 0x0f, 0x1e, 0xfa, 0x68, 0x03, 0x00, 0x00, 0x00, 0xf2, 0xe9, 0xb1, 0xff, 0xff, 0xff, 0x90,
  0xf3, 0x0f, 0x1e, 0xfa, 0x68, 0x04, 0x00, 0x00, 0x00, 0xf2, 0xe9, 0xa1, 0xff, 0xff, 0xff, 0x90
};
unsigned int _proc_2919_mem_len = 128;

vs. the same segment data in the file:

// code in ELF file
user@user-virtual-machine:~/ELFIO$ xxd -i -c 16 -l 128 -seek 192512 /usr/bin/bash
unsigned char _usr_bin_bash[] = {
  0xf3, 0x0f, 0x1e, 0xfa, 0x48, 0x83, 0xec, 0x08, 0x48, 0x8b, 0x05, 0xd1, 0xfe, 0x11, 0x00, 0x48,
  0x85, 0xc0, 0x74, 0x02, 0xff, 0xd0, 0x48, 0x83, 0xc4, 0x08, 0xc3, 0x00, 0x00, 0x00, 0x00, 0x00,
  0xff, 0x35, 0xb2, 0xf6, 0x11, 0x00, 0xf2, 0xff, 0x25, 0xb3, 0xf6, 0x11, 0x00, 0x0f, 0x1f, 0x00,
  0xf3, 0x0f, 0x1e, 0xfa, 0x68, 0x00, 0x00, 0x00, 0x00, 0xf2, 0xe9, 0xe1, 0xff, 0xff, 0xff, 0x90,
  0xf3, 0x0f, 0x1e, 0xfa, 0x68, 0x01, 0x00, 0x00, 0x00, 0xf2, 0xe9, 0xd1, 0xff, 0xff, 0xff, 0x90,
  0xf3, 0x0f, 0x1e, 0xfa, 0x68, 0x02, 0x00, 0x00, 0x00, 0xf2, 0xe9, 0xc1, 0xff, 0xff, 0xff, 0x90,
  0xf3, 0x0f, 0x1e, 0xfa, 0x68, 0x03, 0x00, 0x00, 0x00, 0xf2, 0xe9, 0xb1, 0xff, 0xff, 0xff, 0x90,
  0xf3, 0x0f, 0x1e, 0xfa, 0x68, 0x04, 0x00, 0x00, 0x00, 0xf2, 0xe9, 0xa1, 0xff, 0xff, 0xff, 0x90
};

So, nothing interesting so far.

I am stuck with the section header content. While the file contains:

// section headers in ELF file
user@user-virtual-machine:~/ELFIO$ xxd -i -c 16 -l 256 -seek 1402824 /usr/bin/bash
unsigned char _usr_bin_bash[] = {
  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
  0x0b, 0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00, 0x02, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
  0x18, 0x03, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x18, 0x03, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
  0x1c, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
  0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
  0x13, 0x00, 0x00, 0x00, 0x07, 0x00, 0x00, 0x00, 0x02, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
  0x38, 0x03, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x38, 0x03, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
  0x30, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
  0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
  0x26, 0x00, 0x00, 0x00, 0x07, 0x00, 0x00, 0x00, 0x02, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
  0x68, 0x03, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x68, 0x03, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
  0x24, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
  0x04, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
};
unsigned int _usr_bin_bash_len = 256;

the corresponding information is not available in memory:

// section headers in memory
user@user-virtual-machine:~/ELFIO$ sudo xxd -i -c 16 -l 256 -seek 94693592168392 /proc/2919/mem
unsigned char _proc_2919_mem[] = {
  0x00, 0x00, 0x00, 0x00, 0x0a, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
  0x60, 0x03, 0x0e, 0x92, 0x1f, 0x56, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
};
unsigned int _proc_2919_mem_len = 256;
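For comparison, the file-side dump above is a sequence of 64-byte `Elf64_Shdr` records, the first of which is the all-zero null entry. The sketch below decodes the second record (bytes 64 to 127 of that dump) using the standard field layout. The helper names are illustrative assumptions.

```python
import struct
from collections import namedtuple

# Standard little-endian Elf64_Shdr layout (64 bytes).
SHDR_FORMAT = "<IIQQQQIIQQ"
Shdr = namedtuple(
    "Shdr",
    "sh_name sh_type sh_flags sh_addr sh_offset sh_size "
    "sh_link sh_info sh_addralign sh_entsize",
)

# Rows 5 to 8 of the section header dump taken from /usr/bin/bash.
ENTRY_1 = bytes.fromhex(
    "0b000000010000000200000000000000"
    "18030000000000001803000000000000"
    "1c000000000000000000000000000000"
    "01000000000000000000000000000000"
)

shdr = Shdr._make(struct.unpack(SHDR_FORMAT, ENTRY_1))
print("type", shdr.sh_type, "flags", shdr.sh_flags)
print("addr", hex(shdr.sh_addr), "offset", hex(shdr.sh_offset), "size", hex(shdr.sh_size))
```

This record describes an allocated SHT_PROGBITS section (type 1, flags 2) at address and offset 0x318 with size 0x1c. The bytes read from memory at the translated address do not follow this 64-byte record pattern.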

So, without section header information, no interesting data can be retrieved from memory. You were interested in the content of the ".dynsym" section, but that content cannot be retrieved from memory by regular ELF parsing. It is, of course, available in the original file itself.

Please let me know if I am missing something essential, but I didn't find any info that cannot be taken from the original ELF file.

I got stuck on the section headers and didn't continue implementing a similar translation mechanism for segments.

serge1 commented 2 years ago

I am moving the issue to "Discussions"