serge1 / ELFIO

ELFIO - ELF (Executable and Linkable Format) reader and producer implemented as a header only C++ library
http://serge1.github.io/ELFIO
MIT License

Segments change, when saving a loaded file #135

Open MofX opened 4 months ago

MofX commented 4 months ago

When saving a file loaded with ELFIO, without any modifications, the segment definition can change.

I added a test in https://github.com/MofX/ELFIO/commit/0ac9c678e2e2f8c47a652ee04e31854bd500da06 that shows the behavior and could be added as a regression test once the bug is fixed.

It includes a binary generated with echo "int main(){}" | gcc -xc -static -o x86_64_static -, which is loaded with elfio and saved again. For the initial version, the segments are compared to the output of readelf -l. The same comparison is done for the saved version, but it fails because elfio changes four fields of the segment definitions:

At least the change in segment 6, which is the TLS segment, breaks the binary: it changes the memory layout of the TLS, so the libc initialization code uses an invalid pointer and segfaults.

I think the changed sizes are due to how elfio tries to map sections to segments. Instead of just mapping .tdata and .tbss to the TLS segment, it also maps .init_array and .fini_array to the TLS segment. The combined file size of these sections is 0x80. readelf -l shows a correct mapping. I imagine the same is happening for segment 9. I did not look into why the alignment of segment 9 is changed.

serge1 commented 4 months ago

Thank you for your analysis. You are right: ELFIO maps sections to segments and calculates the alignment of a segment in accordance with the sections it contains.

I'll try to reproduce and find the reason for the case you provided.

zyedidia commented 3 months ago

I am also experiencing this issue, since I am trying to load existing static ELF files, modify them, and write them back. Any help on this would be appreciated.

serge1 commented 2 months ago

Most likely, the issue is related to sections whose size is zero and the inability to assign a proper segment to such sections. I'm sorry, but I will not get to this issue soon. Any help will be appreciated.

serge1 commented 1 month ago

Well, as you can see, even readelf -l could not assign sections to segments properly: .tdata appears in three segments simultaneously. I have added a workaround that assigns sections having the SHF_TLS flag to segments of type PT_TLS only. I am not sure how reliable this workaround is.
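The mapping problem and the workaround can be illustrated with a minimal sketch. It assumes that sections are assigned to segments by address-range containment, which is a guess at the mechanism and not a copy of ELFIO's code. All type and function names, and the addresses and sizes in `example_sections()`, are hypothetical. Only the numeric values of PT_TLS and SHF_TLS come from the ELF specification. The sketch also keeps non-TLS sections out of PT_TLS segments, which goes one step beyond what the comment above states.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Values from the ELF specification.
constexpr std::uint32_t kPtTls  = 7;      // PT_TLS
constexpr std::uint64_t kShfTls = 0x400;  // SHF_TLS

// Hypothetical, simplified views of a section and a segment.
struct Section {
    std::string   name;
    std::uint64_t flags;
    std::uint64_t addr;
    std::uint64_t size;
};

struct Segment {
    std::uint32_t type;
    std::uint64_t vaddr;
    std::uint64_t memsz;
};

using Rule = bool (*)(const Segment&, const Section&);

// Naive rule: a section belongs to a segment if its address range
// lies inside the segment's memory range.
bool contains_by_address(const Segment& seg, const Section& sec) {
    return sec.addr >= seg.vaddr &&
           sec.addr + sec.size <= seg.vaddr + seg.memsz;
}

// Rule with the TLS restriction: SHF_TLS sections and PT_TLS segments
// are only matched with each other (the reverse direction is an assumption).
bool contains_with_tls_rule(const Segment& seg, const Section& sec) {
    const bool tls_section = (sec.flags & kShfTls) != 0;
    const bool tls_segment = seg.type == kPtTls;
    if (tls_section != tls_segment) return false;
    return contains_by_address(seg, sec);
}

std::vector<std::string> sections_in(const Segment& seg,
                                     const std::vector<Section>& sections,
                                     Rule rule) {
    std::vector<std::string> names;
    for (const Section& sec : sections)
        if (rule(seg, sec)) names.push_back(sec.name);
    return names;
}

// Hypothetical layout: .tbss takes no file space, so the sections that
// follow it reuse the addresses that .tbss covers in the TLS image.
std::vector<Section> example_sections() {
    return {
        {".tdata",      kShfTls, 0x1000, 0x20},
        {".tbss",       kShfTls, 0x1020, 0x40},
        {".init_array", 0,       0x1020, 0x10},
        {".fini_array", 0,       0x1030, 0x10},
    };
}

Segment example_tls_segment() { return {kPtTls, 0x1000, 0x60}; }

void check_example() {
    const auto secs = example_sections();
    const auto tls  = example_tls_segment();
    // The naive rule pulls .init_array and .fini_array into the TLS segment.
    assert(sections_in(tls, secs, contains_by_address).size() == 4);
    // The TLS rule keeps only .tdata and .tbss.
    assert(sections_in(tls, secs, contains_with_tls_rule).size() == 2);
}
```

With the hypothetical layout, the naive rule reports four sections for the TLS segment and the restricted rule reports two, which matches the symptom described earlier in the thread.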

Another issue with the executable is that the last segment of type PT_GNU_RELRO contains sections with alignments up to 32 (section .data.rel.ro). At the same time, the original alignment of this segment is only 1.

serge1 commented 1 month ago

Three segments of the initial file are located at the same memory location and, even worse, at the same file offset. I doubt that such a configuration can easily be supported for modification. I'll leave the issue open for the meantime. Maybe some ideas will come, but I am very sceptical about this.

Another question arises regarding the size of the ninth segment, GNU_RELRO. If it contains the .init_array, .fini_array, .data.rel.ro, and .got sections, then its size should be equal to 0x10+0x10+0x2df4+0xf0 = 0x2f04. Even if the alignment of section .data.rel.ro is taken into account, the size does not reach the segment's size of 0x2f40. Until these questions are answered, I'll comment out the test assertions.
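The arithmetic can be checked at compile time with a short sketch. The section sizes, the segment size, and the alignment of 32 are the values quoted in this thread. The constant names are made up for the example, and the padding bound assumes that only .data.rel.ro needs alignment padding.

```cpp
#include <cstdint>

// Section sizes and segment size as quoted in the thread.
constexpr std::uint64_t kInitArraySize = 0x10;
constexpr std::uint64_t kFiniArraySize = 0x10;
constexpr std::uint64_t kDataRelRoSize = 0x2df4;
constexpr std::uint64_t kGotSize       = 0xf0;
constexpr std::uint64_t kSegmentSize   = 0x2f40;

// Plain sum of the four section sizes.
constexpr std::uint64_t kPlainSum =
    kInitArraySize + kFiniArraySize + kDataRelRoSize + kGotSize;
static_assert(kPlainSum == 0x2f04, "sum of the section sizes");

// Aligning one section to 32 bytes adds at most 31 bytes of padding.
constexpr std::uint64_t kDataRelRoAlign = 32;
constexpr std::uint64_t kMaxPadding     = kDataRelRoAlign - 1;
static_assert(kPlainSum + kMaxPadding < kSegmentSize,
              "alignment padding alone does not explain the segment size");

// Bytes of the segment not accounted for by the plain sum.
constexpr std::uint64_t kUnexplained = kSegmentSize - kPlainSum;
static_assert(kUnexplained == 0x3c, "unexplained difference");
```

The difference of 0x3c bytes is larger than the worst-case padding of 0x1f bytes, so under these assumptions the four sections do not account for the full segment size.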

MofX commented 1 month ago

I think this is a design problem in elfio. In the ELF specification, sections and segments are almost completely unrelated. Segments are used when loading the binary into memory for execution. Here the type of the segment determines how it is used (e.g. PT_LOAD is mapped into the virtual memory space of the process). I don't really know what the other segment types are used for, but they may be used by the linker or kernel during loading. So it is no coincidence that there is no reference from segments to sections (e.g. id or name) in the ELF file. You simply cannot deduce the sections from the segments, but that is what elfio is trying to do. Section-to-segment mapping in readelf is also just a best-effort approach.

I only see two segments at the same address (NOTE and GNU_PROPERTY @ 0x400270). Both segments map the same physical data from the file (0x270 - 0x290). From an ELF perspective this is totally fine, because PT_NOTE and PT_GNU_PROPERTY may be used to describe the same thing (as I said, I don't really know).
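To make the point about missing section references concrete, here is a sketch of the 64-bit program header layout as defined in the ELF-64 specification, with the two overlapping headers built from the numbers in this comment. The struct name and helper functions are hypothetical. Fields not quoted in the thread (flags, physical address, memory size, alignment) are left at zero and are not meant to reflect the real binary.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint32_t kPtNote        = 4;           // PT_NOTE
constexpr std::uint32_t kPtGnuProperty = 0x6474e553;  // PT_GNU_PROPERTY

// Field layout of a 64-bit program header. Note that no field holds a
// section index or section name: a segment is only a typed byte range.
struct ProgramHeader64 {
    std::uint32_t p_type;
    std::uint32_t p_flags;
    std::uint64_t p_offset;  // where the bytes start in the file
    std::uint64_t p_vaddr;   // where they are mapped in memory
    std::uint64_t p_paddr;
    std::uint64_t p_filesz;  // number of bytes taken from the file
    std::uint64_t p_memsz;   // number of bytes occupied in memory
    std::uint64_t p_align;
};
static_assert(sizeof(ProgramHeader64) == 56, "ELF-64 program header size");

// Two headers may legally describe the same bytes of the file.
bool same_file_range(const ProgramHeader64& a, const ProgramHeader64& b) {
    return a.p_offset == b.p_offset && a.p_filesz == b.p_filesz;
}

// Builds a header from the values quoted in the thread:
// file range 0x270 - 0x290, virtual address 0x400270.
ProgramHeader64 header_from_thread(std::uint32_t type) {
    ProgramHeader64 ph{};  // unquoted fields stay zero
    ph.p_type   = type;
    ph.p_offset = 0x270;
    ph.p_vaddr  = 0x400270;
    ph.p_filesz = 0x290 - 0x270;
    return ph;
}

void check_example() {
    const ProgramHeader64 note = header_from_thread(kPtNote);
    const ProgramHeader64 prop = header_from_thread(kPtGnuProperty);
    assert(note.p_type != prop.p_type);
    assert(same_file_range(note, prop));
}
```

Since nothing in the header ties a segment to a section, any section list attached to a segment has to be inferred from offsets and addresses.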

The main thing here is: this is not an artificial binary. It is generated by gcc/binutils without any special flags except -static, and right now elfio is completely unusable for modifying these binaries. The solution to fix this would probably be API breaking, i.e. drop the link between sections and segments in the model and keep segments independent. (A function to generate a segment from a section could still work, though.) But I guess this would create issues for modifying a binary where the physical offset of the data changes. This is mainly the reason why I did not try to fix anything and decided to just send in the bug.

Maybe one option would at least be to refuse to save the file when it is not possible to save it exactly the way it was read, because right now this produces bugs that are extremely hard to understand. If I recall correctly, it took me a few days until I understood what was happening, because the resulting program crashed for no obvious reason. The crash was related to TLS init in libc, which was hard to debug, and it was hard to understand why a pointer was just wrong.
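A guard of that kind could look roughly like the sketch below. It is not part of ELFIO: `SegmentSnapshot`, `first_changed_segment`, and `safe_to_save` are hypothetical names, and the snapshots are assumed to be filled from the program headers once after loading and once from the layout that is about to be written. The values in `check_example()` are invented for the demonstration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical copy of the program header fields worth preserving.
struct SegmentSnapshot {
    std::uint32_t type;
    std::uint64_t offset;
    std::uint64_t vaddr;
    std::uint64_t filesz;
    std::uint64_t memsz;
    std::uint64_t align;
};

bool same_header(const SegmentSnapshot& a, const SegmentSnapshot& b) {
    return a.type == b.type && a.offset == b.offset && a.vaddr == b.vaddr &&
           a.filesz == b.filesz && a.memsz == b.memsz && a.align == b.align;
}

// Returns the index of the first segment that differs, or nothing if
// every segment header would be written back unchanged.
std::optional<std::size_t> first_changed_segment(
    const std::vector<SegmentSnapshot>& before,
    const std::vector<SegmentSnapshot>& after) {
    const std::size_t common = std::min(before.size(), after.size());
    for (std::size_t i = 0; i < common; ++i)
        if (!same_header(before[i], after[i])) return i;
    if (before.size() != after.size()) return common;
    return std::nullopt;
}

// A caller would refuse to save when this returns false.
bool safe_to_save(const std::vector<SegmentSnapshot>& before,
                  const std::vector<SegmentSnapshot>& after) {
    return !first_changed_segment(before, after).has_value();
}

void check_example() {
    // Invented values: a TLS-like segment whose sizes grow on save.
    const std::vector<SegmentSnapshot> loaded = {
        {1, 0x0, 0x400000, 0x500, 0x500, 0x1000},
        {7, 0x1000, 0x401000, 0x20, 0x60, 8},
    };
    std::vector<SegmentSnapshot> planned = loaded;
    assert(safe_to_save(loaded, planned));

    planned[1].filesz = 0x80;
    planned[1].memsz  = 0x80;
    assert(!safe_to_save(loaded, planned));
    assert(first_changed_segment(loaded, planned).value() == 1);
}
```

Reporting the index of the first changed segment would at least point the user at the affected program header instead of leaving them with a crash at run time.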