hackclub / putting-the-you-in-cpu

A technical explainer by @kognise of how your computer runs programs, from start to finish.
https://cpu.land
MIT License

Possibility of discussion of ld.so/dyld/etc. behavior #41

Open ZPedro opened 10 months ago

ZPedro commented 10 months ago

I believe the end of chapter 4 deserves a quick discussion of how control gets from the entry point of the dynamic linker to the entry point of the executable. Including these fun tidbits (for some values of fun, anyway):

ZPedro commented 9 months ago

As an illustration of why the control path from the dynamic linker to the executable matters: the first versions of both "parental controls" and code signing in Mac OS X (as it was then known) were in fact off to a good start, as both were implemented as part of execve(). There is no way to load an executable other than through execve, right?

But as Thomas Ptacek pointed out at the time, you could still coerce the dynamic linker into loading any dynamic library you liked through DYLD_INSERT_LIBRARIES (Markdown garbled the environment variable's name in the original post). As discussed above, that load does not go through execve: the kernel is involved only to the extent that these files are memory-mapped with the executable bit set, and there was no gate check for that.
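To make the "memory-mapped with the executable bit set" part concrete, here is a minimal sketch in C of what the kernel sees when a library's code is brought in from user space. The function name map_code_like_a_loader is made up for this illustration, and a real dynamic linker maps each segment separately according to the file's headers rather than the whole file at once, so treat this as a simplification. The point is only that the calls involved are open() and mmap() with PROT_EXEC, and execve() never appears.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: map the whole file at `path` readable and executable,
 * roughly the way a loader maps a library's code.
 * Returns the mapping and stores its length in *len_out, or returns
 * MAP_FAILED on any error. */
void *map_code_like_a_loader(const char *path, size_t *len_out)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return MAP_FAILED;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size <= 0) {
        close(fd);
        return MAP_FAILED;
    }

    /* This is the only point where the kernel learns that the bytes are
     * meant to be executed. */
    void *base = mmap(NULL, (size_t)st.st_size, PROT_READ | PROT_EXEC,
                      MAP_PRIVATE, fd, 0);

    /* The mapping stays valid after the descriptor is closed. */
    close(fd);

    if (base != MAP_FAILED)
        *len_out = (size_t)st.st_size;
    return base;
}
```

A check that lives only in execve() never sees any of this, so a gate on library loading would have to sit on the mmap() path instead.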

Well, what possible harm could one insane, mutant dynamic library do? It's just a library, it's not going to be in control until code in the executable (or code in another library invoked by the executable) calls it, right? Except that, for initialization purposes, all dynamic linkers support code in a specifically designated section of the dynamic library image (.init, in the case of ELF), which the dynamic linker jumps to as part of setting up the dynamic library (this is what Mr. Ptacek refers to when he mentions gcc constructor functions); in other words, before main() gets called. You're not supposed to do anything scary in there, but no mechanism prevents that code from never returning and ending up in control of the newly reset process, in effect diverting control away from main().
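For the curious, a constructor function of the kind mentioned above can be sketched in a few lines of C. This is an illustration under assumptions, not anything taken from Mr. Ptacek's post: the file name inject_demo.c, the identifiers, and the INJECT_DEMO_TAKEOVER environment variable are all made up here. The only real mechanism in it is the GCC/Clang constructor attribute, which on current ELF toolchains typically records the function in .init_array rather than in .init itself.

```c
/* inject_demo.c: hypothetical file name, for illustration only. */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Set by the constructor so that other code can observe it already ran. */
int inject_demo_ctor_ran = 0;

/* The constructor attribute asks the toolchain to register this function
 * as an initializer of the image, so it is called while the image is being
 * set up, before main(). */
__attribute__((constructor))
static void inject_demo_init(void)
{
    inject_demo_ctor_ran = 1;
    fprintf(stderr, "inject_demo: running before main()\n");

    /* INJECT_DEMO_TAKEOVER is a made-up variable name. When it is set, the
     * constructor never returns, so the executable's main() never runs. */
    if (getenv("INJECT_DEMO_TAKEOVER") != NULL)
        _exit(0);
}
```

On Linux, which is a stand-in here for the Mac OS X case, something like `cc -shared -fPIC inject_demo.c -o libinject_demo.so` followed by `LD_PRELOAD=./libinject_demo.so /bin/true` should print the message before /bin/true does anything. With INJECT_DEMO_TAKEOVER set as well, the process exits from inside the constructor and the program proper is never reached.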