Emitting hotpatch-friendly executable

rui314 commented 1 year ago

This is yet another crazy linker feature idea. We may be able to do something in the linker to support hot code patching, i.e., updating the code of a running process.

Hotpatching is useful for programs that take a long time to start up. We don't want to restart the entire process on every rebuild.

Hotpatching is generally a hard problem and sometimes impossible to solve. For example, if a global data structure has changed in a binary-incompatible manner, we generally don't know how to translate the old data structure into the new one, which makes hotpatching impossible in that case.

But we don't need a completely generic solution. It would still be very convenient to be able to change an existing function slightly, or just add a printf statement in the middle of one. If a full application restart is needed, we can simply restart it, so we don't need a perfect solution. (Updating a production service without a restart is out of scope for this discussion.)

So, how can we replace an existing function? If all function calls are made through the PLT, replacing a function is easy because we only need to update its PLT entry (or, more precisely, the .got or .got.plt entry that the PLT entry jumps through). So, for hotpatch-friendly output, we may want to create PLT entries even for local function calls.
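The effect of routing calls through a PLT/GOT slot can be simulated in plain C with a function pointer. This is a minimal sketch, assuming one writable slot per patchable function; `compute_slot`, `call_compute`, `patch_compute`, and the two `compute_v*` versions are hypothetical names standing in for what the linker would emit. Because every caller goes through the slot, a single pointer store redirects all of them.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*compute_fn)(int);

/* Hypothetical "old build" and "new build" of the same function. */
int compute_v1(int x) { return x + 1; }
int compute_v2(int x) { return x * 2; }

/* Stand-in for a .got.plt entry: a writable slot that initially points
   at the original definition. A real linker would emit this slot. */
static compute_fn compute_slot = compute_v1;

/* Every call site goes through the slot, the way a PLT stub does
   `jmp *slot` instead of calling the function directly. */
int call_compute(int x) { return compute_slot(x); }

/* Patching is one pointer store; callers see the new code on their
   next call through the slot. */
void patch_compute(compute_fn new_impl) {
    assert(new_impl != NULL);
    compute_slot = new_impl;
}
```

With this setup, `call_compute(5)` returns 6 before `patch_compute(compute_v2)` and 10 after it. In a multithreaded process the slot update would need to be an atomic store (for example with C11 atomics) so that other threads never observe a torn pointer.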

How do we load new functions into the running process? Maybe we can just use dlopen(). With hotpatching enabled, the linker would create an .so instead of an executable, and that .so would contain only the differences from the previous build. Then, somewhere in the application's main loop, the application checks whether a new .so is available, and if it is, it dlopen()s it and calls its onload function. The onload function updates the main application's .got.plt/.got entries so that they refer to the new functions instead of the old ones.
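The loading step could look roughly like the sketch below. It assumes each patch .so exports a function named `hotpatch_onload` and that the application passes it a table of patchable slots; `hotpatch_onload`, `struct hotpatch_table`, `g_table`, and `try_load_patch` are all hypothetical names, and the table is a stand-in for the real .got.plt entries the linker would expose. Only `dlopen`, `dlsym`, and `dlclose` are real POSIX APIs.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stdbool.h>
#include <stdio.h>

typedef int (*compute_fn)(int);

/* Hypothetical table of patchable slots (stand-in for .got.plt). */
struct hotpatch_table {
    compute_fn compute;
};

int compute_v1(int x) { return x + 1; }

struct hotpatch_table g_table = { compute_v1 };

/* Hypothetical entry point that every patch .so is assumed to export. */
typedef void (*hotpatch_onload_fn)(struct hotpatch_table *);

/* Called from the application's main loop. Returns true if a patch was
   loaded and applied, false if there was nothing usable to load. */
bool try_load_patch(const char *path) {
    assert(path != NULL);

    void *handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);
    if (handle == NULL)
        return false;  /* no new .so yet, or it failed to load */

    hotpatch_onload_fn onload =
        (hotpatch_onload_fn)dlsym(handle, "hotpatch_onload");
    if (onload == NULL) {
        fprintf(stderr, "hotpatch: %s does not export hotpatch_onload\n", path);
        dlclose(handle);
        return false;
    }

    /* The patch rewrites slots so they point at its new functions. */
    onload(&g_table);

    /* Deliberately never dlclose(): the slots now point into this .so. */
    return true;
}
```

A patch built this way would define something like `void hotpatch_onload(struct hotpatch_table *t) { t->compute = compute_v2; }`. One caveat for the real mechanism: if the executable is linked with full RELRO (`-z relro -z now`), its GOT is made read-only after startup, so the onload function would have to mprotect() it first, or a hotpatch-friendly link would have to avoid full RELRO.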

I believe this mechanism isn't too hard to implement.

marxin commented 1 year ago

How do we load new functions into the running process?

You're touching on a topic that https://github.com/SUSE/libpulp can handle.

@giulianobelinassi might be interested in this issue.

giulianobelinassi commented 1 year ago

The way we do it in libpulp is:

1- Compile the original code with GCC's -fpatchable-function-entry=16,14, which reserves a patchable area of 16 NOPs around each function's entry point (14 before it and 2 after it).
2- At patch time, load a .so file containing the new functions, then modify the prologue of the old function so that it jumps to the new function with a jmp instruction.

Hence, we don't actually modify code in the middle of a function. We only modify code at the beginning of a function.
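As a rough illustration of step 2 on x86-64, the sketch below builds 16 bytes that could fill the area reserved by `-fpatchable-function-entry=16,14`: an absolute 64-bit jump placed in the 14 bytes before the entry point, plus a 2-byte short jump at the entry point that branches back to it. The byte layout, the use of r11, and the name `build_entry_patch` are assumptions for illustration; libpulp's actual trampoline may differ. Presumably the point of such a split is that the 14 bytes before the entry are never executed, so they can be written first, and only the final 2-byte store touches live code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed padding for -fpatchable-function-entry=16,14 on x86-64. */
enum { PAD_BEFORE = 14, PAD_AFTER = 2 };

/* Fill out[] (covering entry-14 .. entry+1) with:
     entry-14: movabs r11, new_fn  ; 49 BB imm64  (10 bytes)
     entry-4:  jmp r11             ; 41 FF E3     (3 bytes)
     entry-1:  nop                 ; 90           (1 byte)
     entry+0:  jmp entry-14        ; EB F0        (2 bytes)
   r11 is a scratch register in the SysV x86-64 ABI and carries no
   arguments, so clobbering it at function entry is safe. */
void build_entry_patch(uint8_t out[PAD_BEFORE + PAD_AFTER], uint64_t new_fn) {
    size_t i = 0;
    out[i++] = 0x49;                         /* REX.W + REX.B */
    out[i++] = 0xBB;                         /* mov r11, imm64 */
    for (int b = 0; b < 8; b++)              /* imm64, little-endian */
        out[i++] = (uint8_t)(new_fn >> (8 * b));
    out[i++] = 0x41;                         /* REX.B */
    out[i++] = 0xFF;                         /* jmp r/m64 (opcode extension /4) */
    out[i++] = 0xE3;                         /* ModRM: mod=11, reg=4, rm=r11 */
    out[i++] = 0x90;                         /* leftover NOP */
    assert(i == PAD_BEFORE);

    /* Short jump at the entry point. Next IP is entry+2, target is
       entry-14, so disp8 = (entry - 14) - (entry + 2) = -16 (0xF0). */
    out[i++] = 0xEB;
    out[i++] = (uint8_t)(int8_t)-(PAD_BEFORE + PAD_AFTER);
    assert(i == PAD_BEFORE + PAD_AFTER);
}
```

Actually installing these bytes would also require making the text page writable (for example with mprotect()) and taking care that no thread is executing the 2 bytes at the entry point while they are replaced.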

I remember experimenting with .plt and .got.plt only to conclude that, once the function had been called, the PLT was skipped and the function was called directly. But please verify this claim, because my memory may be playing tricks on me here.

rui314 commented 1 year ago

By default, all function calls that are guaranteed to stay within the same ELF module skip the PLT and jump directly to their destination functions. But since we control the linker, we can make such calls go through the PLT, so that rewriting a PLT entry's destination is sufficient to redirect a function.
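To make the linker side concrete for x86-64: a direct `call` carries an R_X86_64_PLT32 relocation, whose value the psABI defines as L + A - P (L = PLT entry address, A = addend, normally -4, P = address of the 4-byte field). For a non-preemptible local function, linkers normally substitute the function's own address for L and create no PLT entry. A hypothetical hotpatch mode could keep resolving to L and make each PLT entry an indirect `jmp *slot(%rip)` through a writable GOT slot. The sketch below computes both values; `resolve_plt32` and `encode_plt_jmp` are illustrative names, and real PLT entries in mold or other linkers (lazy-binding stubs, endbr64 for CET) look different.

```c
#include <assert.h>
#include <stdint.h>

/* Value stored in an R_X86_64_PLT32 field: target + addend - place.
   In a hypothetical hotpatch mode, target is the PLT entry (L) even
   for local functions, instead of the function address (S). */
int32_t resolve_plt32(uint64_t target, int64_t addend, uint64_t place) {
    int64_t v = (int64_t)(target + (uint64_t)addend - place);
    assert(v >= INT32_MIN && v <= INT32_MAX);  /* must fit in rel32 */
    return (int32_t)v;
}

/* Encode a minimal PLT entry: jmp *disp32(%rip), i.e. FF 25 disp32.
   The displacement is relative to the next instruction (plt_addr + 6). */
void encode_plt_jmp(uint8_t out[6], uint64_t plt_addr, uint64_t got_slot) {
    int64_t disp = (int64_t)(got_slot - (plt_addr + 6));
    assert(disp >= INT32_MIN && disp <= INT32_MAX);
    uint32_t d = (uint32_t)(int32_t)disp;
    out[0] = 0xFF;
    out[1] = 0x25;
    for (int b = 0; b < 4; b++)  /* disp32, little-endian */
        out[2 + b] = (uint8_t)(d >> (8 * b));
}
```

For example, a `call` whose E8 opcode sits at 0x401000 (so its rel32 field is at 0x401001) and whose PLT entry is at 0x402000 resolves to 0x402000 - 4 - 0x401001 = 0xFFB. Redirecting the function then only means rewriting the 8-byte GOT slot; neither the call site nor the PLT entry changes.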

rui314 / mold

Emitting hotpatch-friendly executable #960