ziglang / zig

General-purpose programming language and toolchain for maintaining robust, optimal, and reusable software.
https://ziglang.org
MIT License
33.7k stars 2.47k forks source link

Share a canonical panic handler across compilation units #20240

Open mlugg opened 3 months ago

mlugg commented 3 months ago

Whilst looking into porting the UBSan runtime to Zig, I recalled a conversation with Andrew about the fact that @panic ought to only result in a single panic handler across all compilation units in a binary. Ideally, it should be possible to both reference the handler from C code, and define it in C code.

The self-hosted compiler does not yet implement this behavior: uses of @panic just emit calls to std.builtin.panic. This issue details a proposal for how to implement this behavior.


The panic handler is provided by a function with the following signature (note that this depends on #17969):

/// The panic cause is unpacked into `cause`, `data0`, and `data1`; depending on `cause`, the data parameters may be ignored.
/// If there is no error return trace, then `error_return_trace.index == std.math.maxInt(usize)`.
/// If there is no return address, then `ret_addr == 0`.
extern fn __zig_panic(cause: @typeInfo(PanicCause).Union.tag_type.?, data0: usize, data1: usize, error_return_trace: CompletedStackTrace, ret_addr: u64);

const CompletedStackTrace = extern struct {
    index: usize,
    instruction_addresses: [*]usize,
};

Calls to @panic are translated into calls to this function, converting the PanicCause, ?StackTrace and ?u64 to the parameter types according to the doc comments.

The more difficult question to answer is which compilation unit provides this handler, and how. I propose the following rules.

This system allows the user to set a preference using -f[no-]panic-handler if desired, but otherwise provides reasonable defaults. Let's analyze the possible failure cases:

A binary having no panic handler is not possible under the default settings (it can only happen if -fno-panic-handler is passed to every compilation). In this case, a link error occurs if any CU references the panic handler, but it is due to explicit user overrides preventing successful compilation, which seems like a non-issue.

A binary having multiple panic handlers, all with weak linkage, can occur by default if the final executable is not built with Zig - for instance, if a library (static or shared) is built with Zig and linked into a C program which is compiled with clang or gcc. In this case, the linker will functionally choose a random implementation (I think it's technically the first one?). This isn't ideal, but isn't easily resolvable, because no compilation unit is more authoritative than any other in this context. In practice, it will be rare for libraries to override the panic handler, so all will probably have the default, meaning the "randomness" here is typically unobservable. If any CU provides a custom panic handler, then the user can set -fpanic-handler on that one in their build script to give it strong linkage and hence make it override the others.

A binary having multiple panic handlers, at least two of which have strong linkage, can never occur by default. It only happens when -fpanic-handler is passed to at least one compilation. The main case I see being confusing here is passing -fpanic-handler to a library, and linking it to a full executable, all with Zig: both will provide panic handlers with strong linkage, at which point we again defer to functionally random selection. This case is unfortunate: the user presumably expected the override to cause the panic handler to deterministically come from the library. For this reason, we augment the executable rule a little:

This extra case removes this unintuitive behavior, whilst still making sure to provide a panic handler in all cases. It makes sure that doing everything with the Zig compiler is a happy case, where everything "just works".


Let us now look at the impact of this design in practice. I'll list a few scenarios, and explain how the logic would play out (assuming no -f[no-]panic-handler overrides).

Scenario 1: building an executable, from Zig code, which links to a static library, also written in Zig. In this case, the library exports a panic handler with weak linkage, and the executable overrides it with strong linkage. Thus, as is intuitive, the Zig code providing the entrypoint also provides the panic handler.

Scenario 2: building a shared library written in Zig, which is used by C code. Here, the shared library exports a weak symbol, and the C binary which uses it naturally exports nothing, so the weak symbol from the shared library is used. The C code could provide a strong __zig_panic symbol if it wants; I'm unsure if this would be used in this case (how does the dynamic linker handle weak symbols?).

Scenario 3: building a static library written in Zig, which itself links another static library, also written in Zig. In this case, both handlers have weak linkage, so we get the "random" behavior discussed above. This, of course, assumes that the consumer of the library is compiled with a C compiler; if the final executable is built with Zig, then the executable overrides both panic handlers.

Scenario 4: building an executable written in Zig, with a C compilation unit which provides __zig_panic. In this case, the "unless another link object already provides it with strong linkage" rule kicks in, and the C definition overrides the Zig one. I think this is the intuitive behavior in this case.


The main thing I'm not certain of is how all of this interacts with shared libraries, since I don't know how they interact with weak linkage in general (are the links resolved at link time or load time?). @kubkon, could you shed some light on this?

I'm also unsure if weak linkage is an issue on any target. From what I can see, Windows is a little shaky on it, but does seem to support it in at least some sense -- this will be another @kubkon question I suppose!

kubkon commented 3 months ago

Whilst looking into porting the UBSan runtime to Zig, I recalled a conversation with Andrew about the fact that @panic ought to only result in a single panic handler across all compilation units in a binary. Ideally, it should be possible to both reference the handler from C code, and define it in C code.

The self-hosted compiler does not yet implement this behavior: uses of @panic just emit calls to std.builtin.panic. This issue details a proposal for how to implement this behavior.

The panic handler is provided by a function with the following signature (note that this depends on #17969):

/// The panic cause is unpacked into `cause`, `data0`, and `data1`; depending on `cause`, the data parameters may be ignored.
/// If there is no error return trace, then `error_return_trace.index == std.math.maxInt(usize)`.
/// If there is no return address, then `ret_addr == 0`.
extern fn __zig_panic(cause: @typeInfo(PanicCause).Union.tag_type.?, data0: usize, data1: usize, error_return_trace: CompletedStackTrace, ret_addr: u64);

const CompletedStackTrace = extern struct {
    index: usize,
    instruction_addresses: [*]usize,
};

Calls to @panic are translated into calls to this function, converting the PanicCause, ?StackTrace and ?u64 to the parameter types according to the doc comments.

The more difficult question to answer is which compilation unit provides this handler, and how. I propose the following rules.

* If `-fpanic-handler` is passed to a compilation, the panic handler is exported from the ZCU if Zig source files are provided. Otherwise, a single compilation unit containing only the panic handler is implicitly added to the compilation. (The resulting object can be cached globally.)

* If `-fno-panic-handler` is passed to a compilation, no panic handler is exported.

* Otherwise, we export the panic handler according to the logic from the `-fpanic-handler` case, but:

  * If we are building an executable, the panic handler is exported with strong linkage.
  * If we are building an object or library (shared or static), the panic handler is exported with weak linkage.

This system allows the user to set a preference using -f[no-]panic-handler if desired, but otherwise provides reasonable defaults. Let's analyze the possible failure cases:

* a binary has no panic handler

* a binary has multiple panic handlers, all with weak linkage

* a binary has multiple panic handlers, at least two with strong linkage

A binary having no panic handler is not possible under the default settings (it can only happen if -fno-panic-handler is passed to every compilation). In this case, a link error occurs if any CU references the panic handler, but it is due to explicit user overrides preventing successful compilation, which seems like a non-issue.

A binary having multiple panic handlers, all with weak linkage, can occur by default if the final executable is not built with Zig - for instance, if a library (static or shared) is built with Zig and linked into a C program which is compiled with clang or gcc. In this case, the linker will functionally choose a random implementation (I think it's technically the first one?). This isn't ideal, but isn't easily resolvable, because no compilation unit is more authoritative than any other in this context. In practice, it will be rare for libraries to override the panic handler, so all will probably have the default, meaning the "randomness" here is typically unobservable. If any CU provides a custom panic handler, then the user can set -fpanic-handler on that one in their build script to give it strong linkage and hence make it override the others.

A binary having multiple panic handlers, at least two of which have strong linkage, can never occur by default. It only happens when -fpanic-handler is passed to at least one compilation. The main case I see being confusing here is passing -fpanic-handler to a library, and linking it to a full executable, all with Zig: both will provide panic handlers with strong linkage, at which point we again defer to functionally random selection. This case is unfortunate: the user presumably expected the override to cause the panic handler to deterministically come from the library. For this reason, we augment the executable rule a little:

Unfortunately, this is not how it works. If the linker pulls in two input relocatable object files each exporting a symbol with strong linkage, it will generate a "duplicate symbol definition" error by default (please note this can be turned off for ELF, but this is always a hard-error for MachO, and in general, if happens, means there is a developer error that has to be corrected and not worked around). If however, there exists a relocatable object file that exports a colliding symbol with strong linkage but it is part of a static library, then it will only be an issue if it is pulled by the linker when resolving missing symbol references from other input relocatable object files that were specified explicitly on the linker link.

* If we are building an executable, the panic handler is exported with strong linkage, **unless another link object already provides it with strong linkage.**

This extra case removes this unintuitive behavior, whilst still making sure to provide a panic handler in all cases. It makes sure that doing everything with the Zig compiler is a happy case, where everything "just works".

Let us now look at the impact of this design in practice. I'll list a few scenarios, and explain how the logic would play out (assuming no -f[no-]panic-handler overrides).

Scenario 1: building an executable, from Zig code, which links to a static library, also written in Zig. In this case, the library exports a panic handler with weak linkage, and the executable overrides it with strong linkage. Thus, as is intuitive, the Zig code providing the entrypoint also provides the panic handler.

Why not simply always export the panic handler with weak linkage? Why bother with strong linkage at all? If a user wants to override this symbol they can set the export to strong themselves in their source code can they not? Having the panic handler always exported as weak would avoid a lot of issues I believe and would simply the design by reducing the number of rules required.

Scenario 2: building a shared library written in Zig, which is used by C code. Here, the shared library exports a weak symbol, and the C binary which uses it naturally exports nothing, so the weak symbol from the shared library is used. The C code could provide a strong __zig_panic symbol if it wants; I'm unsure if this would be used in this case (how does the dynamic linker handle weak symbols?).

Exported (weak) symbol from a shared library only impacts the image if this one is importing a symbol at runtime. Since in this proposal you said it is a hard-error to never have a panic handler available at link time, this is a non-issue - we never defer to runtime binding of a panic handler.

Scenario 3: building a static library written in Zig, which itself links another static library, also written in Zig. In this case, both handlers have weak linkage, so we get the "random" behavior discussed above. This, of course, assumes that the consumer of the library is compiled with a C compiler; if the final executable is built with Zig, then the executable overrides both panic handlers.

Combining two static libraries together does not perform any symbol resolution at all. If however you want to combine multiple input relocatable object files into a single relocatable object file (the infamous -r option), then symbol resolution will take place. In build.zig this would usually look as follows which to the best of my knowledge is not something folks use, well, except for me when testing the linkers:

const out = b.addObject(...);
out.addObject(a_o);
out.addObject(b_o);

Scenario 4: building an executable written in Zig, with a C compilation unit which provides __zig_panic. In this case, the "unless another link object already provides it with strong linkage" rule kicks in, and the C definition overrides the Zig one. I think this is the intuitive behavior in this case.

Again, if we always export as weak by default would make this situation crystal clear - unless the user forced exported their panic handler as strong, we pick any implementation unless __zig_panic is exported as strong in the C translation unit.

The main thing I'm not certain of is how all of this interacts with shared libraries, since I don't know how they interact with weak linkage in general (are the links resolved at link time or load time?). @kubkon, could you shed some light on this?

I'm also unsure if weak linkage is an issue on any target. From what I can see, Windows is a little shaky on it, but does seem to support it in at least some sense -- this will be another @kubkon question I suppose!

Windows doesn't have the concept of weak linkage. It has the concept of /alternatename:from=to however which can be used as poor man's alternative as far as I know 1. What happens then is that every translation unit we generate makes the panic handler symbol undefined but provides a fallback symbol in case the former is not found. This then allows the user to force the linker to never look for the alternative by simply exporting the panic handler symbol.

mlugg commented 3 months ago

Unfortunately, this is not how it works. If the linker pulls in two input relocatable object files each exporting a symbol with strong linkage, it will generate a "duplicate symbol definition" error by default (please note this can be turned off for ELF, but this is always a hard-error for MachO, and in general, if happens, means there is a developer error that has to be corrected and not worked around). If however, there exists a relocatable object file that exports a colliding symbol with strong linkage but it is part of a static library, then it will only be an issue if it is pulled by the linker when resolving missing symbol references from other input relocatable object files that were specified explicitly on the linker link.

Gah, of course -- I forgot how this worked, naively Googled it, and got the wrong answer, apologies.

Why not simply always export the panic handler with weak linkage? Why bother with strong linkage at all?

The goal was to make things "just work" in the most intuitive way possible when no overrides are given. In particular, when a Zig executable links to a library written in Zig, we probably want to use the executable's handler. However, the limitations you've raised make me agree that it makes sense to use weak linkage globally by default.

Perhaps when building Zig code, we can tailor the default linking behavior to prefer the weak symbol provided by this ZCU if it exists. Could we instruct the compiler to strip out any weak __zig_panic implementations provided by other link objects? I'm sure the linker could do this when linking an executable or shared library, but I'm not so sure about static libraries, although I don't see why we couldn't patch the object files a little. Regardless, the key question would be whether we included this detail in the language spec; if not, users relying on this behavior would be relying on implementation-defined behavior, which isn't ideal. Although, weak linkage would inevitably lead to that anyway, so I think specifying this behavior in the language spec would be reasonable, again with the goal of making things "just work" as much as possible when using the Zig compiler for the whole process.

If a user wants to override this symbol they can set the export to strong themselves in their source code can they not?

I should note that I wasn't expecting the standard library to itself export __zig_panic, but rather for the compiler to export this based on something in std.builtin. There isn't a particularly strong reason for this, it just seems a better fit to have the compiler handle it to an extent given that it's a part of the language spec. However, the point still stands -- they would just pass -fpanic-handler (or, realistically, set exe.panic_handler = true in their build script).

Exported (weak) symbol from a shared library only impacts the image if this one is importing a symbol at runtime. Since in this proposal you said it is a hard-error to never have a panic handler available at link time, this is a non-issue - we never defer to runtime binding of a panic handler.

Makes sense!

Combining two static libraries together does not perform any symbol resolution at all. If however you want to combine multiple input relocatable object files into a single relocatable object file (the infamous -r option), then symbol resolution will take place. In build.zig this would usually look as follows which to the best of my knowledge is not something folks use, well, except for me when testing the linkers:

const out = b.addObject(...);
out.addObject(a_o);
out.addObject(b_o);

Got it!

Again, if we always export as weak by default would make this situation crystal clear - unless the user forced exported their panic handler as strong, we pick any implementation unless __zig_panic is exported as strong in the C translation unit.

The problem with "pick any implementation" is that projects would come to unintentionally rely on the behavior of the canonical compiler implementation in practice, hence my suggestion above. If any form of overriding behavior is to exist, which is of course inevitable with weak linkage, I think it makes sense to at least specify the common case so people can trust that what works on one Zig compiler will probably work on all Zig compilers.

Windows doesn't have the concept of weak linkage. It has the concept of /alternatename:from=to however which can be used as poor man's alternative as far as I know 1. What happens then is that every translation unit we generate makes the panic handler symbol undefined but provides a fallback symbol in case the former is not found. This then allows the user to force the linker to never look for the alternative by simply exporting the panic handler symbol.

Okay, this makes sense. I haven't used MSVC #pragma comment before -- does it inject something into the COFF object giving this flag to the linker? Will the self-hosted linker support it okay?