Proposal: Zig ABI for language specific features

suirad commented 4 years ago

While experimenting with implementing an error ABI, I came to the conclusion that it should be doable to add an ABI to Zig to ease integration between Zig-only projects. It would also let libraries/objects written in other languages interact directly with Zig-specific features. While it could be a library, it would be most beneficial as an official ABI.

My proposal is to define an ABI that Zig and other languages could use to integrate better across module boundaries. It could certainly be supported by usermode code in std.

Below are language features that would be nice to expose through the ABI:

errors (/sets/unions)
tagged enums (if the tag itself changes per compilation)
slices

tecanec commented 4 years ago

I agree heavily with this, and would even consider it a necessity. A fully Zig-compatible ABI is something we should have. As an example use case (if that's even needed), this allows the creation of dynamic libraries to split up an existing application and allow lighter patches without having to refactor the whole program to allow this.

A possible route for this would be to build on top of the C ABI by defining special cases without ruining compatibility. For example, it could define slices and tagged unions as structs or as multiple values. This has some advantages, such as making things simpler by not having multiple ABIs, maintaining C-compatibility even when using Zig-features (albeit not entirely seamless) and likely being easier to define and implement. One downside is that it won't allow the same level of optimization, but performance likely isn't prioritized if you're using dynamic libraries, anyway.
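As a rough illustration of the "slices and tagged unions as structs" idea above, here is a minimal C sketch of what such definitions might look like from the other side of the boundary. The type names (`zig_slice_u8`, `value`), the tag numbering, and the helper functions are all hypothetical; nothing here is an agreed layout.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C view of a Zig `[]const u8`: a pointer plus a length. */
typedef struct {
    const uint8_t *ptr;
    size_t len;
} zig_slice_u8;

/* Hypothetical C view of a tagged union; tag values are fixed explicitly. */
typedef enum { VALUE_INT = 0, VALUE_PAIR = 1 } value_tag;

typedef struct {
    value_tag tag;
    union {
        int32_t single;
        struct { int32_t a, b; } pair;
    } payload;
} value;

/* Sum the bytes of a slice; stands in for any function that takes a slice. */
uint32_t slice_sum(zig_slice_u8 s) {
    uint32_t total = 0;
    for (size_t i = 0; i < s.len; i++) {
        total += s.ptr[i];
    }
    return total;
}

/* Dispatch on the tag, much as a Zig `switch` over the union would. */
int32_t value_total(value v) {
    switch (v.tag) {
    case VALUE_INT:
        return v.payload.single;
    case VALUE_PAIR:
        return v.payload.pair.a + v.payload.pair.b;
    }
    return 0; /* unknown tag */
}
```

Because both types are ordinary C structs, any language with C interop could consume them without knowing anything else about Zig, which is the compatibility advantage described above.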

pixelherodev commented 4 years ago

I think that if we do define an ABI, it should be only for exports / externs. Otherwise, we lose a lot of the advantage of release modes.

ikskuh commented 4 years ago

I don't think it's possible to have an ABI for Zig without wasting a lot of potential optimizations. Imho we don't gain a lot by having an ABI.

So, before I start: first of all, everything that touches comptime is not ABI-possible. You take an anytype param? Not gonna have an ABI for that. You take a tuple as an argument? Not gonna have an ABI for that. Same for everything else that uses comptime parameters. The same is probably true for functions that are async.

"Pro":

A defined ABI would allow closed-source implementations of libraries with a small, prechosen set of supported platforms, destroying portability of those libs to other platforms (source-based libraries, even if you may not change the source, can be compiled for many more platforms than your binary vendor thought of; think arm-linux vs. riscv-linux).
A defined ABI would allow loading shared libraries with a subset of Zig features, stripping out most of the useful stuff

Con:

You don't have transparent async. A function will either be async or not. Very likely to be not async in libraries as the functions cannot access your event loop without having a shared runtime library.
You don't have comptime, making a lot of code useless when trying to have an ABI
Error types will most likely not be possible without severe performance hits. Error values must be negotiated between dynamic libraries, otherwise error.Foo and error.Bar may collide on the same error value. So loading a shared library means you have to not only relocate addresses but also patch error values, which would require every shared library loader on every platform to be aware of Zig. Not gonna happen. @suirad your strategy of error hashing is likely to collide and there's no way to resolve this statically. It has to be a dynamic process.
Having an ABI enforces stuff like struct/union layouts. This prevents Zig from performing a whole class of optimizations that are currently possible because struct layout is undefined. Simple example: compiling for release-fast could align struct members for the fastest memory access, release-safe could reorder the fields to be somewhat compact and fast, and release-small could treat every struct as a packed struct, but with reordering, making the memory footprint of the program way smaller than ever possible in C/C++
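To make the struct layout point concrete, here is a small C sketch comparing a struct whose fields stay in declaration order (as a fixed ABI requires) with the same fields sorted largest first (as a compiler that is free to reorder might choose). The struct and function names are made up for illustration, and exact sizes depend on the target.

```c
#include <stddef.h>
#include <stdint.h>

/* Fields kept in declaration order, as a fixed ABI would require. */
struct declared_order {
    uint8_t  flag;
    uint64_t id;
    uint8_t  kind;
};

/* The same fields sorted largest first, as a reordering compiler might do. */
struct reordered {
    uint64_t id;
    uint8_t  flag;
    uint8_t  kind;
};

/* Report the sizes so the two layouts can be compared on the current target. */
size_t declared_size(void) { return sizeof(struct declared_order); }
size_t reordered_size(void) { return sizeof(struct reordered); }
```

On a typical 64-bit target the declaration-order struct needs padding both after `flag` and after `kind`, so it ends up larger than the reordered one. Once the layout is part of an ABI, the compiler can no longer pick the smaller form on its own.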

I am strictly against defining an ABI for Zig programs. The language is meant to be compiled from source (which has a lot of important properties like maintainability, improved whole-program optimization, ...) and not having precompiled libraries, but let's consider we actually built an abi. What would we have to do?

First: An ABI is platform/CPU specific. Having a cross-CPU ABI makes no sense. So we would have to define an additional ABI in a document for Zig+CPU. Shared libraries using that ABI will only be compatible with other Zig programs and probably have no chance to ever interact with languages other than Zig (otherwise we could just use the C(ommon) ABI for that platform, which is already possible).

Apart from that, what has to be defined?

struct layouts
layout for tagged enum
memory layout for slice types
alignment and size for all non-pow2-integers
layout for optionals
- All non-nonnull pointers are special cases here!
Sizes for untagged enum types
error value negotiation
error union type (could just be a tagged union)
result location semantics
async semantics and frame layouts

So as you can see, I'm strongly against having an ABI for Zig. It would hurt the project in performance, and even more in maintainability. Keep your interactions with the outer world to the C(ommon) ABI, so non-Zig projects will also profit from your libraries and pure-Zig libraries can still be super-optimized compared to other native languages.

vi commented 4 years ago

Can Zig ever be a "main system programming language for a shared libraries-based platform" without such ABI?

Swift has defined its ABI. Why not Zig?

Obviously, it is not possible to have a proper ABI without a compromise in maintainability and performance. But that performance hit is only expected to be around the ABI surface (which includes the types referenced by those functions), not around private functions.

Imagine a Zig operating system. How would it share code? How would it update WhateverSSL? How would audio effect plugins work?

The ABI doesn't need to be called "a Zig ABI". I think there can be multiple ABIs with different compromises. This could be the .C ABI plus some additional rules for additional features. It could also be a good idea to just support the Swift ABI directly (as far as I have read, it is a nice work of engineering art).

suirad commented 4 years ago

@MasterQ32 While I agree with pretty much all of your points, I don't believe this proposal is advocating for as drastic a change as the one you are addressing.

I may have been too vague in my description but my vision of the scope of this proposal is effectively the following attributes:

Only applicable during extern/export; this shouldn't change how anything else works internally to zig
Because it applies only during extern/export, all existing limitations of extern/export still hold
Its use in Zig would effectively be a calling convention that is the C calling convention + extensions
Being based on the C ABI, other languages that want closer interop with a Zig library could easily use it by just adding code for the extensions and treating it just like the C ABI in all other cases.
The extensions would facilitate interacting with Zig features (e.g. returning errors)

The result would be that more Zig features become available during export/extern; for example, the following code snippet would work:

export fn thing() callconv(.Zig) !void {
    //....
}

It was mentioned that this would conflict with how error unions work; however, this could be handled differently/separately (extern errorset?) from normal errors, and I am not suggesting any specific implementation.
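For readers wondering what "C calling convention + extensions" could mean for an error-returning function, here is one possible lowering sketched in C. This is purely an assumption for illustration: the 16-bit error code, the `result_i32` struct, the hand-assigned error values, and both example functions are hypothetical and not part of the proposal.

```c
#include <stdint.h>

/* Assumption: error codes are 16 bits wide and 0 means "no error". */
typedef uint16_t zig_error;
enum { ZIG_OK = 0, ERR_INVALID_CHARACTER = 1, ERR_NOT_POSITIVE = 2 };

/* Hypothetical lowering of `!i32`: error code plus payload, returned by value. */
typedef struct {
    zig_error err;
    int32_t value; /* only meaningful when err == ZIG_OK */
} result_i32;

/* Hypothetical lowering of a function declared as `fn parseDigit(c: u8) !i32`. */
result_i32 parse_digit(char c) {
    result_i32 r = { ZIG_OK, 0 };
    if (c < '0' || c > '9') {
        r.err = ERR_INVALID_CHARACTER;
        return r;
    }
    r.value = c - '0';
    return r;
}

/* Hypothetical lowering of `!void`: the error code is the whole return value. */
zig_error check_positive(int32_t x) {
    return x > 0 ? ZIG_OK : ERR_NOT_POSITIVE;
}
```

The shape of the return value is the easy part. The error values here are assigned by hand, and agreeing on those values across independently compiled libraries is the problem raised elsewhere in this thread.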

ikskuh commented 4 years ago

Okay, this sounds way better. So you actually want to define some type layouts for callconv(.C) functions rather than build a fully featured Zig ABI. I would still stick to extern types then for struct / union / enum, and define a memory layout for slices and tagged unions (with only external fields). I still don't think that errors will work though. Not for technical reasons, but for theoretical ones: ABIs require hard values, and errors are a set of values represented by integers, yet errors are equal by name, which makes it hard to define an ABI for them. It can't be done by assigning each error a unique number, as these numbers may differ between libraries!
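The "equal by name, but numbered differently per library" problem can be sketched as a load-time negotiation step: each library ships its error names, and a process-wide registry hands out one shared value per name. The C sketch below is only an illustration of that dynamic process; the registry, its fixed capacity, and the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_ERRORS 64

/* Process-wide registry: index + 1 is the global error value (0 = no error).
   Only the pointers are stored, so the names must outlive the registry. */
static const char *g_names[MAX_ERRORS];
static uint16_t g_count = 0;

/* Return the global value for `name`, registering it on first sight.
   Returns 0 if the registry is full. */
uint16_t intern_error(const char *name) {
    for (uint16_t i = 0; i < g_count; i++) {
        if (strcmp(g_names[i], name) == 0) {
            return (uint16_t)(i + 1);
        }
    }
    if (g_count == MAX_ERRORS) {
        return 0;
    }
    g_names[g_count] = name;
    g_count++;
    return g_count;
}

/* Build a table that translates a library's local error indices (0..n-1)
   into global error values. */
void negotiate(const char *const *local_names, size_t n,
               uint16_t *local_to_global) {
    for (size_t i = 0; i < n; i++) {
        local_to_global[i] = intern_error(local_names[i]);
    }
}
```

Two libraries that list the same names in a different order end up with the same global value for each shared name, but every error crossing the boundary now needs a table lookup, which is the kind of runtime cost the comments above are worried about.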

matu3ba commented 2 years ago

errors (/sets/unions)

This requires defining an enum per function. If it is an ABI, it must be possible to set it from the external program, or to reuse it in external programs through a given layout. This would look something like this:

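#include <stdint.h>

struct errorset {
  const uint8_t* fnname;
  uint32_t len_err_int;  // must have maximum feasible size + a forward-compatible way to increase it
  const uint32_t* err_int;
  const uint8_t* str_err;
  const uint32_t* offsets_str_err;
  const uint32_t* lens_str_err;
};
static const uint32_t err_int[] = {0, 1, 2};
static const uint32_t offsets_str_err[] = {0, 6, 12};  // each name is 6 bytes long
static const uint32_t lens_str_err[] = {6, 6, 6};
static const struct errorset fnname_err = {
  .fnname = (const uint8_t*)"fnname",
  .len_err_int = 3,
  .err_int = err_int,
  .str_err = (const uint8_t*)"ERROR1ERROR2ERROR3",  // or something nicer for user editing, but I chose this for compactness
  .offsets_str_err = offsets_str_err,
  .lens_str_err = lens_str_err,
};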

and it would have to be taken into account during error set resolution, with separate passes for 1. computing the error set of the fn and comparing it against the provided one and 2. reusing the given one(s) for further computations. To me this sounds like more complex code at the least, and probably a (significant) perf cost.
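To give a feel for what consuming such a table involves, here is a small C sketch of a name lookup over the packed layout described above. It assumes each name in the packed string is 6 bytes long and uses `const char*` instead of `uint8_t*` for the strings to avoid casts; `errorset_lookup` and the `example_*` data are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Same shape as the struct above, with plain chars for the strings. */
struct errorset {
    const char *fnname;
    uint32_t len_err_int;
    const uint32_t *err_int;
    const char *str_err;
    const uint32_t *offsets_str_err;
    const uint32_t *lens_str_err;
};

static const uint32_t example_ints[] = {0, 1, 2};
static const uint32_t example_offsets[] = {0, 6, 12};
static const uint32_t example_lens[] = {6, 6, 6};
static const struct errorset example_set = {
    "fnname", 3, example_ints, "ERROR1ERROR2ERROR3",
    example_offsets, example_lens,
};

/* Find the integer assigned to `name`. Returns 1 and writes *out on success,
   or 0 if the set does not contain the name. */
int errorset_lookup(const struct errorset *set, const char *name,
                    uint32_t *out) {
    size_t name_len = strlen(name);
    for (uint32_t i = 0; i < set->len_err_int; i++) {
        if (set->lens_str_err[i] == name_len &&
            memcmp(set->str_err + set->offsets_str_err[i], name, name_len) == 0) {
            *out = set->err_int[i];
            return 1;
        }
    }
    return 0;
}
```

A linear scan with string comparisons per error set is the sort of extra work that would have to happen somewhere, whether in the compiler, the loader, or at runtime.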

Even if it's only for Zig code (which imho would not be nice), the information must be stored and handled in a similar fashion.

slices

Slices are only ptr + length, but the length may change depending on the target symbol layout. As ABI stuff must be usable by the linker, this requires adaptive symbol sizes. If it's unrelated to linker stuff, the underlying fields+length must be encoded in the library somewhere and picked up by the Zig compiler.

tagged enums (if the tag itself changes per compilation)

Same story: the linker must be able to adaptively link stuff, or the underlying fields+length must be encoded in the lib and picked up by the compiler.

Am I misunderstanding linker and ABI behavior here, or what the proposal says?

iacore commented 2 years ago

Optional types like ?i32 need stable ABI too, also ?[*c]i32 (0 is different from none).

It would be nice if the ABI for passing structs by value were defined, unlike in C. A stable ABI would ensure that compilers for other languages don't need to know about Zig's details.

We might as well be compatible with C for the most part.
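The optional cases mentioned above can be sketched in C to show why `?[*c]i32` cannot reuse the null pointer as "none". The layouts below (payload plus a separate flag) and all names are assumptions for illustration, not a proposed ABI.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout for `?i32`: payload plus an explicit flag. */
typedef struct {
    int32_t value;
    bool has_value;
} opt_i32;

/* Hypothetical layout for `?[*c]i32`: a null pointer is a valid payload,
   so "none" needs its own flag. */
typedef struct {
    int32_t *ptr;
    bool has_value;
} opt_c_ptr_i32;

opt_i32 opt_i32_some(int32_t v) {
    opt_i32 o = { v, true };
    return o;
}

opt_i32 opt_i32_none(void) {
    opt_i32 o = { 0, false };
    return o;
}

/* Unwrap with a fallback, similar in spirit to Zig's `orelse`. */
int32_t opt_i32_or(opt_i32 o, int32_t fallback) {
    return o.has_value ? o.value : fallback;
}

/* Three distinguishable states: 0 = none, 1 = some(null), 2 = some(non-null). */
int classify(opt_c_ptr_i32 o) {
    if (!o.has_value) {
        return 0;
    }
    return o.ptr == NULL ? 1 : 2;
}
```

An optional over a pointer type that can never be null could instead use the null pointer itself as "none" and stay pointer-sized, so a stable ABI would need to spell out which optionals get which representation.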

glyh commented 1 year ago

Adding to this list, I wish other languages could access Zig's comptime feature. I don't know how hard that would be to implement, but I'm willing to help make it possible if there's ever interest in it. The main reason on my side is that I want to design a language that integrates tightly with Zig but abstracts away some details for writing application-level code.

I saw people claim this is not possible, but is it possible if the other language is specifically designed around Zig and bases its code on Zig? Is that more of an ABI thing or a compiler hook thing?

Khitiara commented 8 months ago

Here's my two cents on how to implement error sets, for what it's worth.

The exporting Zig library provides a "Zig header" file that declares externs for all functions, defines an explicit error set for each function, and defines an enum with explicit values for each such error set. For a given function, the generated ABI adds shadow *T and *StackTrace parameters and returns the enum associated with the function's error set. The calling Zig code uses the equivalent of an inline switch, along with result location semantics, to marshal the enum and pointer back into an error (as well as passing the stack trace, as per the implementation of error return tracing described in the langref), and the exporting code uses a wrapper that does the same marshalling in reverse. Apart from the stack trace part, this is doable, if painful, in current Zig:

// Exporting library.
pub const Fn1Errors = error{ out_of_memory, buffer_too_small, invalid_parameter };

// An explicit integer tag type makes the enum usable in an exported function.
pub const Fn1ErrorEnum = enum(c_int) { success, out_of_memory, buffer_too_small, invalid_parameter };

pub fn fn1(a: i32) Fn1Errors!i32 { ... }

// Translate each error into the matching enum value; the payload goes through `out`.
export fn fn1Wrapper(out: *i32, a: i32) Fn1ErrorEnum {
    out.* = fn1(a) catch |e| switch (e) {
        error.out_of_memory => return .out_of_memory,
        error.buffer_too_small => return .buffer_too_small,
        error.invalid_parameter => return .invalid_parameter,
    };
    return .success;
}

// import.zig: bindings used by the calling code.
pub const Fn1Errors = error{ out_of_memory, buffer_too_small, invalid_parameter };

pub const Fn1ErrorEnum = enum(c_int) { success, out_of_memory, buffer_too_small, invalid_parameter };

extern fn fn1Wrapper(out: *i32, a: i32) Fn1ErrorEnum;

// Translate the enum back into a Zig error on the calling side.
pub fn fn1(a: i32) Fn1Errors!i32 {
    var out: i32 = undefined;
    switch (fn1Wrapper(&out, a)) {
        .success => return out,
        .out_of_memory => return error.out_of_memory,
        .buffer_too_small => return error.buffer_too_small,
        .invalid_parameter => return error.invalid_parameter,
    }
}

// Calling program.
pub const foo = @import("import.zig");

pub fn main() !void {
    _ = try foo.fn1(0);
}

The primary issue preventing me from implementing the wrappers with comptime is that I'm not sure how to get an error by name. I think the actual strings can be elided if it's done in an extern switch and using whatever builtin would get the errors by name.

Ideally a true ABI implementation would also forward the stack trace for error return tracing, but I don't think that's doable in user code right now.
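Since the wrapper above is exported with a plain out-pointer and an enum return value, a C program could call it as well. Here is a minimal sketch of the C side, assuming the enum's tag type is a C int with values 0 to 3 in declaration order. The body of `fn1Wrapper` below is a stand-in so the sketch builds on its own (its behavior is invented), and `fn1_error_name` and `fn1_or_default` are hypothetical helpers.

```c
#include <stdint.h>

/* Mirrors Fn1ErrorEnum; assumes the Zig side uses a C int tag type. */
typedef enum {
    FN1_SUCCESS = 0,
    FN1_OUT_OF_MEMORY = 1,
    FN1_BUFFER_TOO_SMALL = 2,
    FN1_INVALID_PARAMETER = 3
} Fn1ErrorEnum;

/* Stand-in for the exported Zig wrapper so this sketch links on its own.
   The behavior (reject out-of-range input, otherwise double) is invented. */
Fn1ErrorEnum fn1Wrapper(int32_t *out, int32_t a) {
    if (a < 0) {
        return FN1_INVALID_PARAMETER;
    }
    if (a > 1000) {
        return FN1_BUFFER_TOO_SMALL;
    }
    *out = a * 2;
    return FN1_SUCCESS;
}

/* Map the status back to the name used in the Zig error set. */
const char *fn1_error_name(Fn1ErrorEnum e) {
    switch (e) {
    case FN1_SUCCESS: return "success";
    case FN1_OUT_OF_MEMORY: return "out_of_memory";
    case FN1_BUFFER_TOO_SMALL: return "buffer_too_small";
    case FN1_INVALID_PARAMETER: return "invalid_parameter";
    }
    return "unknown";
}

/* Typical call site: check the status before touching the output. */
int32_t fn1_or_default(int32_t a, int32_t fallback) {
    int32_t out;
    return fn1Wrapper(&out, a) == FN1_SUCCESS ? out : fallback;
}
```

The C caller only sees integers and a pointer, so nothing Zig-specific leaks across the boundary; the cost is that the enum values have to be kept in sync by hand or by a generated header.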

Khitiara commented 8 months ago

reply to the above: apparently @field works for errors and my proof of concept can be updated to:

// Exporting library.
pub const Fn1Errors = error{ out_of_memory, buffer_too_small, invalid_parameter };

// An explicit integer tag type makes the enum usable in an exported function.
pub const Fn1ErrorEnum = enum(c_int) { success, out_of_memory, buffer_too_small, invalid_parameter };

pub fn fn1(a: i32) Fn1Errors!i32 { ... }

export fn fn1Wrapper(out: *i32, a: i32) Fn1ErrorEnum {
    out.* = fn1(a) catch |e| switch (e) {
        // @errorName gives the name of an error value; @tagName only accepts enums and unions.
        inline else => |e2| return @field(Fn1ErrorEnum, @errorName(e2)),
    };
    return .success;
}

// import.zig: bindings used by the calling code.
pub const Fn1Errors = error{ out_of_memory, buffer_too_small, invalid_parameter };

pub const Fn1ErrorEnum = enum(c_int) { success, out_of_memory, buffer_too_small, invalid_parameter };

extern fn fn1Wrapper(out: *i32, a: i32) Fn1ErrorEnum;

pub fn fn1(a: i32) Fn1Errors!i32 {
    var out: i32 = undefined;
    switch (fn1Wrapper(&out, a)) {
        .success => return out,
        inline else => |e| return @field(Fn1Errors, @tagName(e)),
    }
}

// Calling program.
pub const foo = @import("import.zig");

pub fn main() !void {
    _ = try foo.fn1(0);
}

I believe the name builtins (@errorName / @tagName) and @field combine to avoid requiring the string literals in the output binary, but I'm not certain.

Pyrolistical commented 5 months ago

Why is this proposal important? It is required for Zig to eat the world.

There is a principle when dealing with legacy code.

The legacy system calls into the new system. The new system does not know about the legacy system. If the new system knows about the legacy system, your new system will eventually become the legacy system. People are lazy. If the new system has an escape hatch to the legacy system, it is always easier to make the new system wrap the legacy system. You can never escape the legacy system this way.

Applying this idea to ABIs, Zig has done a great job making it easy to call into other systems using the C ABI, but this means we can never escape the C ABI.

With a well-defined Zig ABI, we could write a C library that makes Zig ABI calls. This would allow us to maintain a Zig codebase with only exports and eventually replace non-Zig codebases with Zig.

silversquirl commented 5 months ago

With a well-defined Zig ABI, we could write a C library that makes Zig ABI calls. This would allow us to maintain a Zig codebase with only exports and eventually replace non-Zig codebases with Zig.

This is already possible. You use the C ABI for it :)

Fact of the matter is, every language supports C ABI. It is the lingua franca of programming. It may not be perfect, but it is already well established and well supported, and that's more important. Even if Zig did decide to have a concrete ABI for some of its features (eg. slices), it would likely be defined in terms of the C ABI, otherwise no other language would be able to use it, defeating the entire point.

leecannon commented 5 months ago

I can't think of any zig language constructs other than slices and tagged unions that even can be defined as part of an ABI.

Comptime can't be defined so all generics are out the window, with how errors currently work they would need some form of runtime "relocation" like thing, etc.

Zig is a source first, single compilation unit language.

nicopoulos commented 4 months ago

Fact of the matter is, every language supports C ABI. It is the lingua franca of programming. It may not be perfect, but it is already well established and well supported, and that's more important.

But shouldn't the goal be to push for Zig to become the new lingua franca? The language that operating systems and their APIs are written in for programs to interface with? The ABI that basically every higher-level language supports?

I know, I'm probably just way too naive to think that that would ever happen with all the work that this would take, but I just can't imagine we'll still be writing C for the next 200 years...

ziglang / zig

Proposal: Zig ABI for language specific features #3786