ziglang / zig

General-purpose programming language and toolchain for maintaining robust, optimal, and reusable software.
https://ziglang.org
MIT License

proposal: Non-zero integer types #3386

Closed tomc1998 closed 4 years ago

tomc1998 commented 5 years ago

Zig has optionals, which means that any non-optional value always has a valid representation. Prepending ? to a type shows that the value might be absent - this is great!

A lot of work in zig is currently wrapping C libraries. C uses NULL for pointers, which is just a zero value. Zig is pretty smart here, and a ?*T will always have a 0 value when null. Also great, this allows for really neat interop with C without sacrificing zig's semantics of 'anything that's not an optional will always be valid'.

However, some C libraries involve integer handles, where that handle represents some resource. In most cases, if a handle can be 'null', the value for that handle is 0 (see OpenGL's handles for example). When wrapping this with zig, however, now we have values passed around which are non-optional, but are no longer always valid.

const MyCHandle = u32;
fn foo(some_handle: MyCHandle) void {
    // Here, we don't know if some_handle is valid, and must check against zero!
    // This is a reasonable amount of mental overhead, and there's no longer
    // the safety of 'this value isn't an optional, therefore it must be valid'.
    _ = some_handle; // silence the unused-parameter error
}

So the proposal is for non-zero integer types, like u32nz, i32nz, etc.

If you had an i32nz var and assigned zero to it, that'd be a compile-time error where the compiler can prove it, and a runtime error otherwise. Since we can guarantee these values are never zero, however, a ?u32nz could use 0 as its null representation and would need no extra flag.

I think this would really help for C library interop. For another example of a library I was looking into wrapping which uses 0 for null pointers, see https://github.com/SanderMertens/flecs

mikdusan commented 5 years ago

just to add a case here: a very popular C-ism is POSIX file descriptors, which are i32 with -1 as their invalid sentinel

tomc1998 commented 5 years ago

I suppose an alternative is a builtin @inull(32, -1) or something, where you specify the null value as the second param, then just typedef u32nz to @unull(32, 0)?
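The idea of a type with a caller-chosen null value can be approximated in user code today. What follows is only a sketch, not the proposed builtin: `SentinelOptional`, `Fd`, and `GlHandle` are hypothetical names, and it differs from the proposal in that you must call `get()` instead of using `?T` syntax directly.

```zig
const std = @import("std");

/// Hypothetical user-land stand-in for the proposed builtin: stores a plain T
/// and treats `null_value` as "absent".
fn SentinelOptional(comptime T: type, comptime null_value: T) type {
    return struct {
        raw: T,

        const Self = @This();

        pub const none = Self{ .raw = null_value };

        /// Stores `value`, mapping null to the sentinel.
        pub fn init(value: ?T) Self {
            return .{ .raw = value orelse null_value };
        }

        /// Converts back to an ordinary optional at the point of use.
        pub fn get(self: Self) ?T {
            return if (self.raw == null_value) null else self.raw;
        }
    };
}

const Fd = SentinelOptional(i32, -1); // POSIX-style: -1 means invalid
const GlHandle = SentinelOptional(u32, 0); // OpenGL-style: 0 means invalid

pub fn main() void {
    const closed = Fd.init(null);
    const open = Fd.init(3);
    std.debug.print("{} {}\n", .{ closed.get() == null, open.get().? == 3 });
    std.debug.print("{} {}\n", .{ @sizeOf(Fd), @sizeOf(GlHandle) });
}
```

Each wrapper is a struct with a single integer field, so it takes no more space than the integer itself. The obvious caveat is that `init` cannot distinguish a caller passing the sentinel value from a caller passing null, which is exactly the check the proposal wants the compiler to enforce.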

JesseRMeyer commented 5 years ago

The definition of an optional doesn't appear well defined to me when interacting with non-Zig code. So I'm not sure that this is a language problem in and of itself which requires more language features to solve. I think it's part of the cost of using different tools together at the same time.

I'd prefer Zig to be a simpler language that offloads many of the Zig <-> C interop decisions to the programmer at the boundary interface, rather than one that tries to account for the various cultural conventions of C packages as a feature. So in this particular case, I think it's a fair trade to handle the 0 explicitly as part of handling the boundary interface properly, because that's useful information about expectations from the other, non-Zig side, and it keeps that information secluded from other reaches of the code.

tomc1998 commented 5 years ago

The problem is that this isn't limited to the boundary where you interact with C: there's no way to nicely wrap this with Zig-like optional semantics without packing it with a bool. Either, when you use a C library like this, you compromise the whole codebase since now you have to make sure the value you're dealing with is definitely valid (despite it not being marked this way by the type system), OR you make it a ?u32 which will pack it with a bool, such that it's no longer a zero-cost wrapping.

This really undermines the value of optional, since without any edge cases like this, a function can always assume that any non-optional values are definitely valid, WITHOUT looking at the semantic meaning of each parameter. All of a sudden programmers need to know more about the context of a program (where this function will be called from, where the data has come from) in order to reason about the validity of a given value.

OR you just pack it with a bool, which is definitely less than optimal for when storing many handles in an array (especially because I imagine a ?u32 will actually end up being 64 bit? could be wrong here.)
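The size question raised here can be checked directly rather than guessed. This small program is a sketch that prints the sizes the compiler picks. The exact numbers depend on the target, so none are asserted in the comments.

```zig
const std = @import("std");

pub fn main() void {
    // An optional pointer can reuse address 0 for null, so it needs no extra storage.
    std.debug.print("?*u32 same size as *u32: {}\n", .{@sizeOf(?*u32) == @sizeOf(*u32)});

    // Every bit pattern of a u32 is a valid value, so ?u32 has to carry a separate flag.
    std.debug.print("@sizeOf(u32)  = {}\n", .{@sizeOf(u32)});
    std.debug.print("@sizeOf(?u32) = {}\n", .{@sizeOf(?u32)});
}
```

The gap between the last two numbers is the per-element overhead that an array of `?u32` handles would pay.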

JesseRMeyer commented 5 years ago

"you compromise the whole codebase since now you have to make sure the value you're dealing with is definitely valid (despite it not being marked this way by the type system), OR make it a ?u32 which will pack it with a bool, such that it's no longer a zero cost wrapping."

This should be an explicit concern of the Zig boundary layer and should not leak these concerns to any Zig code that calls it, preventing any compromises. This is the purpose of the boundary interface, butter your toast on both sides. =)
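A minimal sketch of what such a boundary layer might look like, assuming a C function that returns 0 on failure. `c_create_resource` is a hypothetical stand-in written in Zig so that the example runs on its own; a real wrapper would call an `extern` function instead.

```zig
const std = @import("std");

// Hypothetical stand-in for a C function that returns 0 when it fails.
fn c_create_resource(succeed: bool) u32 {
    return if (succeed) 17 else 0;
}

/// Boundary wrapper: the 0 sentinel is translated into null exactly once, here,
/// so callers only ever see a ?u32.
fn createResource(succeed: bool) ?u32 {
    const raw = c_create_resource(succeed);
    return if (raw == 0) null else raw;
}

pub fn main() void {
    if (createResource(true)) |handle| {
        std.debug.print("got handle {}\n", .{handle});
    }
    if (createResource(false) == null) {
        std.debug.print("creation failed\n", .{});
    }
}
```

Callers get ordinary optional semantics, at the cost of the larger `?u32` representation that the rest of the thread is arguing about.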

tomc1998 commented 5 years ago

This is my point, without this type you have to compromise either one of these - either you just make it a ?u32 and sacrifice performance, or you make it a u32 where 0 is implicitly null, OR you make a struct to wrap the u32 with an isNull() function - which STILL doesn't really obey the zig way, since you have to think 'hey is this .isNull or == null I should be using here?'

kyle-github commented 5 years ago

Just to be a bit pedantic, C does NOT require that NULL evaluates to a word with all zero bits. In fact there is considerable effort made to make sure that NULL values that are not zero can be used. It is commonly assumed that zero is special, but it is definitely not part of the spec. Having a special type that tries to optimize zero values is actually a not-so latent bug for C interop.

This is a bit like the old saying that you can write Fortran in any language. In this case, you can write C in Zig, but do you want to? If you make Zig behave like C, then all you have done is duplicate C. Zig has a lot more to offer than that...

I think there are assumptions being made that things like ?u32 will always cost extra flags, performance etc. Why assume that? Zig is still being created. Don't worry about microbenchmarks, but instead worry about figuring out what idiomatic and elegant Zig looks like and then worry about making those common cases highly performant.

If Java can be made fast, then Zig certainly can be made fast!

Premature optimization is the root of all evil. - Donald Knuth

Note that some languages like Pascal and Ada allow for integer types with constrained ranges that are not powers of two, e.g. 1..1000. This can be useful in some cases and is considerably more general than just excluding zero. That said, there is a noticeable lack of languages that support that kind of range today. So perhaps the usefulness is not actually that high.

Rocknest commented 5 years ago

@tomc1998

"OR you just pack it with a bool, which is definitely less than optimal for when storing many handles in an array (especially because I imagine a ?u32 will actually end up being 64 bit? could be wrong here.)"

If you need to save memory in an array you can encode optional integers as you like, but if you are passing them around as function arguments there's pretty much zero cost, since on most 64-bit architectures registers are 64 bits wide anyway.

kyle-github commented 5 years ago

This may be related to other C-interop here: #3328.

tomc1998 commented 5 years ago

@kyle-github "Having a special type that tries to optimize zero values is actually a not-so latent bug for C interop"

Isn't this what types like ?*u32 already do? Make null == 0? Regardless, I'm not talking about pointers here, but integers where a given value can be treated as null.

It would seem to be a reasonably common use case to have an array of nullable integers, where you don't care about the full range of those integer types, even if not for C interop. Anything where you're storing a sparse matrix in a dense array, a chessboard for example. Another example would be a Minecraft world - currently that uses 0 as the 'no block' sentinel.

To make this more generic, it'd be nice to be able to create a type where the null value was a given constant value of that type. For example:

const PieceWithNull = enum {
    Nothing, Pawn, Knight, Bishop, Rook, Queen, King
};
const Piece = @nullValue(PieceWithNull, PieceWithNull.Nothing); // @nullValue is the proposed builtin, it does not exist in Zig
// Piece == enum { Pawn, Knight, Bishop, Rook, Queen, King }

fn main() void {
    const piece : ?Piece = null;
    std.debug.warn("{}\n", piece); // Prints PieceWithNull.Nothing
}

This is less obviously bad to deal with when your domain of values is known and nameable, e.g. chess pieces. However, if you're creating some arbitrary dense data storage and need a 'null' sentinel, this is a really Zig-friendly way to go.

Rocknest commented 5 years ago

@tomc1998 non-exhaustive enums #2524, wrapped primitives #2953
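To show how a non-exhaustive enum bears on the handle case, here is a sketch of a distinct handle type over u32 in which 0 has a name and every other bit pattern stays legal. `Handle`, `fromRaw`, and `isValid` are hypothetical names, and the builtins assume Zig 0.11 or later (`@enumFromInt`, `@intFromEnum`).

```zig
const std = @import("std");

/// Hypothetical handle type: the same size as a u32, but a separate type,
/// with the invalid value spelled `.none` instead of a bare 0.
const Handle = enum(u32) {
    none = 0,
    _, // non-exhaustive: any other u32 value is still a legal Handle

    pub fn fromRaw(raw: u32) Handle {
        return @enumFromInt(raw);
    }

    pub fn isValid(self: Handle) bool {
        return self != .none;
    }
};

pub fn main() void {
    const a = Handle.fromRaw(0);
    const b = Handle.fromRaw(42);
    std.debug.print("{} {} {}\n", .{ a.isValid(), b.isValid(), @intFromEnum(b) });
    std.debug.print("{}\n", .{@sizeOf(Handle) == @sizeOf(u32)});
}
```

This keeps the storage cost of a plain integer and stops raw integers from being passed where a handle is expected, but it does not give `?Handle` the compact representation the original proposal asks for: validity is still checked with `.none` rather than with `null`.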

andrewrk commented 4 years ago

Here's my counter-proposal, which supersedes this proposal: #3806