ziglang / zig

General-purpose programming language and toolchain for maintaining robust, optimal, and reusable software.
https://ziglang.org
MIT License

Add new builtin: @typeId #19858

Open ikskuh opened 6 months ago

ikskuh commented 6 months ago

Add a new builtin called @typeId:

@typeId(comptime T: type) usize

This builtin returns a unique integer for each type passed, and will return the same integer for the same type.

The return value must not be consistent between builds, so a second build might return completely different numbers for the same types.

An alternative variant might return u32 or u64 to have a stable interface between different platforms.

Use cases

Prior art:

User-land implementation

The following version is runtime only, as we can't perform @intFromPtr at compile time:

fn typeId(comptime T: type) TypeId {
    const Tag = struct {
        var name: u8 = @typeName(T)[0]; // must depend on the type somehow!
        inline fn id() TypeId {
            return @enumFromInt(@intFromPtr(&name));
        }
    };
    return Tag.id();
}

silversquirl commented 6 months ago

One small request: it would be really nice if this returned u32 or another smaller integer type, instead of usize

sno2 commented 6 months ago

It could also take inspiration from the @intFromEnum behavior and return an integer declared as anytype, which could be the smallest integer type possible. This leaves room for shrinking the return type in the future, because I'm not sure how/if this would affect the incremental compilation story.

SuperAuguste commented 6 months ago

There is actually an at-comptime solution for typeId that works today but it is absolutely horrible:

fn typeId(comptime T: type) u32 {
    return @intFromError(@field(anyerror, @typeName(T)));
}

Bad status quo solutions have helped back changes such as this one before so just wanted to share. :)

mlugg commented 6 months ago

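@sno2 Using RLS here is a bit tricky. There are two options:

  • The same integer value is returned regardless of the type. In this case, that integer must fit in some minimum size (e.g. u32), and so there is no difference to just returning that size integer!
  • The integer value differs based on the result type. I presume this is what you intend. The issue here is that it's a bit of a footgun; if you accidentally pass the result type as u32 and upcast to u64, or vice versa, you might accidentally get a different ID to a different part of your codebase! This would probably lead to quite tricky bugs.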

If this proposal is accepted, I definitely think the returned integer should have a fixed size (probably 32 bits). 32 bits is a sweet spot: 16 is too few to represent the number of types which might exist in a large application, but 64 (or usize) is very excessive (in fact, the canonical compiler implementation won't let you have more than 2^32 distinct types today, and I am beyond certain nobody will ever hit this limitation).

sno2 commented 6 months ago

@sno2 Using RLS here is a bit tricky. There are two options:

  • The same integer value is returned regardless of the type. In this case, that integer must fit in some minimum size (e.g. u32), and so there is no difference to just returning that size integer!

I was more thinking of the compiler counting how many types we have defined and log2 that into an integer type as the TypeId integer type. Although, this is now seeming like a hard task with many different edge cases depending on when you call @typeId so I think using u32 as you said is the best option.

Also, possibly useless side note but Rust's type id uses a u128 but I wasn't able to find any reasoning or investigations into shrinking it anywhere.

rohlem commented 6 months ago

32 bits is a sweet spot: 16 is too few to represent the number of types which might exist in a large application, [...]

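I can't think of a use case I would ever use this builtin for (it's a bit at odds with my fundamental design philosophy), but for everyone here who seems to have use cases: Do you expect to use it for all/most types that appear anywhere (including intermediately) in your entire program? Think of every

* integer type, signed and unsigned,
* enum, union
* pointer with and without mutability, for every alignment
* arrays with different lengths
* optional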

That multiplies into a really big number.

I would expect many use cases actually only use this builtin when serializing a rather small set of types (and maybe their fields' types, recursively) over particular interfaces. Therefore it might be less work for the compiler to only assign ids to types that have been passed to @typeId. Doing that, maybe a u8 would be enough for some use cases, and we can guarantee only code which uses this feature "pays" for it (assuming this isn't something we already get for basically free via the intern pool - which it might be).

(Then again, maybe this is more of an ergonomics feature than a performance-oriented one? Plus, deduplicating ids in userland is also possible, even if it poses some of the same global-type-list challenges this proposal fundamentally tries to move into the compiler.)

likern commented 6 months ago

Could someone give a solid use case for this? I have nothing in my head, and I have never come across a situation where I needed this even remotely.

pfgithub commented 6 months ago

The one reason I've wanted it in the past is for safety on *anyopaque. Assuming @typeName is guaranteed to be unique, it can be used instead.

const AnyPtr = struct {
    type_id: usize, // alternatively, [*:0]u8
    ptr: *anyopaque,
    pub fn from(comptime T: type, ptr: *T) AnyPtr { // constructor: takes no existing AnyPtr
        return .{
            .type_id = @typeId(T), // alternatively @typeName(T).ptr
            .ptr = @ptrCast(@alignCast(ptr)),
        };
    }
    pub fn readAs(item: AnyPtr, comptime T: type) *T {
        if(item.type_id != @typeId(T)) unreachable; // alternatively `item.type_id != @typeName(T).ptr`
        return @ptrCast(@alignCast(item.ptr));
    }
};

ikskuh commented 6 months ago

Could someone give a solid use case for this? I have nothing in my head, and I have never come across a situation where I needed this even remotely.

Basically type checking when doing type erasure (see the linked any-pointer project). Then see #19859, where you need to store a user-defined type in a non-generic data structure (think HashMap(TypeId, *anyopaque)); you can implement RTTI with it, ...
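A minimal sketch of the HashMap(TypeId, *anyopaque) idea, for illustration only. Since @typeId does not exist, it assumes a runtime-only userland stand-in based on the address of a per-type static, in the spirit of the snippet in the opening post; the names `typeId`, `Slot`, and `Registry` are hypothetical.

```zig
const std = @import("std");

// Hypothetical userland stand-in for the proposed @typeId (runtime only):
// every instantiation of `Slot` owns its own static byte, and the address
// of that byte is used as the id.
fn typeId(comptime T: type) usize {
    const Slot = struct {
        var byte: u8 = 0;
        comptime {
            _ = T; // reference T so each type gets a distinct Slot
        }
    };
    return @intFromPtr(&Slot.byte);
}

// A non-generic container that stores one type-erased pointer per type.
const Registry = struct {
    map: std.AutoHashMap(usize, *anyopaque),

    pub fn init(allocator: std.mem.Allocator) Registry {
        return .{ .map = std.AutoHashMap(usize, *anyopaque).init(allocator) };
    }

    pub fn deinit(self: *Registry) void {
        self.map.deinit();
    }

    pub fn put(self: *Registry, comptime T: type, ptr: *T) !void {
        try self.map.put(typeId(T), ptr);
    }

    // Returns null when nothing was stored under T's id, so the cast
    // below is only reached for a pointer that was stored as *T.
    pub fn get(self: *Registry, comptime T: type) ?*T {
        const raw = self.map.get(typeId(T)) orelse return null;
        return @ptrCast(@alignCast(raw));
    }
};
```

The id acts as the key, so a lookup with the wrong type yields null instead of a mis-typed pointer.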

SuperAuguste commented 6 months ago

As @MasterQ32 indicated in the original issue, his any-pointer is a great example of an exact usecase, but another explicit example of where @typeId would also be useful is in an ECS. The hack I shared above was actually created while attempting to solve an enum identification system with @slimsag where values could be detached and reattached from their respective enum types to identify components and events and store their identities easily.

Of course, all of these issues can be solved with userspace hacks but:

To not use any hacks while obtaining unique type identifiers, you can do something like this, but:

In my opinion, any sort of RTTI-ish solution would greatly benefit from this builtin. I imagine Felix sees it the same way, thus why he opened this issue.

About implementation details @rohlem, check out my PR to see how easy it is to implement from the InternPool. In short, the InternPool stores types (and other deduplication-dependent data like default values, memoized calls, etc. though this is not important for this explanation) by inserting them into a std.AutoArrayHashMapUnmanaged(void, void) which produces a single, unique InternPool.Index (a u32-backed enum) which we can then reuse for @typeId. If, understandably, the compiler folks wouldn't like exposing InternPool indices directly, we could simply create a second map, a std.AutoArrayHashMapUnmanaged(InternPool.Index, void), which would also be relatively inexpensive.
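The interning idea described above can be sketched in a few lines. This is not the compiler's InternPool: it is a simplified illustration that assumes a plain u64 key as a stand-in for a type's description and uses the managed std.AutoArrayHashMap rather than the unmanaged (void, void) map with adapted contexts. The name `Interner` is hypothetical.

```zig
const std = @import("std");

// Simplified interner: each distinct key is assigned the dense index of
// its slot in an array hash map, and repeated keys map to the same index.
const Interner = struct {
    map: std.AutoArrayHashMap(u64, void),

    pub fn init(allocator: std.mem.Allocator) Interner {
        return .{ .map = std.AutoArrayHashMap(u64, void).init(allocator) };
    }

    pub fn deinit(self: *Interner) void {
        self.map.deinit();
    }

    // `key` is an assumed stand-in for whatever uniquely describes a type.
    pub fn intern(self: *Interner, key: u64) !u32 {
        const gop = try self.map.getOrPut(key);
        // gop.index is the insertion-order position of the entry,
        // whether it was just inserted or already present.
        return @intCast(gop.index);
    }
};
```

Because the index reflects insertion order, the ids depend on the order in which keys are first seen, which is the property the later determinism discussion is concerned with.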

yzrmn commented 6 months ago

I think #5459 is directly related (and solved by #19861).

greytdepression commented 6 months ago

@rohlem Do you expect to use it for all/most types that appear anywhere (including intermediately) in your entire program? Think of every

* integer type, signed and unsigned,

* enum, union

* pointer with and without mutability, for every alignment

* arrays with different lengths

* optional

This made me wonder about how the technical implementation would solve something like this (I'm pretending like @typeId just returns a usize for convenience here. You could insert any necessary @intFromEnum or whatever to make it work)

fn SelfReferentialStruct(comptime T: type) type {
    return struct {
        const Self = @This();
        const array: [@typeId(Self)] u32 = undefined;
    };
}

edit: Never mind. I just checked and there already is a check for similar transitive failures in the compiler. :)

likern commented 6 months ago

Am I correct that the idea is basically to split a pointer to T into a void pointer and a type stored separately, and to identify the type by its unique identifier?

If that's correct, that's a very interesting feature. But I would like to extend it even further. If it's stored separately, we can save this information to disk and restore it later. But only if we have a stable guarantee not only within one build.

So I'd like this feature to be taken into account in this proposal too. Where might this be useful? I think in databases, where information about types is stored on disk, as in PostgreSQL, which uses the Oid type to uniquely identify almost any object in the database: types, attributes, tables, etc.

SuperAuguste commented 5 months ago

After sharing my first terrible comptime typeId hack, I'm back for more:

pub fn typeId(comptime T: type) u32 {
    _ = T;
    const fn_name = @src().fn_name;
    return std.fmt.parseInt(u32, fn_name[std.mem.lastIndexOfScalar(u8, fn_name, '_').? + 1 ..], 10) catch unreachable;
}

This one even exposes the InternPool.Index of the memoized call - enjoy! :^) (this one is runtime only though :()

silversquirl commented 5 months ago

stable guarantee not only within one build

This is not practical or even really possible. You can already assign explicit IDs to types manually, through a variety of methods, which is a much better option for serialization usecases.

michaelbartnett commented 5 months ago

@rohlem:

Do you expect to use it for all/most types that appear anywhere (including intermediately) in your entire program?

I don't necessarily care about having an ID for every single type in the program, because there are also going to be lots of comptime utility types for which it's not necessarily helpful to have an ID. And I definitely don't need IDs for almost all of std.

For the specific examples you cited, yes I would want all power-of-two sized integer types signed and unsigned (and some non-power-of-two, but not the whole 64k spectrum, that'd be a little crazy). Var and const pointers and slices for any used type, yes, arrays of varying lengths yes, optionals definitely.

A u8 is definitely not enough to cover my needs, I can easily see needing at least a few thousand IDs to cover all the type variations (working from an "unadorned" set of 200-300 declared types). These would mainly be keys in tables, so even if they were pointer-sized that'd be fine by me. If I want a compressed type ID for sending over the wire or for cutting down on memory usage, I can do that in userspace on top of a builtin @typeId like you said.

(Then again, maybe this is more of an ergonomics feature than a performance-oriented one?)

This is definitely about ergonomics and "work scaling" (as in, scaling how many people are working on a codebase and reducing LOC needed to implement functionality) for me. A chonky type ID is the cost of doing business, so if it's straightforward to just use an ID from the InternPool and it's always u32 or u64 or w.e., that totally serves my needs. Using something abnormally large like u128 seems excessive (is Rust just using UUIDs? mehhhh) but I'd learn to live with it.

The biggest deal to me is ensuring it's consistent between comptime and runtime. With the current method of taking the address of storage you have to add a wrapper function to not mess that up, it can be tricky if you're not just directly using someone else's userland typeId.

michaelbartnett commented 5 months ago

For ref, here's how I handle keeping runtime and comptime consistent (the type id stays a pointer, and I have an extra enum type to convert to/from for smuggling purposes):

pub const TypeID = *const Type;

pub const TypeIntID = enum(usize) {
    invalid = 0,
    _,

    pub fn from(tid: TypeID) @This() {
        return @enumFromInt(@intFromPtr(tid));
    }

    pub fn toTypeID(self: @This()) TypeID {
        return @ptrFromInt(@intFromEnum(self));
    }
};

pub const Type = opaque {
    pub const id: fn (comptime T: type) TypeID = struct {
        inline fn tid(comptime T: type) TypeID {
            const TypeIDSlot = struct {
                var slot: u8 = undefined;
                comptime {
                    _ = T;
                }
            };
            return @ptrCast(&TypeIDSlot.slot);
        }

        fn typeID(comptime T: type) TypeID {
            return comptime tid(T);
        }
    }.typeID;

    pub fn toIntId(self: *const Type) TypeIntID {
        return TypeIntID.from(self);
    }
};

I've been using it for a while, I think InK or Vexu helped me arrive at this based on ikskuh's typeId.

Snektron commented 5 months ago

The return value must not be consistent between builds, so a second build might return completely different numbers for the same types.

Is the intention that the returned value is comptime or runtime? If the former, this essentially introduces nondeterminism in the type system. I'd suggest making the ID at least stable between compilations of the same source code, to ensure reproducibility of builds, and not forcibly requiring randomness here for debugging purposes.

sibkit commented 3 months ago

@sno2 Using RLS here is a bit tricky. There are two options:

  • The same integer value is returned regardless of the type. In this case, that integer must fit in some minimum size (e.g. u32), and so there is no difference to just returning that size integer!
  • The integer value differs based on the result type. I presume this is what you intend. The issue here is that it's a bit of a footgun; if you accidentally pass the result type as u32 and upcast to u64, or vice versa, you might accidentally get a different ID to a different part of your codebase! This would probably lead to quite tricky bugs.

If this proposal is accepted, I definitely think the returned integer should have a fixed size (probably 32 bits). 32 bits is a sweet spot: 16 is too few to represent the number of types which might exist in a large application, but 64 (or usize) is very excessive (in fact, the canonical compiler implementation won't let you have more than 2^32 distinct types today, and I am beyond certain nobody will ever hit this limitation).

But what about microcontrollers, for example 8 bit? usize is universal.

michaelbartnett commented 3 months ago

@sibkit

I dare you to measure it, less than usize will not stand out

Ehhhh I'd be careful about the absolute language here. 64bit vs 32bit identifiers can totally show up in memory profiles depending on how they're used.

Common memory optimization pattern: given two 32bit identifiers comprising some sort of combined category+id tag value, if you are reasonably certain you can mask & shift them onto a single 32bit field, and you have tens of thousands of instances of structs containing one or more of these, you're looking at multiple megabytes of memory saved. I care about saving multiple megabytes, and I'm not even writing for microcontrollers. Sometimes you need to save memory, and good targets aren't even the things that take up the most memory overall, but can give you a short enough haircut to meet your target without having to suffer other tradeoffs.
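A minimal sketch of that packing pattern, for illustration only. The 8-bit/24-bit split and the names `Tag`, `pack`, and `unpack` are assumptions; a packed struct with an explicit backing integer lets the compiler do the masking and shifting.

```zig
const std = @import("std");

// Two logical identifiers squeezed into a single 32-bit field.
// The field widths are illustrative; pick them to fit the real value ranges.
const Tag = packed struct(u32) {
    category: u8, // occupies the low 8 bits
    id: u24, // occupies the high 24 bits
};

comptime {
    // The whole tag occupies 4 bytes instead of the 8 bytes that two
    // separate u32 fields would need.
    std.debug.assert(@sizeOf(Tag) == 4);
}

fn pack(category: u8, id: u24) u32 {
    return @bitCast(Tag{ .category = category, .id = id });
}

fn unpack(raw: u32) Tag {
    return @bitCast(raw);
}
```

This only works when both value ranges are known to fit, which is why a dense, predictable id space matters for this kind of optimization.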

If we know the type ID value space is dense and predictably fills up from the bottom, those kinds of optimizations are possible even if the width is usize. But I'd be inclined to treat the value space here as a black box unless somebody on the core team says otherwise.

ikskuh commented 3 months ago

But what about microcontrollers, for example 8 bit? usize is universal.

usize on AVR should be 32 bit, while register types would be 8 bit and @sizeOf(*void) should be 16 bit

Snektron commented 3 months ago

After some extra thinking, I'm completely against possible nondeterminism at compile time. For a runtime value I think that it's fine provided that the result is stable across compiler runs (ideally even across different computers).

It would be fine if the ID can be spec'd somehow, but that would probably be quite hard / can be done just as well in user code. I suggest to make the return value runtime instead to avoid that issue.

michaelbartnett commented 3 months ago

@Snektron Making the return value of this builtin runtime-only would hinder major use cases for this feature which are currently already possible via userland hacks.

My need for this feature is to be able to store type IDs at comptime while building lookup structures and then use those values later to cross-reference types and as keys in hashtables. Ideally I'd also like to be able to create lookup tables at comptime.

If this restriction could be: the actual typeID values can be stored and compared for equality but are otherwise opaque until runtime, that's a restriction I could live with (that's the status quo for my current typeID hack), provided the stored typeIDs are fixed up in the final stage of compilation to ensure that typeID equality stays consistent between comptime and runtime. It's easy to break that property currently with the userland hacks if you aren't careful.

I can understand not wanting non-deterministic compilation. Where would the non-determinism come from when allowing these values to exist at comptime?

Snektron commented 3 months ago

If the value is deterministic across runs, compiler versions, etc., I don't have a problem with it. In Auguste's proof of concept, the value is derived from a compiler-internal type database key, and depends on the order in which the compiler processes the code. This is an implementation detail. My main concern is that when the compiler is parallelized further, this processing order becomes unstable.

@ikskuh pointed out to me that intFromError has the same issue (and this is also what makes the current hack work), and I think that's a problem too.

michaelbartnett commented 3 months ago

That makes sense. I guess incremental compilation would also introduce non-determinism. I wouldn't really care about IDs shifting around between subsequent builds of the same source, but like I said I can see how that's a desirable property. I wonder how many uses of @typeName would have the same issue.

What about the constraint I mentioned: typeIDs can be checked for equality at comptime, but otherwise can't be observed or compared--including logging out the std.fmt representation, which could just be an opaque thing like "TypeID(comptime)". Equality is the primary thing that matters, as well as knowing that if I store one that the equality will stay consistent between comptime and runtime. Sort of similar to how @intFromPtr currently works.

That would be a big loss since it would mean we can't hash them at comptime, but it would at least provide a "blessed" version of the current hacks which wouldn't be at risk of breaking since it would be an explicit language feature.

ikskuh commented 3 months ago

@Snektron Making the return value of this builtin runtime-only would hinder major use cases for this feature which are currently already possible via userland hacks.

...

If this restriction could be: the actual typeID values can be stored and compared for equality but are otherwise opaque until runtime, that's a restriction I could live with (that's the status quo for my current typeID hack), provided the stored typeIDs are fixed up in the final stage of compilation to ensure that typeID equality stays consistent between comptime and runtime. It's easy to break that property currently with the userland hacks if you aren't careful.

I really love that. It has the best of both worlds:

nektro commented 3 months ago

If this restriction could be: the actual typeID values can be stored and compared for equality but are otherwise opaque until runtime

type already supports == ? runtime-only means this fits better in the stdlib (if anywhere) and not a builtin

michaelbartnett commented 3 months ago

I need to compare and store those identifiers into const decls at comptime and then compare them later at runtime, so comparing type equality doesn't work. I need a consistent correlation that goes across the comptime/runtime boundary.

Emmmmllll commented 3 months ago

At first sight Rust's approach of using u128s with cryptographic generation (basically UUIDs) might seem overkill. But there are also cases where you might want this: Suppose you are writing an ECS library which is later linked to your application. If the library keeps track of its data with comptime-generated ids we are going to have a problem: there might be internal structs with a type id in the library which the compiler cannot know of, leading to ambiguous type ids. In this case UUIDs might come in handy since a type collision is far less likely. Though I would still choose another approach: my idea is to add in fact 3 more builtin functions:

1. @genericId()

This one works in combination with a generic function (generator function). It should result in each version of this generic function getting a different Id. e.g.

pub fn type_id(comptime T: type) usize {
    return @genericId();
}

assert(type_id(u32) != type_id(*const bool));
assert(type_id(u32) == type_id(u32));

this may expand to

pub fn type_id(u32) usize {
    return 0;
}
pub fn type_id(*const bool) usize {
    return 1;
}

In regard to the return type, the number of different versions version_count of the generic function can be counted and the return type is inferred with a minimum of log2 version_count bits. This way it is possible to have low memory usage of type ids if you work with embedded systems and you need only a few type ids. Furthermore this allows type ids to be scoped, meaning the generator function describes the meaning of the type id and only this generator function should be used for this single purpose.

// type_id generator function added in the std
std.builtin.type_id(u32);
// this may be the same value since they implement two different generator functions
myLib.type_id(bool);

// with
const myLib = struct {
    // use u8 for my very efficient embedded library
    pub fn type_id(comptime T: type) u8 {
        return @genericId();    
    }
};

If any other generator function is used there will be an overlap, but this is the user's fault. If you want a global (default scoped) type id, just add a generator function to the std and use this.

2. @useVal(comptime t: anytype)

This one might not be necessary, but if generic functions are lazy and do not generate different versions if T of the generator function is not used or if T is discarded via _ = T, then @useVal(T); would force the generic function to generate a version for each different T which is passed to it. Hence the real implementation would look like

pub fn type_id(comptime T: type) usize {
    @useVal(T);
    return @genericId();
}

Still, this is only needed if this does not work

pub fn type_id(comptime T: type) usize {
    _ = T;
    return @genericId();
}

3. @UUID()

Afaik we still have no possibility of generating UUIDs at compile time, and this would come in handy in some ways (e.g. if you want to be pretty sure there are no type collisions with external libraries, if you need a random number at comptime, etc.). This builtin function should work again via inline but can also be used outside of generic functions to create a UUID without any type context, and there need to be no scopes since they would not matter anyway. In terms of the return type I am still not sure which one would suit it best, but in my examples I am just using u128.

pub fn type_uuid(comptime T: type) u128 {
    @useVal(T);
    return @UUID();
}

const program_uuid = @UUID();
assert(type_uuid(u42) != type_uuid(i37));
assert(type_uuid(u16) == type_uuid(u16));

Now to solve the hypothetical issue of the linked ECS library you could either use the UUID approach or expose a type id generator function implementation in your Zig binding to that library which adds the number of internal types to the type id value to avoid any collision.

A last thing which would be nice to have, is a way to figure out how many versions of a generic function are actually generated.

I know adding 3 new builtin functions for a single problem would be a little bit over the top, though I hope this idea serves to find an optimal solution while maintaining the philosophy of Zig:

* Only one obvious way to do things.
* Incremental improvements.
* Reduce the amount one must remember.
* Focus on code rather than style.
...

engusmaze commented 1 month ago

Generating type ids using hash functions is definitely the way to go, because we get a stable interface between platforms. So, for example, if you export your library as a dynamic one, you can make an interface that provides a type id, and these type ids will match between different libraries.

Here is the code I used:

fn typeId(comptime T: type) u128 {
    const Type = struct {
        const id: u128 = result: {
            const a = std.hash.Wyhash.hash(3832269059401244599, @typeName(T));
            const b = std.hash.Wyhash.hash(5919152850572607287, @typeName(T));
            break :result @bitCast([2]u64{ a, b });
        };
    };
    return Type.id;
}

ikskuh commented 1 month ago

fn typeId(comptime T: type) u128 {
    const Type = struct {
        const id: u128 = result: {
            const a = std.hash.Wyhash.hash(3832269059401244599, @typeName(T));
            const b = std.hash.Wyhash.hash(5919152850572607287, @typeName(T));
            break :result @bitCast([2]u64{ a, b });
        };
    };
    return Type.id;
}

That code isn't portable between projects, as @typeName depends on the module structure. If you change a single module, it will rename the types and thus change the hashes.

engusmaze commented 1 month ago

That code isn't portable between projects, as @typeName depends on the module structure. If you change a single module, it will rename the types and thus change the hashes.

It doesn't depend on the structure of your project, but on how other developers have structured their libraries. @typeName(T) => "{filename}.{export name}" or add args if it's a function. It's not the best, because it doesn't include the library name, but on the other hand, if we were to hash the representation of a type, we would get the same value from different types.

const Test1 = struct {
    value: u64,
    pub fn doSomething1() void {}
};
const Test2 = struct {
    value: u64,
    pub fn doSomething2() void {}
};

These types have the same structure, so by hashing the type representation we would get the same hash, but these are different types. @typeName solves this, but has its drawbacks. If Zig would also include the library name it would be better.
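The collision described here can be reproduced with a small sketch. It is an illustration under assumptions, not the code referred to elsewhere in the thread: the hypothetical `structuralHash` helper hashes only field names and field type names (using @typeName purely for brevity), so declarations such as functions do not influence the result.

```zig
const std = @import("std");

const Test1 = struct {
    value: u64,
    pub fn doSomething1() void {}
};
const Test2 = struct {
    value: u64,
    pub fn doSomething2() void {}
};

// Hypothetical structural hash: only the fields contribute, so two
// distinct types with identical fields produce the same value.
fn structuralHash(comptime T: type) u64 {
    var hasher = std.hash.Wyhash.init(0);
    inline for (std.meta.fields(T)) |field| {
        hasher.update(field.name);
        hasher.update(@typeName(field.type));
    }
    return hasher.final();
}
```

With this helper, structuralHash(Test1) and structuralHash(Test2) are equal even though Test1 and Test2 are different types.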

michaelbartnett commented 1 month ago

Using a hash of the @typeName as an ID can be useful in some circumstances, but is a non-starter for me due to the fact that it is possible to have two distinctly declared types have the same @typeName.

Imagine how many conflicts with math.Vec3 there could be. Some Vec3s may be auto layout, some may be extern, some may contain a @Vector field which would mean different alignment requirements. I'm also not sure that @typeName can't arbitrarily start truncating if they are nested very deep and are built from many type functions.

Additionally, there may be library name conflicts, which may be resolved by giving dependencies or even the module imports different names in your build.zig.zon and your build.zig, so perfect cross-library type sharing can't be achieved solely through @typeName even if the library or dep name were to be incorporated into the output. Or say we put the zon dependency hash in there, how do you know the build configuration for a dynamically linked library produced a compatible type layout?

The use cases I have require reliably unique type IDs within a single compilation unit independent of file structure or build configuration. I also do use some @typeName hashing in the way you describe but with the understanding that it's not guaranteed to be reliably consistent.

I think the most reliable way forward for type compatibility/reconciliation across artifacts like you're looking for is probably a convention where you put some kind of UUID pub decl in the type or an exported function that returns a type's cross-lib ID or lets you look up cross-lib IDs (like COM but less irritating), and then that can be associated with the value produced from the proposed @typeId in a runtime type table mapping {artifact type UUID, other-lib's @typeId} -> {your artifact's @typeId}.

engusmaze commented 1 month ago

Another way of generating type id is by hashing the representation of a type @typeInfo(T), for which I have written a code that computes an id of (almost) any type by hashing its representation in comptime. It's pretty huge and ugly, but if someone wants to have cross-library communication, it works perfectly in describing your struct as a type id.

But I would argue that this is not the way to create actual type ids, because for example creating structs with the same field layout but different names and functions would give the same type id. On the other hand, if we had combined the hash of a library with the hash of a struct and its relative path to the root file, we would get ids that are more unique in a sense.

Another way to generate type ids is to keep track of the number of types currently being created by a compiler, I would call this a local id because it's local to your project and would have a different id in another project. This would not work for cross-library communication, and I would argue that you don't really need it. Instead, we can simply create an enum for the types we want to identify at runtime, and have it stored somewhere in struct. Or, if we want to automatically generate such ids during comptime, we would need global comptime variables to keep track of the ids (which we currently do not have).
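The manually maintained enum mentioned above could look roughly like this. It is a minimal sketch; `Position`, `Velocity`, `ComponentId`, and `componentId` are hypothetical names chosen for illustration.

```zig
const Position = struct { x: f32, y: f32 };
const Velocity = struct { dx: f32, dy: f32 };

// One tag per type that needs a runtime identity, maintained by hand.
const ComponentId = enum(u8) { position, velocity };

// Maps a type to its tag at compile time. Unregistered types are
// rejected with a compile error instead of silently receiving an id.
fn componentId(comptime T: type) ComponentId {
    return switch (T) {
        Position => .position,
        Velocity => .velocity,
        else => @compileError("no ComponentId registered for " ++ @typeName(T)),
    };
}
```

The ids are explicit and stable across builds, but every new type requires touching both the enum and the switch, which is the maintenance cost raised later in the thread.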

Why u128: Rust has had a debate about collision resistance of TypeIds for a very long time and still has one, and some people have actually had collisions. They recently moved from using u64 inside TypeId to u128.

To create TypeId's, Rust uses the following:

They've never actually used field types to generate their type ids. We can test this by creating a struct, printing its type id, then changing the type of any field and printing the type id again, we should get the same id between compilations.

michaelbartnett commented 1 month ago

After working on a codebase which necessitates local type IDs in the range of 500-1000+ distinct types with IDs, I have to strongly disagree that there's no value in the local type IDs. Local IDs are what the proposal here was explicitly discussing in the first place. If it's under 100 types I'd agree a manually maintained enum is good enough, but any more than that you are going to be more prone to maintainability issues, especially on larger teams.

I agree checking the structural compatibility of types with hashing is not really the desire here, although it's useful on its own for different purposes like caching.

Again, just to be clear: the scope initially presented here is explicitly not globally unique cross-build immutable type identifiers a la COM. The desire here was for a local type ID for uniquely identifying types within a single compilation unit, and for possibly building cross-compilation-unit runtime type identifiers if you combine it with an identifier for the compilation unit a type originates from.

Given that Rust's attempts to globally uniquely identify every nominal type still results in some conflicts and associated hand wringing, it seems to me that it's probably not worth pursuing vs local type IDs.

engusmaze commented 1 month ago

If we had global comptime variables, this wouldn't be an issue. We don't need to count all of the types to identify the specific types we use. If you're talking about ecs, this is probably also the case, in ecs we don't have to worry about types we don't use inside ecs, such as std types or other types outside the project. It's just that managing them via enum is not possible.

andrewrk commented 1 month ago

you're not getting comptime variables

mlugg commented 1 month ago

I'm also not sure that @typeName can't arbitrarily start truncating if they are nested very deep and are built from many type functions.

Not quite truncating, but deeply-nested values within type names are replaced with ..., yes.

Never rely on the output of @typeName: its return value will more than likely end up being implementation-defined (if @typeName isn't removed from the language altogether, that is).

If we had global comptime variables [...]

Yeah, no. These are rejected for a reason -- actually, for a lot of reasons. We're not sacrificing parallel compilation, incremental compilation, and a simple language specification, so that you can write messy and unintuitive comptime logic.

Another way to generate type ids is to keep track of the number of types currently being created by a compiler

(and related discussion)

The draft PR implementation essentially does just that; and really, this or something like it is the only sane approach. Such a solution is necessarily implementation-defined, non-deterministic in a parallelized compiler, and unstable in an incremental compiler.

If @typeId exists, its return value will not be stable across compiler implementations, incremental updates, modules shared between different projects, or rebuilds of the same code. Overcoming the technical constraints which lead to this design would require either crossing some hard lines regarding language design (namely, making Zig effectively impossible to compile in parallel or incrementally), or an unacceptable amount of complexity in the compiler implementation and language specification.

As @Snektron has said, we cannot have nondeterminism at comptime: if the return value of @typeId were comptime-known, one could trivially write code which compiles or fails to compile essentially randomly. Zig cannot be allowed to do this; it's an obviously bad design which would break LSPs, zig build --watch, incremental compilation, and a bunch more stuff.

So, if @typeId exists, its return value has to be either:

However, I would like to pose a question to those who want this proposal. The status-quo solution in the original issue is almost sufficient; the issue is that it doesn't work at comptime. Well, since the type ID should always be an opaque value anyway, why not just use... an actual pointer? For instance:

const std = @import("std");

const TypeId = *const struct {
    _: u8,
};

pub inline fn typeId(comptime T: type) TypeId {
    return &struct {
        comptime {
            _ = T;
        }
        var id: @typeInfo(TypeId).pointer.child = undefined;
    }.id;
}

pub fn main() !void {
    @compileLog(typeId(u8) == typeId(u8)); // comptime-known
    @compileLog(typeId(u8) == typeId(u16)); // and works correctly
    @compileLog(typeId(u16) == typeId(u16));
    passAtRuntime(typeId(u8), typeId(u16));
}

fn passAtRuntime(a: TypeId, b: TypeId) void { // and you can pass them at runtime too
    _ = a;
    _ = b;
}
michaelbartnett commented 1 month ago

Well, since the type ID should always be an opaque value anyway, why not just use... an actual pointer? - @mlugg

I've been using this for a few years now. It feels like an unstable hack, but for platforms I care about, this is ultimately fine for my use cases so long as it keeps working and isn't later regressed to make the compiler go faster or something. The appeal of a @typeId builtin is the guarantee that it will do what it is supposed to do within the defined bounds (which, as you described: not stable between builds, lazy comptime value that's equality-comparable only until it's resolved for runtime, totally does the job).


Reposting edited points from discord for posterity:

That typeId implementation (and variants on it) works consistently now, but did not work on 0.11 and earlier (presumably is working now thanks to InternPool?):

https://godbolt.org/z/jTvW5ers5

It was possible to work around it in 0.11 by guaranteeing that you always called the typeId function in a comptime context via a wrapper.

So this is less error-prone than it was before, but consider the following:

  1. Genuine question: which would be preferable for Zig to guarantee: that this pointer trick works, or a @typeId that works within those identified constraints (lazy comptime value that can only be compared for equality at comptime)?

  2. Everyone rolling their own one of these in userland immediately cuts off an avenue of interoperability between libraries--having a standard way to generate a type identifier. If that typeId implementation was in std that would resolve this point. This is

  3. It would be terribly convenient if type IDs were non-exhaustive enums such that they could be smaller than pointers (I'd personally like 32-bit IDs) in addition to being comparable for equality at both runtime and comptime. Additionally (pointed out by a random internet person), if there were some architecture with very small addresses (e.g. 8-bit) then you'd want a wider-than-pointer type ID.
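For what it's worth, the 32-bit non-exhaustive enum from point 3 can be approximated in userland today, but only at runtime. The following is a rough sketch, not a proposal: `SmallId`, `Interner`, and `idOf` are hypothetical names, and the dense IDs are handed out in first-use order, so they are not comptime-known and not stable between runs or builds.

```zig
const std = @import("std");

// Pointer-based ID in the style of the userland trick discussed above.
const Slot = struct { unused: u8 };
const PtrId = *const Slot;

inline fn ptrId(comptime T: type) PtrId {
    return &struct {
        comptime {
            _ = T; // ties this struct (and its `id`) to T
        }
        var id: Slot = undefined;
    }.id;
}

// Hypothetical compact ID: 32 bits regardless of pointer width.
const SmallId = enum(u32) { _ };

// Hypothetical runtime table mapping pointer IDs to dense small IDs.
const Interner = struct {
    ids: std.AutoHashMap(PtrId, SmallId),

    fn init(allocator: std.mem.Allocator) Interner {
        return .{ .ids = std.AutoHashMap(PtrId, SmallId).init(allocator) };
    }

    fn deinit(self: *Interner) void {
        self.ids.deinit();
    }

    fn idOf(self: *Interner, comptime T: type) !SmallId {
        const gop = try self.ids.getOrPut(ptrId(T));
        if (!gop.found_existing) {
            // count() already includes the entry that was just inserted.
            gop.value_ptr.* = @enumFromInt(self.ids.count() - 1);
        }
        return gop.value_ptr.*;
    }
};
```

This costs a hash lookup per query and needs an allocator, which is exactly the overhead a builtin would avoid.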

And to further elaborate on why this is a tool worth having:

Wanting some IDs to associate with types is a commonly helpful thing in large applications where part of the work you do is adding many many new types during development (think 100+ programmers all adding and extending new types every few months). Having some sort of type identifier to associate different type-related functions/data with some reified reflection data is helpful: serializers, editor drawers, subtypes of objects in undo/redo stacks. All my examples are game engine & tooling focused since that's my domain, but surely other major desktop applications in which users author complex data have similar problems and solutions. There is a lot of overlap there.
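To make that concrete, here is a minimal sketch of the kind of registry I mean, keyed by a pointer-style type ID. `TypeRecord` and `Registry` are made-up names for illustration, and a real engine would store serializer or editor callbacks rather than just size and alignment.

```zig
const std = @import("std");

const Slot = struct { unused: u8 };
const TypeId = *const Slot;

inline fn typeId(comptime T: type) TypeId {
    return &struct {
        comptime {
            _ = T; // ties this struct (and its `id`) to T
        }
        var id: Slot = undefined;
    }.id;
}

// Hypothetical reflection record stored per registered type.
const TypeRecord = struct {
    size: usize,
    alignment: usize,
};

const Registry = struct {
    map: std.AutoHashMap(TypeId, TypeRecord),

    fn init(allocator: std.mem.Allocator) Registry {
        return .{ .map = std.AutoHashMap(TypeId, TypeRecord).init(allocator) };
    }

    fn deinit(self: *Registry) void {
        self.map.deinit();
    }

    // Registration needs the type itself, so it is generic.
    fn register(self: *Registry, comptime T: type) !void {
        try self.map.put(typeId(T), .{
            .size = @sizeOf(T),
            .alignment = @alignOf(T),
        });
    }

    // Lookup only needs the runtime ID, so no generics are involved.
    fn lookup(self: *const Registry, id: TypeId) ?TypeRecord {
        return self.map.get(id);
    }
};
```

Nobody has to maintain a central enum here: a type becomes known to the registry the moment some code calls `register` on it.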

Prior art in large games often revolves around building large preprocessing tools with macro tricks or IDLs or libclang-based tooling (e.g. Unreal Header Tool or clReflect). Having used a variant of that userland typeId in a Zig project combined with the language-native reflection functionality for the past few years, I've yet to see the need to introduce a preprocessing step for this purpose. This is a huge benefit of the language.

Yes, this is like having an enum where you write all the types you care about, except the point is to not have to maintain an enum with 500-1000+ fields that's constantly changing. I've had enough problems with hand-maintained type id enums with fewer than 100 fields.

ikskuh commented 1 month ago

Yes, this is like having an enum where you write all the types you care about, except the point is to not have to maintain an enum with 500-1000+ fields that's constantly changing. I've had enough problems with hand-maintained type id enums with fewer than 100 fields.

This won't work as soon as you want to avoid generics and use foreign packages that need to handle TypeIds, as you have no way to properly pass the enumeration into the package.

const TypeId = *const struct {
    _: u8,
};

pub inline fn typeId(comptime T: type) TypeId {
    return &struct {
        comptime {
            _ = T;
        }
        var id: @typeInfo(TypeId).pointer.child = undefined;
    }.id;
}

@mlugg the problem with this solution is that it takes up space in .bss. Considering targets like AVRs which have 512 or 1024 bytes of RAM, each typeId would take up 1‰ of available RAM.

If we choose to put an implementation into the stdlib (which is what I'd suggest), we should modify it so that the id variable is stored in its own linksection(".bss.typeid") or similar. That way we can reroute all type IDs into a NOLOAD section outside of relevant memories for targets like AVR, and we gain the nice feature that all TypeIds are contiguous without any gaps.
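A rough sketch of what I have in mind, assuming an ELF target (Mach-O expects a "segment,section" name instead) and a linker script that places the section; ".bss.typeid" is just a name I picked, and the NOLOAD placement itself would live in the linker script, not in Zig:

```zig
const std = @import("std");

// The address is the identity; the pointee is never read or written.
const TypeId = *const u8;

pub inline fn typeId(comptime T: type) TypeId {
    return &struct {
        comptime {
            _ = T; // ties this struct (and its `id`) to T
        }
        // One byte per type, collected into a dedicated section so the
        // linker script can move all of them out of real RAM.
        var id: u8 linksection(".bss.typeid") = undefined;
    }.id;
}
```

Since every slot is a single byte with alignment 1, the IDs in that section would be packed back to back, so subtracting the section start address would give a dense index.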