Closed SpexGuy closed 7 months ago
I don't see why we shouldn't allow that. I think it's much nicer than a new fnptr keyword.
What does it mean to align a pointer to a function? Or to make it const vs non-const? What about volatile or allowzero? A function pointer is not a pointer by any metric except binary representation, and on some platforms not even that.
What does it mean to align a pointer to a function?
uh, it means the body of the function i.e. the machine code, is located at an address that has said alignment.
What about volatile or allowzero?
same as for any other pointer
volatile meaning that the code at that address may change, so code that touches it must not be optimized into fewer or more "touches" than appear explicitly in the code. I can't imagine anyone would use this often, but perhaps it might be useful in a JIT compilation context?
allowzero would mean that the machine code is allowed to be located at address 0.
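For reference, here is a rough sketch of the "alignment of the machine code" reading in present-day Zig syntax, not the syntax proposed in this thread. The function name aligned16 is made up for illustration, and the address check assumes a target where a function pointer is simply the address of the code, which is not the case everywhere (Thumb and WebAssembly, for example).

```zig
const std = @import("std");

// Hypothetical function: `align(16)` requests that its machine code
// start on a 16-byte boundary.
fn aligned16() align(16) i32 {
    return 1234;
}

test "align on a function constrains the code address" {
    const p = &aligned16;
    // Calling through the pointer behaves like calling the function.
    try std.testing.expect(p() == 1234);
    // Assumption: the pointer's bits are the plain code address on this target.
    try std.testing.expect(@intFromPtr(p) % 16 == 0);
}
```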
That's not what it means. Function pointers in the 1717 proposal are the address of a function label, which is an abstract comptime-only entity. Writing to one of these pointers does not write the function code, it swaps out the underlying comptime-only function label. This is precisely why it's misleading.
Function pointers in the 1717 proposal are the address of a function label
That wasn't my understanding: my understanding is that a function label is a 'label' for the machine code, as an address isn't known at comptime (it's only known at link-time). At link-time the label finally gets a 'real' address, and any static function pointers are updated to point at this link-time location (for dynamic executables/libraries this linking happens at load time).
I'm talking about this specific part of the proposal which aims to solve this issue and is an accepted part of 1717.
For the language to make sense on the whole, this code needs to work:
const a = fn() void { print("a"); };
const b = fn() void { print("b"); };
test "what is a fn ptr" {
const fptr = comptime blk: {
var x = a;
var p = &x;
p.* = b;
break :blk p;
};
fptr();
}
But this means that the address of a function is not the address of some machine code. Instead it's the address of a comptime-only object. The fact that such a thing can exist at runtime is super weird and leads to misunderstandings about what it actually is, like yours.
This proposal is an attempt to fix that, by separating function pointers and function definitions into separate categories, so that a pointer to a comptime function label object can be differentiated from a pointer to machine code.
var x = a;
In the linked proposal this is a compile error:
var baz = foo; // compile error: cannot have a function label at runtime. Use `&foo` to get a function pointer
That's at top level scope, which makes it a runtime variable. Runtime variables cannot contain comptime-only values, so that's a compile error. In my example above, I am using a comptime var, which works just fine.
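To make the comptime var versus runtime var distinction concrete, here is a minimal sketch in present-day Zig, where fn () u8 is a comptime-only function body type and *const fn () u8 is a runtime function pointer. It uses today's fn declarations rather than the function-literal syntax from 1717, and the names a, b, chosen and q are only for illustration.

```zig
const std = @import("std");

fn a() u8 {
    return 'a';
}

fn b() u8 {
    return 'b';
}

test "comptime var holding a function" {
    const chosen = comptime blk: {
        var x = a; // comptime var of the comptime-only type `fn () u8`
        const p = &x; // pointer to that comptime var
        p.* = b; // swaps which function `x` names; no machine code is written
        break :blk x;
    };
    try std.testing.expect(chosen() == 'b');
}

test "runtime function pointer" {
    // A runtime variable cannot hold `fn () u8` itself, but it can hold a pointer.
    var q: *const fn () u8 = &a;
    try std.testing.expect(q() == 'a');
    q = &b; // reassigns the pointer, not the function
    try std.testing.expect(q() == 'b');
}
```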
@SpexGuy, Is there a reason not to make function pointers explicit, instead of introducing this whole labels thing? A function would be an entity consisting of metadata and physical code, pretty much as it is now. Assigning it to a variable would merely create an alias. But taking an explicit reference would produce a raw pointer that can be passed around, stored in data structures and invoked at your own risk. Some examples:
const foo = fn(x: i32) i32 { return 2*x; };
const a = foo; // function alias, preserves all metadata
var b = foo; // probably not allowed
const p = &foo; // constant function pointer
var q = &foo; // variable function pointer
export const x = foo; // exported function
export const y = &foo; // exported function pointer
Is there a reason not to make function pointers explicit, instead of introducing this whole labels thing?
The problem is that this model is different from the rest of the language, and doesn't make sense with comptime. Let me go through your example and annotate it to show the problem.
const foo = fn(x: i32) i32 { return 2*x; };
// has type `fn(i32) i32`
const a = foo; // function alias, preserves all metadata
// If you can do this, functions are not pinned. This doesn't create a new function. Therefore fn types are comptime references, not values.
// You can make a comptime var of this, take a pointer to that var, and mutate the value through the pointer.
// So a pointer to this is a pointer to a comptime reference to the function data.
var b = foo; // probably not allowed
// This needs to be disallowed for a specific reason. In this case, it's disallowed because functions are comptime-only
// types, and therefore you cannot have a runtime var of this type. But you can have a comptime var of this type.
// To disallow that is unreasonable and has no precedent in the language. It would be thoroughly unexpected and make no sense.
const p = &foo; // constant function pointer
// This has the type *const fn(i32) i32. The const comes from the fact that `foo` is stored in constant memory.
var q = &foo; // variable function pointer
// Same type, but this is allowed. This means you can have a runtime pointer to comptime-only data.
// So `q.*` is a compile error because the result is comptime-only but `q` is not comptime-known.
// Normally pointers to comptime-only types are themselves comptime-only, so this breaks all those rules as well.
// So now the conditions for is-comptime-only have these weird exceptions:
// comptime only primitive -> true
// fn -> true
// pointer -> payload is fn -> fn is not generic -> false
// pointer -> payload is comptime only -> true
// aggregate -> any field is comptime only -> true
// else -> false
// Whereas before it was very simple:
// comptime only primitive -> true
// pointer -> payload is comptime only -> true
// aggregate -> any field is comptime only -> true
// else -> false
// Looking at this type, you would think that the `const` here means that the function code is constant.
// But that is a lie, that's not what it means. The `const` here means that the underlying function reference is constant
// and cannot be changed through this pointer. Attributes on this pointer apply to `foo`, which is the label that
// was copied to `a`, not some list of function code.
export const x = foo; // exported function
// This is fine and makes sense
export const y = &foo; // exported function pointer
// This is also fine, no problem here. The binary representation of a function pointer
// is the pointer to machine code, even though the attributes apply to the underlying comptime object.
The other important problem this solves is parameter names being stored in decls. That absolutely has to go, it's not where that information belongs.
@SpexGuy I don't see how functions behaving a bit differently is a problem. The tradeoff is either a) increase the number of rules for comptime-only-ness from four to five, the new rule concerning a core part of the language that it is absolutely reasonable to expect a user to take the time to understand or b) add an entirely new feature, keyword and builtin to the language, because we don't respect functions enough to give them the space to work how they should with existing features. It's like natural language: the most often used verbs are the irregular ones, because it's reasonable to expect a speaker to learn all the ins and outs because they use them so often. As far as I'm concerned, most of this proposal is ugly and superfluous.
An alternative solution would be:
const, and pointers to such must be immutable; if you need mutability, you must take a pointer and reassign the whole pointer. This has all the necessary functionality at comptime, and does not require any modification to work at runtime; the only downside is a few more characters to type, and the upside is no additional language features.
Also, that extern solution is ugly and repetitive and repetitive. With the above solution, it could be much cleaner and consistent with other uses of extern:
/// n: GLsizei, buffers: *GLuint
pub extern const createBuffers: fn(GLsizei, *GLuint) void;
/// n: GLsizei, buffers: *GLuint
pub extern "kernel32" const GetLastError: fn(GLsizei, *GLuint) void extern;
/// program: GLuint
pub extern const glDeleteProgram: fn(GLuint) void;
Quite literally the only downside of this would be that the type does not name parameters, but this is a non-issue with the use of comments, as seen above.
One thing I do like is the naming scheme of fnptr. Obviously I hope that that keyword in particular does not make it into the language, but one thing it suggests is the name frameptr instead of anyframe, which I think we can all agree would be better.
const a = foo; // function alias If you can do this, functions are not pinned. This doesn't create a new function. Therefore fn types are comptime references, not values.
Yes, function literals are not "values" in the same sense as integers or structs, since they cannot be copied, inspected or modified. Function literals are only touched directly by the compiler. The programmer only gets a handle. You could call this handle a "function label", but there's no real need to officially call it anything. It's an implementation detail. The programmer only needs to know that assigning a function to a variable creates an alias and not a bitwise copy. This is different from how assignment works for other types, but it shouldn't really surprise anybody, since it is the only reasonable behavior.
You can make a comptime var of this, take a pointer to that var, and mutate the value through the pointer. So a pointer to this is a pointer to a comptime reference to the function data.
See below.
var b = foo; // probably not allowed This needs to be disallowed for a specific reason. In this case, it's disallowed because functions are comptime-only types, and therefore you cannot have a runtime var of this type. But you can have a comptime var of this type. To disallow that is unreasonable and has no precedent in the language. It would be thoroughly unexpected and make no sense.
I actually don't see any big problems with allowing function vars. Since we already agree that functions are only handles (whatever we call them), function vars would effectively be function references (i.e. function pointers that are fully typechecked and do not support pointer arithmetic). These could be usable both at runtime and comptime.
const p = &foo; // constant function pointer This has the type *const fn(i32) i32. The const comes from the fact that foo is stored in constant memory. var q = &foo; // variable function pointer Same type, but this is allowed. This means you can have a runtime pointer to comptime-only data.
The & operator is another place where functions behave differently from "real" values. When you write &fun you don't get a pointer to a function handle, effectively producing some kind of weird runtime-comptime double indirection. The operator is special-cased to produce a physical function pointer in the C sense.
// So q.* is a compile error because the result is comptime-only but q is not comptime-known.
I think it's reasonable to disallow dereferencing function pointers, for much the same reasons that you can't dereference a void pointer in C.
// Normally pointers to comptime-only types are themselves comptime-only, so this breaks all those rules as well. [...]
Yes, it's an exception. But it's rooted in a real underlying difference, so why not? I think that jumping through hoops to create consistency for consistency's sake is the wrong choice here.
The other important problem this solves is parameter names being stored in decls. That absolutely has to go, it's not where that information belongs.
Parameter names can be part of the function literal, since they don't belong into the type. But I'm out of my depth here implementation-wise.
The proposal feels very much like constant Lua tables, but with more precise information.
@SpexGuy Could you elaborate shortly, what the Function Definition Types should and could be used for? I think they would be part of Typed ZIR, but I am not exactly sure for example why file + line info, function name are necessary. It would be great to have ways to match TZIR to AST for static analysis (i.e. for ZLS), but it might bloat TZIR and lead to slower compile times. Or is some of the stuff only suggested to be included in debug builds?
Yes, function literals are not "values" in the same sense as integers or structs, since they cannot be copied, inspected or modified.
In Lua everything is a table that you can modify, and metaprogramming could use the same functionality (copy + change stuff would be safer). Whether that is a smart idea, or performant or efficient to implement in the compiler, would be the other question.
People seem to be objecting a lot to the new keyword, but that's not actually necessary to this proposal. We could use fn instead and everything is still unambiguous. It's just really weird because it's involved in two separate concepts. The same is true in the current language, we just ignore it. Personally I feel like function types should have a different keyword than function literals, but even without that this proposal can stand.
Similarly, this proposal introduces no new functionality over 1717, just a different syntax to bring it about. 1717 still has conversion from functions to function pointers and back, runtime function pointers, and comptime functions. So the behavior here is not more complex, this flavor just allows things to behave more like other parts of the language, which I think is highly valuable.
add an entirely new feature, keyword and builtin to the language, because we don't respect functions enough to give them the space to work how they should with existing features.
You have it exactly backwards. The solution in 1717 either doesn't work with comptime var or behaves unexpectedly with pointer modifiers. This proposal is the version that gives functions the space to work how they should with existing features.
consistency for consistency's sake is the wrong choice here
Consistency is extremely valuable. It allows a language to be intuited. People can guess how a feature should work, and then it does work that way. Zig is a highly consistent language, much more than other languages. This is a large part of what makes it feel simple. Arrays are value types because it's more consistent, even though C's approach is more pragmatic. Comptime loops and runtime loops use the same syntax because it's consistent. Errors and error unions behave like normal values because it's consistent. I think there's a very high cost to breaking consistency, and I would like to avoid that in a core part of the language like this. Not being able to make a comptime var of a specific type is unprecedented in the language, and raises all kinds of other questions. Can you embed one of these values in a mutable comptime-only struct instance? What about a comptime field? If not, why not? If so, can you take a pointer to it? What happens if you mutate through that pointer? Are all function pointers const? If not, how does one create a mutable function pointer? What does it mean to dereference it? Why can I create a pointer to a function but not a pointer to a comptime_int, even though both are comptime only? There's a huge amount of complexity involved in breaking consistency here, in the form of all of these questions. The answers are irrelevant, the problem is that these questions exist at all. Every newcomer to the language will ask them. They will be a stumbling point forever. You cannot look to any other feature in the language to help answer these questions, because this is unlike any other part of the language. I would not so easily introduce that sort of complexity.
I am not exactly sure for example why file + line info, function name are necessary.
They aren't necessary, I was just giving examples of information that makes sense in a function declaration type but not in a function pointer type. I don't know that TZIR representation is really relevant here.
Could you elaborate shortly, what the Function Definition Types should and could be used for?
They exist to make function pointers behave like pointers to instruction data, and functions behave like comptime objects. This is what people intuitively expect from these types. Function pointers don't need to be a special case in terms of comptime behavior. We can have their behavior be intuitive from other parts of the language. Making this distinction allows that.
intuitive
There's that word again. Personally, having only generic function types and anti-pinning function values is the most intuitive solution to me; the proposal as written is just bizarre (are generic non-pointer function types a thing? They're not mentioned, but seem to be implied...?). Optimising for intuition always assumes something of the developer which is not universally applicable.
Attempting to have No Exceptions™ to the pointer rules is already a lost cause -- we already break the rules for opaque types (no dereferencing) and variable-width types (pointers cannot be read at runtime). Contorting the developer interface to functions to make them fit the rules for specifically non-opaque, fixed-width data, when functions in reality are neither of those things, I think is ten times more bizarre and crazy than making pointers to them work a bit differently.
@SpexGuy, I don't question the value of consistency in general. But there are cases where things just work differently and trying to force them into the same mold does not help anybody. I think this is just one of those cases. The core issue is that functions really are different from all other values even in languages that advertise first class function support. Unlike integers, strings or structs, which are all mapped in very predictable ways from the literals in the source code to binary values in memory, and can be loaded and stored and modified in reasonable ways, functions are abstract entities that go through a complex compilation and optimization process, often resulting in a totally inscrutable sequence of machine instructions, and finally end up somewhere in read-only memory. After that, they can only be referenced or invoked. It makes no sense -- in a compiled language -- to copy a function, look at what the 4th instruction does, modify it to return an int instead of a float and then copy it into a struct to be called later. All compiled languages with "first class" functions in fact only have function pointers with syntax sugar on top. That's 1.8th class at best if you ask me. Only fully homoiconic languages like picolisp have true first-class function support.
End rant. What I'm saying is const x = 10 and const y = fn(... are the same thing only in a very superficial sense, however you slice it. You have to treat functions somewhat differently, and the programmer needs to be aware of that anyway, whether it is in the form of special syntax or special rules for ordinary syntax. Now to your specific points:
Not being able to make a comptime var of a specific type is unprecedented in the language, and raises all kinds of other questions.
As mentioned before, I don't see a problem with function (handles) being vars, comptime or otherwise.
Can you embed one of these values in a mutable comptime-only struct instance? What about a comptime field?
Sure.
If so, can you take a pointer to it?
Not a comptime pointer. The & operation is special-cased to return an ordinary runtime fptr. But why would you want a comptime function pointer specifically, when ordinary functions already behave like references for all practical purposes?
What happens if you mutate through that pointer?
You can't. Footgun eliminated, no?
Are all function pointers const? If not, how does one create a mutable function pointer? What does it mean to dereference it?
Function pointers themselves can be reassigned. The function they point to cannot be modified. Direct dereferencing is not allowed, for the reasons pointed out by @EleanorNB above, in addition to the fact that there's nothing you could realistically do with the "value" of the function.
Why can I create a pointer to a function but not a pointer to a comptime_int, even though both are comptime only?
Because it makes sense and is in fact useful. Whether a useful analogon can be found for comptime_int is another question.
There's a huge amount of complexity involved in breaking consistency here, in the form of all of these questions. The answers are irrelevant, the problem is that these questions exist at all. Every newcomer to the language will ask them. They will be a stumbling point forever.
The answers are very much relevant, IMO. Many of the questions sound rather hypothetical. The disallowed things are disallowed because it would be an error or undefined behavior to attempt them. And I don't think it will be a problem for learners. When they try to do these things, they will get an error message informing them that they are trying to do something impossible.
I agree with most of what @zzyxyzz is saying, save three points:
After sleeping on it, I realized that I've been talking past @SpexGuy yesterday and missing some important points. There are some constraints to this problem that are not easy to satisfy simultaneously. Here I've tried to document them, since they weren't clearly explained before:
So long as functions only live in stack variables, the language has some leeway to make them "just work" without burdening the programmer with the details. We might even excuse some special behavior here and there. There are some cases, however, where implementation details are forced to a point. For example, you might have a struct that supports dynamic behavior by assigning a function to a field:
object.action = fn() void {}; // very exciting
Since structs are plain old data, we can't dance around the issue here, and have to settle on a binary representation for the function handle. Logic suggests that it must be a function pointer. But then it would be hard to justify it if a function assigned to a local variable wasn't a function pointer as well:
var f = fn() void {}; // variable f holds a function pointer internally
So far so good. But what happens when we do the same at comptime? The function has not been compiled yet, so its handle cannot possibly be a runtime function pointer. Just about the only thing we can do to get out of this is to introduce a separate comptime function pointer/handle that represents functions only at comptime. But then consider this incredibly common pattern:
const std = @import("std");
const print = std.debug.print;
Since print is comptime and therefore holds the special comptime-only function handle rather than a runtime function pointer, print("WTF?\n", .{}) shouldn't actually work. Since that is completely unacceptable, we need another rule saying that comptime function handles automatically coerce to runtime functions. This way print remains a comptime entity at the type level, but the compiler automatically converts it as needed if you try to call it at runtime or assign it to a location that expects a runtime pointer.
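As a point of comparison, here is a small sketch of how these constraints look in present-day Zig, which is not necessarily how this proposal would spell them. The struct field gets a concrete binary representation by being a function pointer, while a const alias stays a comptime-only function that can still be called at runtime. Object, one, two and alias are hypothetical names.

```zig
const std = @import("std");

const Object = struct {
    // Plain old data: the field stores a pointer to machine code.
    action: *const fn () u32,
};

fn one() u32 {
    return 1;
}

fn two() u32 {
    return 2;
}

// Comptime alias, the same pattern as `const print = std.debug.print;`.
const alias = one;

test "function pointer in a struct field" {
    var object = Object{ .action = &one };
    try std.testing.expect(object.action() == 1);
    object.action = &two; // dynamic behavior: swap the pointer at runtime
    try std.testing.expect(object.action() == 2);
}

test "comptime alias is callable at runtime" {
    try std.testing.expect(alias() == 1);
}
```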
But that's not all. There are places where you need to officially acknowledge the distinction between a function and the function pointer that may represent it. One such case is exported functions and function pointers at the ABI level. Another is low-level work with function pointers, where you might need to do pointer arithmetic. This raw function pointer can be neither fn() nor *fn() (that would be double indirection if fn() is already a pointer). So you probably need a separate type for it (fnptr in this proposal). Functions (comptime and runtime) may coerce to it, but you still need a way to talk about it at the type level.
I think these are the main constraints, but not all of them. There's also the propagation of constness and comptimeness, where to put things like argument names for introspection and metaprogramming, and generic functions. I'm totally out of my depth here, and I should have taken more time to understand the issues involved before going on my commenting crusade yesterday.
I don't know what the right solution is here. This proposal doesn't feel quite right yet. But the issue is far from simple, although it would have been if it weren't for Zig's comptime.
@SpexGuy,
What if we officially gave function literals the type *const fn(i32) void rather than fn(i32) void? This would be similar to string literals, which have the type *const [n:0]u8 and are essentially r-value references, which are gracefully handled by the compiler because of the static lifetime. fn itself would be opaque, so dereferencing a function pointer would be forbidden. You would be allowed to take pointers to the function pointer, however, and mutate and dereference that all you like.
The rules for transferring function pointers from comptime to runtime are not clear to me, but the exact same thing somehow works for literal strings, so maybe we could just apply the same rules to functions as well?
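The string-literal analogy can be sketched in present-day Zig roughly as follows. This only illustrates the analogy, since here the pointer has to be taken explicitly with & rather than being the literal's own type. The functions double and triple are made-up examples.

```zig
const std = @import("std");

fn double(x: i32) i32 {
    return 2 * x;
}

fn triple(x: i32) i32 {
    return 3 * x;
}

test "string literal is a pointer to static data" {
    try std.testing.expect(@TypeOf("hi") == *const [2:0]u8);
    const s: *const [2:0]u8 = "hi";
    try std.testing.expect(s[0] == 'h');
}

test "pointer to a function pointer can be mutated through" {
    var slot: *const fn (i32) i32 = &double;
    const pp = &slot; // pointer to the function pointer, not to the function
    try std.testing.expect(pp.*(21) == 42);
    pp.* = &triple; // mutates the slot; the functions themselves are untouched
    try std.testing.expect(slot(2) == 6);
}
```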
The ABI issue would also have to be handled somehow, but I'd first like to know whether the idea has merit at all.
What @SpexGuy points out, I think, is that there is a difference between
A) the function definition, say as an abstract syntax tree,
analogue: struct (i.e. type) definition
const Foo = struct { humpty : i32; dumpty : u8[]; }
B) the function as the operation it defines on the virtual machine. Two such operations are equivalent if they have the same semantics.
analogue: tuple (i.e. type) definition
const Bar1 = struct{i32, u8[]};
const Bar2 = external struct{i32, u8[]} // with well defined layout
C) a function symbol, aka entry point for a stream of instructions with a well defined ABI (calling convention)
analogue: external struct instance definition.
external foo: const external struct{ humpty: i32; dumpty: u8[]; };
He then points out that in a language like Zig which does computation at compile time you cannot quite sweep these differences under the carpet.
It seems to me that there is a little leeway in the relation between fn(n: i32) i32 and *const fn(i32) i32, and in particular in what the address operator & should return, but here is my 2c:
-- fn(n: i32) i32 is the type of a function definition with argument named n (a compile time concept). E.g.
const inc = fn (n : i32) i32 { return n +% 1; };
which is equivalent to
const inc : fn (n : i32) i32 = fn (n : i32) i32 { return n +% 1; };
-- *const fn(i32) i32 is the type of a specific function as an operation on the abstract machine (also a compile time concept). E.g. const inc1 = &inc; const inc2 = &fn(m: i32) i32 { return m +% 2 -% 1; };
which is equivalent to
const inc1: *const fn(i32) i32 = &inc; const inc2: *const fn(i32) i32 = &fn(m: i32) i32 { return m +% 2 -% 1; };
The compiler may or may not set inc1 == inc2, because the argument name does not change the semantics and because the rules of (modular) arithmetic say m +% 2 -% 1 == m +% 1 (and in this case, optimisation is likely to find that out).
-- fnptr(i32) i32 (as defined by @SpexGuy) (or fnentry(i32) i32, or perhaps something like @FnEntry(fn (i32)(i32), .CallingconvC) or std.FnEntry(fn (i32)(i32), .CallingconvC)) is the type of a concrete function entry point in an executable. It is a value type that on most (but not all) architectures will be the size of a pointer.
@cInclude{ // int32_t increment(int32_t n){ return ++n}; }
const zig_inc = external inc; const c_inc = external increment;
which is equivalent to
const zig_inc: fnptr (i32) i32 = inc; const c_inc: fnptr(i32) i32 callingconv(.CallingconvC) = increment;
Only fnptr(i32) i32 (or fnentry(i32) i32, or ... ) can be stored in structs and passed to functions at runtime, and for fnptr's with different calling conventions the compiler must generate a little shim function. Two fnptr's are equal if and only if they refer to the same function entry point in an executable.
This should be unnecessary with the function pointer changes in the self-hosted compiler.
Since #1717 is dead now, should this still be open?
Closing this as handled by the function pointer change in stage2.
This proposal solves the problem introduced by #1717, that function pointers are ambiguous with function labels on the ABI boundary. The suggested solution in that issue is to make a distinction between function labels, a comptime-only type which represents an actual function; and function pointers, which may exist at runtime. That issue suggests representing this difference using an actual pointer type, but that has some strange properties. Normally, pointers to comptime-only types are themselves comptime-only types. But not in this case. Additionally, information related to function definitions (parameter names, for example) is currently stored in the type info for Decls, which is kind of a weird place. This proposal suggests an alternate form of this distinction to solve both of these problems.
1. Function Definition Types
Like struct literals, every function literal has a distinct type. The type info for this type includes information about
Function definition types are comptime only, and pointers to these values are also comptime only. Their names are assigned in exactly the same way that struct names are assigned. There is no type literal for a function definition type (in the same way that there are no type literals for frame types).
2. Function Pointer Types
Function pointer types carry only a subset of the information stored in a function definition. They leave out parameter names, function names, and file and line info. They are declared using the new keyword fnptr. Function definitions may coerce to compatible function pointer types. Unlike definitions, function pointers may be mutable at runtime if the parameter and return types are not generic. We could decide to disallow generic function pointer types, but this would force the use of anytype in many places, which could damage type safety.
The result of peer type resolution on multiple function definition types is the compatible function pointer type, if one exists, or a compile error otherwise. Unlike definitions, function pointer types are actively deduplicated by the compiler. fnptr() void in two places in a file will generate two references to the same type. A comptime-known function pointer value can be converted back to a definition with the new @fnDef(comptime ptr: fnptr) fndef builtin.
3. Extern Functions
A function literal may be created to reference an extern function by replacing the body with the extern keyword. A symbol name may also be specified.
The parameter passed to extern will be of type std.builtin.ExternInfo, which is defined as
4. Function Pointer Comparison
Comparing function pointers is done by the following rules: If the two pointers are derived from the same function literal, comparison must return true. If the program may observe a difference between calls to the two pointers (they have different side effects or return values), comparison must return false. Otherwise comparison may return true or false, but the compiler must be consistent about any given pair (it may not decide that they are equal in one part of the code but different in another). Function pointers may compare as equal if and only if their underlying binary representation is equal.
These rules allow the compiler to deduplicate functions which generate identical code, without generating an extra stub for one of them to ensure they compare as distinct. They allow this deduplication to happen after optimization (like dead global removal), as long as the functions are not compared at compile time.
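The two mandatory cases of these rules can be sketched in present-day Zig syntax (using *const fn in place of the proposed fnptr). The functions inc and dec are hypothetical. The "may return true or false" case, two functions with identical behavior, is deliberately not asserted here because the rules leave it to the compiler.

```zig
const std = @import("std");

fn inc(n: i32) i32 {
    return n +% 1;
}

fn dec(n: i32) i32 {
    return n -% 1;
}

test "function pointer comparison" {
    const p1: *const fn (i32) i32 = &inc;
    const p2: *const fn (i32) i32 = &inc;
    const p3: *const fn (i32) i32 = &dec;

    // Derived from the same function: must compare equal.
    try std.testing.expect(p1 == p2);

    // Calls are observably different, so the pointers must compare unequal.
    try std.testing.expect(p1(1) != p3(1));
    try std.testing.expect(p1 != p3);
}
```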
5. A note on coercion
Coercing a function definition to a function pointer always requires writing out (or constructing) the compatible function pointer type. This could be solved with comptime code, but may be common enough that we want a more dedicated solution. Here are the four ways I can see:
std.meta.fnPtr(comptime fnDef: anytype) FnPtr(fnDef) { return fnDef; }
@fnDef
above).var ptr: fnptr = some_fndef;
to infer the needed function pointer type.