Proposal: Pragmas

ziglang / zig

General-purpose programming language and toolchain for maintaining robust, optimal, and reusable software.

https://ziglang.org

MIT License

Proposal: Pragmas #5239

Closed: ghost closed this issue 3 years ago

ghost commented 4 years ago

In the Nim sense. This'll be especially helpful when Zag comes in. Loosely inspired by #4285, but with the problems removed.

(N.B.: I am bad at commitment. Assume this is a living document.)

A pragma is a special kind of tuple, written with at-braces: @{} (combination of "hey, compiler, look over here" and "there's a bunch of stuff in here"). Each field has special significance to the compiler -- fields cannot be user-defined. Pragmas express properties of language constructs -- wherever there's a use case for a keyword that only makes sense within the context of a specific construct, a pragma can be used instead:

// Associated with the function
const deprecatedFunc = @{deprecated("Use other_func instead")} fn () void { ... };
// Pragma placement is flexible
const asyncFunc = fn @{async} () void { ... };
// Fields that take an argument are comptime-configurable
const alignedFunc = fn (a: i32, b: u32) @{alignStack(stack_alignment)} isize { ... };

// Associated with the scope
const maybeColdFunc = fn () void {
    // The none pragmus does nothing; it can be used to toggle non-configurable features conditionally,
    // but with intentionally obtuse syntax to make users think twice about it
    @{if isCold() {cold} else {none}};
    ...
};

// Associated with the object
const packed = struct @{packed} { ... };
const tagged = @{derive_enum} union { ... };

// Associated with the label
@{export} const exportedFunc = fn () void { ... };
const @{pub} main = fn () void {
    // Associated with the scope
    @{runtimeSafety(true), optimization(.fast)};

    // Associated with the function call
    @{noinline} alignedFunc(17, 31);

    // Associated with the variable
    @{comptime} var comptime_var = 0;

    // Associated with the block
    var x = @{comptime} blk: {
        var a = computePart1(comptime_var);
        break :blk computePart2(a);
    };

    // Associated with the asm block (see #5241)
    asm (.{.x86_64}) @{impure, stack(32)} |"esp", "rax"| void { ... };
};
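For comparison, here is a rough sketch of how a few of the properties in the example above are spelled in Zig without this proposal, using keywords and builtins. It assumes a recent Zig release (builtin spellings have shifted between versions), and Flags, exportedFunc, and addParts are illustrative names, not part of the proposal. The sketch puts noinline on the function declaration, whereas the proposal's example attaches it to a single call.

```zig
const std = @import("std");

// `packed` is a keyword attached to the container declaration.
const Flags = packed struct {
    a: bool,
    b: bool,
    rest: u6,
};

// `export` is a keyword attached to the declaration.
export fn exportedFunc() void {}

// `pub` and `noinline` are keywords attached to the function declaration.
pub noinline fn addParts(a: i32, b: i32) i64 {
    // Widen before adding so the sum cannot overflow i32.
    return @as(i64, a) + b;
}

test "status-quo spellings" {
    // A scope-level setting expressed as a builtin call inside the scope.
    @setRuntimeSafety(true);

    // `comptime` on a variable declaration.
    comptime var comptime_var = 4;
    comptime_var += 1;

    // `comptime` on a labeled block expression.
    const x: i32 = comptime blk: {
        const a = comptime_var * 10;
        break :blk a + 2;
    };

    try std.testing.expect(x == 52);
    try std.testing.expect(addParts(17, 31) == 48);
    try std.testing.expect(@bitSizeOf(Flags) == 8);
    exportedFunc();
}
```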

Thanks to their unique syntax, pragmas can be placed flexibly within declarations, are always optional, and can even be served in pieces (supplying the same pragmus twice to one construct is a compile error). Pragmas can only ever be used as literals -- only those fields that are explicitly exposed are configurable. This way, no more magic is possible than in the status quo.
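The rules in the paragraph above (a closed set of compiler-defined fields, and a compile error for a repeated pragmus) can be modelled in ordinary Zig as a sketch. Everything here is hypothetical: Pragmus, its variants, hasDuplicate, and checkPragmas are invented for illustration, and a real compiler would perform this check during semantic analysis rather than in user code.

```zig
const std = @import("std");

// Hypothetical closed set of pragma fields. Code outside this declaration
// can build values but cannot add variants ("fields cannot be user-defined").
const Pragmus = union(enum) {
    none, // explicit no-op, so a configurable slot can switch a feature off
    cold,
    packed_layout,
    align_stack: u32, // fields that take an argument carry a payload
    runtime_safety: bool,
};

// Returns true if the same field appears twice, ignoring `none`.
fn hasDuplicate(pragmas: []const Pragmus) bool {
    var i: usize = 0;
    while (i < pragmas.len) : (i += 1) {
        const tag = std.meta.activeTag(pragmas[i]);
        if (tag == .none) continue;
        var j: usize = i + 1;
        while (j < pragmas.len) : (j += 1) {
            if (std.meta.activeTag(pragmas[j]) == tag) return true;
        }
    }
    return false;
}

// Comptime gate: turns a repeated field into a compile error.
fn checkPragmas(comptime pragmas: []const Pragmus) void {
    if (comptime hasDuplicate(pragmas)) @compileError("the same pragmus was supplied twice");
}

test "distinct fields pass, repeated fields are caught" {
    comptime checkPragmas(&[_]Pragmus{ .cold, .{ .align_stack = 16 } });
    try std.testing.expect(!hasDuplicate(&[_]Pragmus{ .cold, .{ .align_stack = 16 } }));
    try std.testing.expect(hasDuplicate(&[_]Pragmus{
        .{ .runtime_safety = true },
        .{ .runtime_safety = false },
    }));
}
```

Duplicates are detected by comparing active tags, so two runtime_safety entries clash even when their payloads differ, which matches "supplying the same pragmus twice to one construct".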

With this feature, we'll be able to extend the language with little to no friction, but arbitrary users will not -- the best of both worlds.

Rocknest commented 4 years ago

What's wrong with keywords?

ghost commented 4 years ago

Nothing -- but, every time we add a new one, it's a breaking change. This slows down iteration on the language.

Rocknest commented 4 years ago

By the way, io_mode is just a declaration that the standard library looks for; it's not a keyword or some compiler magic.

ghost commented 4 years ago

Ok, I'll get rid of that then.

zzyxyzz commented 4 years ago

I like the symmetry of this proposal. The language already has builtin functions, but it's missing builtin properties, which instead pollute the global namespace in a somewhat ad-hoc fashion.

I would argue for a slightly different syntax, though:

@{runtimeSafety=true, optimization=.fast}
@{comptime} // same as @{comptime=true}

zzyxyzz commented 4 years ago

Another example where this could be useful is #5177. I think a pragma/property @{cold} would be clearer than a builtin "function" in this case.

kenaryn commented 4 years ago

You're showing symptoms related to Dlang's increasing complexity with adding much of a visual pollution for still hypothetical benefits. You need to prove that Zig will gain some flexibility in the process; otherwise, you counteract the third and second-to-last points of the Zig zen.

zzyxyzz commented 4 years ago

You need to prove that Zig will gain some flexibility in the process; otherwise, you counteract the third and second-to-last points of the Zig zen.

The way I see it, this proposal does in fact add quite a lot of power and flexibility.

Since Zig aims to be a low-level and all-purpose language, it needs to handle lots and lots of corner-cases that cannot be implemented in user-land without compiler support. Examples: specifying calling conventions, layouts, alignments; making use of machine-specific features, including parallelism; type introspection, etc.

Some of these features might deserve their own keywords or special syntax, but most are too obscure for that. Thus, it is good to have some systematic way to introduce such compiler features without disturbing the main language. Otherwise, every proposal for a fairly rare, but occasionally necessary feature is liable to trigger a lengthy discussion about how it can be integrated into the language without messing up the existing syntax and imposing cognitive overload on programmers. In many cases the conclusion will inevitably be that it cannot be easily done and it's just not worth it. This is, the way I see it, the main reason why Zig has builtins (of which there are already more than 100). It makes the language less reluctant to correctly handle corner cases. This proposal merely extends the concept of a builtin to include not only functions but also properties.

You're showing symptoms related to Dlang's increasing complexity with adding much of a visual pollution for still hypothetical benefits.

Visual noise is a valid concern. E.g., personally I dislike Zig's decision to dispense with top-level declarations in favor of const foo = fn ( ... ) and the like. Common sense should be exercised in deciding what is common enough to warrant a keyword, and what can tolerate the (moderately) increased verbosity of pragmas. It would be nonsense to write var @{immutable} x = 3 instead of const x = 3. But maybe @{callconv=.Stdcall} is a reasonable alternative to having a dedicated callconv keyword.

ghost commented 4 years ago

@zzyxyzz A struct-based syntax has been discussed before, but I find the increased verbosity too unwieldy to justify (@{.comptime=true} vs @{comptime}, @{.alignStack=8} vs @{alignStack(8)}). A keyword-inspired syntax is more concise, loses no expressive power, is more intuitively obvious ("What's that bloody dot doing there"), and is easier to migrate to from the status quo with zig fmt.

@kenaryn Read the initial proposal again -- there are some features there that do not currently exist in Zig. The only way to add these currently would be either keywords, which would be a breaking change, or builtins, which several people feel are already bloated enough. This provides a lightweight, flexible option, that can be rolled out easily.

SpexGuy commented 4 years ago

This is, the way I see it, the main reason why Zig has builtins (of which there are already more than 100).

This is a really good point. There are a lot of small but important use cases that must be handled in the compiler but don't deserve their own keyword. All major C/C++ compilers, Rust, Nim, and even Go have solutions to this problem. Zig's builtins get us most of the way there, but they can only handle attributes that occur in a place where an expression makes sense.

C++ gets around this problem in two ways: 1) Some C++ keywords (such as override and final) are 'contextual', meaning they are only reserved in certain locations. But that also means the language isn't context-free in the same way, and syntax highlighters can get messed up by it. This proposal sidesteps those problems but has the same effect on implementation, moving keyword identification out of the tokenizer and into the parser. 2) C++ compilers have non-standardized specialized attributes, very similar to this proposal.

Going through the list of supported annotations in clang, most of them are already supported in Zig:

functions
- noreturn (noreturn)
- alignment (align, @alignCast in Zig)
- minimum valid size (*[N]T in Zig)
- versioning information (doc comments in zig)
- specify callback metadata (bound fns?)
- aliasing a function (const a = func;)
- emit conditional compile errors (@compileError)
- enable_if (comptime checks, conditional compilation)
- controlling inline and tail calls (inline, @call)
- controlling dll exports (@export)
- format strings (comptime)
- controlling start code (Zig has none)
- wasm imports (@"")
- writing interrupts (callconv(.Naked))
- calling convention (callconv)
- locally disable sanitizer (@setRuntimeSafety)
- nodiscard (always on)
- controlling function segments
- parameter aliasing (noalias/mayalias)
- controlling overloads (never allowed)
- sanitizing single-field-live enums (checked UB)
- deprecation (@compileError + conditional compilation)
- specifying parameter invariants for the optimizer (undefined/assert)
- comptime type hints for varargs (comptime validation code)
- function selection by target architecture (builtin + conditional compilation)
- apple/objc internal use (N/A)
- even fancier templates (N/A)
- specifying C++ exception behavior (N/A)
- recovering an object from std::move (N/A)
variables
- dll import/export (@export)
- specify undefined contents on startup (undefined)
- specifying array size (*[N]T)
- comptime initialization (comptime + const)
- thread local storage (threadlocal)
- silence unused warnings (N/A)
- destructor behavior (N/A)
types
- pointer alignment (align)
- read-only pointer (const)
- enum extensibility (_, enum)
- nullability (?*T, *T)
- inheritance metadata (N/A)
statements
- inline assembly (asm)
- switch/label fallthrough (N/A)

The ones that are currently unsupported are:

functions
- interop with a library compiled to use indirect function call sanitizer
- function convergence and deduplication
- runtime cpu dispatch
- function argument stack alignment (could be handled with align but isn't)
- annotating near calls which can use a shorter instruction sequence on MIPS
- auto-generating interrupt calling conventions on some platforms
- override specific flags at function scope:
  - -fmicromips
  - -fno-builtin
  - -mspeculative-load-hardening
  - -fstack-protector
- hinting vector size to the autovectorizer
- tell the caller not to save registers
- controlling how debug information is generated for inline functions
- specifying to the optimizer that reordering independent instructions around a function call is ok
- preventing the optimizer from duplicating calls to a function (necessary for e.g. GPU barrier instructions)
- generating nops before/after a function for later patching in a debugger
- special compilers (openCL, openMP, xray, amdgpu)
- resource management metadata (pointer ownership, consumption)
variables
- suppress debug info at variable scope
- tell the optimizer that a pointer will not be stored anywhere persistent
- allow better struct packing (Zig may eventually be able to do this automatically with whole-program analysis)
- resolving multiple definitions link errors
types
- write-only pointer
- specially sized pointer types (address spaces may solve this)
- non-dereferenceable pointer types (@OpaqueType arguably does something similar)
- flag enums (Zig will support this before 1.0, though it will probably be via packed struct instead of enum)
- change struct layout rules for a specific compiler and version (mostly for MSVC, probably due to bools)
- controlling LTO visibility
statements
- loop optimization hints

Of the ones we don't have, some of these are things that we should probably support for certain platforms. They definitely don't deserve keywords though. So I think this qualifies as a motivating example for why something like this is necessary.

But this proposal raises a really big question: Where would we draw the line between keywords and attributes? Should pub be an attribute? Annotating everything with @{pub} seems a bit verbose. What about var vs const? Would @{const} var x = ... make sense? How about @{usingnamespace(@import("foo"))}? If the answer is 'use @{ for the less common cases', then that opens a huge debate about what is more or less common. Many structs I write are extern because I do a lot of work with C libraries. Someone else might use mostly packed structs because they care much more about data size than field access performance. Another user might use exclusively tagged unions with generated tags. There isn't a clear line. I think I'm leaning towards only using these attributes for very niche cases, and leaving core language features like packed, comptime, etc as-is.

It's also really important that these properties can be controlled by comptime-known conditions. A lot of these settings may change based on build options or for different platforms. This is the sort of thing that you would usually control with #ifdefs and macros in C/C++, but in Zig all of that is handled with conditional compilation of expressions. Since these attributes aren't themselves expressions, they would need to have an expression attached that generates a value, and one of the values it can generate must signal that the attribute should be ignored and should use the default value. So e.g. @{cold} would have to either be invalid or shorthand for @{cold=true}, so that you could specify @{cold=figureOutIfCold()}. Or we could look for a way to make them expressions, and use normal builtins. With @cold it's easy, the current definition (behaves like unreachable) works fine. But for something like function deduplication, it's much harder. There might be some way to do this though. @export is one example of a builtin that doesn't quite make sense as an expression but is anyway.

and can even be served in pieces

This feels unnecessary to me. Zig doesn't have macros, so you won't end up accidentally putting two of these together. Anything that goes in an expression context should remain a builtin expression. Those don't pollute the namespace, so there's no reason to change them. Existing builtins that modify their enclosing scope should be changed to be attributes placed either before the open brace of the scope or as the first token after it. (Only one of these should be allowed; I'm just undecided about which.)

Pragmas

(Sorry, nitpick.) Pragma doesn't really mean anything, aside from its use in C and C++ (#pragma). Apparently it's short for pragmatic information, which I guess kind of makes sense, but arguably that's what 100% of the code should be. I'd go for something like 'Attributes' instead.

phase commented 4 years ago

@zzyxyzz

Some of these features might deserve their own keywords or special syntax, but most are too obscure for that. Thus, it is good to have some systematic way to introduce such compiler features without disturbing the main language.

@EleanorNB

This provides a lightweight, flexible option, that can be rolled out easily.

I think I'm failing to see how this doesn't disturb the language, nor how this syntax is any more powerful / concise than what is currently in place. I can see how this "unifies" some keywords and builtins in a sense, but is that entirely needed?

My main issue with this proposal is that it's adding symbols in places that it doesn't really need to, like @{evalBranchQuota(2048)}; This is harder to write and harder to read compared to the current model.

With this feature, we'll be able to extend the language with little to no friction

I don't really see it? You're doing the exact same thing as before, except this time it's within curly braces.

If your stance is that the language is being bloated by keywords and builtins, I don't think this is solving that issue. It's merely taking what we have and changing the syntax to something that is less clear. And along with this system, we'd also need to keep keywords (for obvious reasons) and I'd assume we'd need builtins still for the few compiler functions that don't fit within this model.

ghost commented 4 years ago

The difference is that language-level changes can now be made without potentially breaking existing code or increasing syntax complexity, because pragmas live in their own namespace. The point isn't to "unify" things, it's to isolate them: to decouple compiler magic from userspace.

I'm not saying that all keywords and builtins should become pragmas. I like keywords and builtins. Keywords as declarations are good. Builtins as functions are good. Just some keywords don't deserve to be globally defined (async var? packed fn?), and some builtins don't make sense as functions (set eval branch quota?! Set align stack?! Why?! There's no control flow here, even at comptime!). Current solutions are either shoehorns or overly destructive. This provides a better model.

Also, @SpexGuy, comptime configuration can still be done with e.g. @{callconv(deriveCallconv())} or @{runtimeSafety(safety_on)}. I'm not sure I see a use case for non-configurable attributes (cold, comptime, packed etc.) to be adjustable like this, since these tend to be tied pretty tightly to the body or the surrounding context, and changing multiple places is necessary anyway.

SpexGuy commented 4 years ago

Just some keywords don't deserve to be globally defined (async var? packed fn?)

I see your point here. async, for example, is a highly contextual keyword. It only makes sense in the specific case of coming right before a function, and pretty much nowhere else. callconv is similar. But async is not some aside that you attach to a function on a whim, it's a massive change to the semantics of the function and a core feature of the language. Having a variable (especially a type) named async would be massively confusing and should absolutely be disallowed.

I think the actual heuristic I'm following is this: Keywords and symbols are allowed to affect language semantics. Attributes can affect codegen but cannot modify semantics.

Looking over the entire list of things that you can tell clang that you can't currently specify in Zig, not a single one of them will change whether or not a program compiles (except arguably runtime cpu feature dispatch, but that one would need a major facelift anyway in order to make it into zig).

Under this rule, pretty much everything that's currently a keyword should remain a keyword. Runtime safety settings, float modes, and cold paths would change to attributes (though the execution definition of @cold() is actually pretty nice, it could stay a builtin). packed changes the type of pointers to members in breaking ways, so it's a keyword. extern structs aren't allowed to contain bare structs, so it's a keyword. Functions with callconv(.Naked) aren't allowed to be called directly, and calling convention is viral in the type system, so it's a keyword. volatile is viral in the type system, so it's a keyword. align is viral in the type system, so it's a keyword. These are the core building blocks of the language, they aren't going away, and if we do a good job they won't need to be expanded after 1.0.
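The point that packed changes the type of pointers to members can be checked directly. A small sketch, assuming a recent Zig release; Packed, Plain, and setTrue are illustrative names. The pointer to a field of the packed struct carries bit-offset information in its type, so it is not a plain *bool and could not be passed to setTrue.

```zig
const std = @import("std");

const Packed = packed struct {
    a: bool,
    b: bool,
    rest: u6,
};

const Plain = struct {
    a: bool,
    b: bool,
    rest: u6,
};

// Takes an ordinary pointer; a pointer to Packed.b would not coerce to this.
fn setTrue(p: *bool) void {
    p.* = true;
}

test "packed changes field pointer types" {
    var plain = Plain{ .a = false, .b = false, .rest = 0 };
    const pb_plain = &plain.b;
    setTrue(pb_plain);
    try std.testing.expect(@TypeOf(pb_plain) == *bool);
    try std.testing.expect(plain.b);

    var pk = Packed{ .a = false, .b = false, .rest = 0 };
    // The type of this pointer records b's bit offset within the host integer.
    const pb_packed = &pk.b;
    pb_packed.* = true;
    try std.testing.expect(@TypeOf(pb_packed) != *bool);
    try std.testing.expect(pk.b);
}
```

Because the difference shows up in types, turning packed into an optional attribute would change which programs type-check, which is the distinction SpexGuy's heuristic relies on.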

One of the amazing things about Zig to me is how it's implemented this huge set of compiler-specific attributes using a much smaller and simpler set of keywords and expression builtins alongside comptime execution. Making it easier to add keywords is great if your goal is to add more keywords, but the thing I love about Zig is that it's been forced not to do that. It's distilled a large number of concepts into a small number of primitives that mesh together in ways that are numerous but still predictable.

Avoiding introducing new keywords is aligned with the goal of keeping the language semantics simple. This restricted definition of attributes would allow for specialized code generation hints when necessary, without enabling lots of new keywords that make the language (the concepts, not the syntax) more complicated. The goal as I understand it is for Zig's semantics to be pretty much final by 1.0. Any changes after that are necessarily breaking, and should be very rare. This syntax would allow us to introduce niche compiler hints in minor revisions without breaking code, and would allow the parser to be forwards-compatible with those changes. But overall, Zig isn't going to be an evolving language like C++ or rust. It's going to be done and mostly static and stable, like C. If I'm wrong about this, please let me know.

to decouple compiler magic from userspace

This is a slippery slope. At some point, the entire language is "compiler magic". Why does fn (a: u32) mean a function receiving an int? Compiler magic! Why does comptime foo(); mean to run foo at comptime? Compiler magic! This goal doesn't provide a clear rule for what is and isn't "magic".

and some builtins don't make sense as functions (set eval branch quota?! Set align stack?! Why?! There's no control flow here, even at comptime!)

@setEvalBranchQuota is actually an instruction that gets executed at comptime and raises the branch quota (the default is 1000 backward branches), so it only helps if it runs before the limit is hit. In this example, funcA compiles but funcB doesn't:

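export fn funcA() void {
    comptime {
        // The quota is raised before the loop runs, so the loop is allowed.
        @setEvalBranchQuota(1001);
        var i = 0;
        while (i < 1001) : (i += 1) {}
    }
}

export fn funcB() void {
    comptime {
        var i = 0;
        // Compile error: the loop exceeds the default quota of 1000 backward
        // branches before the @setEvalBranchQuota call below is ever executed.
        while (i < 1001) : (i += 1) {}
        @setEvalBranchQuota(1001);
    }
}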

But the others I agree with. Things like @setAlignStack and @setFloatMode should be attributes on their parent scope, since they can have an effect on code above their call site.

ghost commented 4 years ago

That's pretty much the opposite of the heuristic I'm using. To me, modifying semantics is the whole point of attributes. That's why I proposed a rigid tuple syntax as opposed to a more flexible struct syntax -- so that, like keywords, users can't go insane with remote configuration. When you're declaring a thing, as opposed to a property of a thing, or you're defining unique syntax, or you're affecting types, or you're doing something unique, that's when you use a keyword. Basically, if the answer to the question "What's this?" is unique (not "it's an attribute" or "it's a compiler hint"), it's a use case for a keyword. I wouldn't mind keywords for the other things as well, but you need to know all of them to know which names to avoid in your own code. That goes against point 9 of the Zen, and it's the reason builtins have @'s on them.

As for compiler magic -- yes, every language is made of compiler magic. One of the main points of Zig, though, is the strict separation between language and userspace -- you can't define a function that looks like a builtin, and there are no builtins that look like functions you can define; you can't define the format of a declaration, and the language can't tell you what to declare. Keywords violate this principle by using the same namespace as user definitions, and of course there's a place for them, but I don't think we should be so flippant about their presence. You could just as easily argue that keywords are a slippery slope -- want a feature? Use a keyword! Have keywords for everything! Memorise 50 pages of keywords to know what you can't declare! (See how easy it is to misrepresent a position?)

Also, implementing new attributes won't change the semantics of existing code. The only way they can possibly break things that worked before is if they reserve a name that was being used, which this proposal makes impossible. And it's awfully bold of you to assume we'll get everything right the first time, or that nothing will change in computing for the entire lifespan of Zig. (Or are you suggesting that things shouldn't change because we have them right already, or that Zig should be limited to particular domains should the field expand? There's really no charitable way to interpret what you said.)

I was not aware of @setEvalBranchQuota. I'll remove that.

ghost commented 4 years ago

Taking cues from #5177, I've added a way for all fields to be configurable, but not so easy that users will do it without thinking.

SpexGuy commented 4 years ago

I'm sorry if I've misrepresented your position. That wasn't my intent. I'm having a hard time understanding exactly how to codify your proposed rule, and I was trying to show that the rule of "compiler magic should use @{}" is insufficient. The claim that introducing a new concept will make the language simpler is a strong one. If there are going to be two ways of writing common keywords, there needs to be a clear and intuitive rule that people can use to determine which way should be used for which keywords. You've clarified the rule and that helps, but it's still very unclear to me. As I understand it, pub is a property of a thing. It's not a syntax (well, it is, but everything is, so I don't really understand what this part of the rule means), it doesn't affect types, and it doesn't do something (unique or otherwise). The answer to "what's this?" is "it's an attribute that means this identifier is visible outside of this file and will be imported by usingnamespace". So pub should be a pragma under your clarified rule. Yet it isn't in your example. So there's more to the rule.

Keywords violate this principle by using the same namespace as user definitions

This is a fair point, and if the proposal was to unilaterally remove all reserved words or to put some token on either all reserved words or all variable names I would understand how this simplifies the language. But the fact that it keeps some keywords and removes others in a way that I find hard to predict is why I'm not sold on it.

I believe that what follows is the complete list of reserved words in Zig. Some have multiple uses and appear multiple times in the list. Maybe picking exactly which ones would be changed to pragmas (and why/why not) and specifying exactly where pragmas are allowed to appear will help us to come upon clearer rules, and will make for a more concrete proposal.

Type Declaration: enum, error, fn, struct, test, union

Builtin Identifiers: type, anytype, anyerror, anyframe, comptime_int, comptime_float, bool, void, noreturn, c_short, c_ushort, c_int, c_uint, c_long, c_ulong, c_longlong, c_ulonglong, c_void, c_longdouble, u<numbers>, i<numbers>, f16, f32, f64, f128; false, null, true, undefined

Modifiers

Identifiers: const, var, comptime, threadlocal
Values: align
Types: packed, extern, enum
Pointers: allowzero, noalias, align, const, volatile
More Pointers: ?, [*], *, *[N], [*:x] (not keywords, but they could have been, and they are modifiers)
Functions: inline, callconv, noasync, noinline
Library: export, extern, linksection
Module: pub, usingnamespace

Control Flow

Branch: break, continue, else, for, if, return, switch, try, while
Operators: and, or, catch, orelse
Async: await, resume, suspend
Defer: defer, errdefer
Bake: comptime
Invariant: unreachable
Asm: asm, volatile

I've added a way for all fields to be configurable

This syntax leaves a lot of questions. Does @{} create a comptime block or expression scope? Or is this use of if/else a fundamentally different language construct than the normal if/else? Could I use catch or orelse in this scope? Why is the second set of braces required around cold, but not an at sign? How do commas interact with if/else inside @{}? Why use {null} instead of {}? I'd appreciate a more thorough description of how this construct works.

Zig isn't going to be an evolving language like C++ or rust. It's going to be done and mostly static and stable, like C.

There's really no charitable way to interpret what you said.

That's rather harsh. I had some motivation for saying what I did, and I felt that it was a good idea to do so. I understand if you disagree, and I'm not certain that this is the official stance of the Zig project, but surely you must see that there are benefits to stability, especially in something like a language. It's the whole reason Zig builds static binaries by default instead of linking DLLs. This isn't something I said off the cuff or without significant thought and consideration of the trade-offs involved. I'll go into some more detail about it.

C is mostly static and stable, but not entirely. It's changed a little bit in the last 20 years, but not much. C11 introduced alignment, atomics, noreturn, and a few other things. C18 managed to avoid making any semantic or syntax changes.

I really believe that it's possible for a language's semantics to be stable in the long term, as long as they are sufficiently complete at launch. It may need to add some builtins or hints as technology evolves, but it would take a major technological revolution to require changing the semantics of the language. If new assembly operations are introduced, or new compiler optimizations, Zig will be able to handle them with builtins or hints that don't change semantics. Programming methodologies may evolve, but Zig probably isn't going to adapt to meet them. Zig's simplicity is derived from it being targeted towards a certain style of programming, and I don't think that's going to change. If a new programming paradigm is introduced that sweeps the world but is inconvenient to use in Zig, then people who want to program that way may have to switch to a different language. This is consistent with the design decisions we've made on operator overloading, inheritance, and encapsulation. If properties of hardware change so dramatically that Zig's semantics aren't valid, then it will invalidate the underlying assumptions of the Zig language and people should use a more specialized language for that hardware. If we find that there is some major flaw in Zig's semantics that must be addressed, then that will be fixed in a major breaking revision of the language. But we have a rollover plan if this has to happen. Keywords are allowed to change in this case, and that isn't a problem.

The bottom line is, the amount of control you have over the output machine code in Zig is on par with C. Assuming the compiler supports the necessary targets and has the necessary compiler hints and builtins, Zig will only become unusable for some area of computing on the same day that C99 does. Zig avoids the long tail that C++ and Rust are travelling down (fixing all the little corner cases of ownership and slowly improving templates) by not having ownership semantics in the language and by using the same semantics for templates and runtime code. Because Zig avoids all that complexity, it should be possible to ensure that Zig's semantic set is sufficiently complete.

it's awfully bold of you to assume we'll get everything right the first time

Zig 1.0.0 isn't the first try. It's not what we roll over to after 0.9.0, it doesn't have a set deadline, and we can postpone it until it's ready. I think we can get everything (or close to everything) right at 1.0 because we decide exactly when 1.0 happens. I think it's possible for a simple language like Zig to ship with a complete semantic set that doesn't require tweaks, except to support new hardware or optimizer hints. 1.0 is the label we put on the release when we are satisfied that the language semantics are complete for all current hardware and all current use cases, and it's time to stop changing the language and start optimizing the compiler. The current release cycle allows us to try things and review ideas and push the boundaries of the semantics to figure out where they are inadequate. We may make mistakes with 1.0, and we have a fallback plan if we do, but if there are major problems with the syntax or major improvements to be made then the mistake was in failing to do our due diligence before labeling the release 1.0.

ghost commented 4 years ago

Tbh, my idea was for the rule to be a bit flexible. Ergonomics is a top priority as well, after all, just after simplicity. However, I can now see the confusion that an inconsistent rule could cause, so I've amended the proposal.

The proposal is not to remove all keywords, that would be impractical and overly verbose. Just to remove as many as makes sense -- to leave only those such that the language can be considered "complete", in some sense. Admittedly, there is still some intuition involved, but that's inevitable in design (why are catch and orelse keywords rather than operators? They just are).

Perhaps a better rule would be: if a keyword can be removed while leaving the program in a consistent state (strictly regarding types and syntax) and preserving the number and order of statements, it should be a pragma. So, from your list, the following would become pragmas:

Type Declaration (All of these would remain keywords -- they dictate the syntax that comes next)

Builtin Identifiers (All of these too -- they only appear in required places)

Modifiers

Identifiers: comptime, threadlocal
Values: align
Types: packed, extern, enum (could be replaced with derive_enum for clarity)
Pointers: noalias, align, const, volatile (all attributes)
More Pointers: (symbols work fine and are unambiguous)
Functions: inline, callconv
Library: export, extern, linksection
Module: pub (usingnamespace can itself have modifiers)

Control Flow

Branch: (all of these either dictate syntax or are themselves instructions, so all of them stay)
Operators: (all of these too)
Async: (and these)
Defer: (these as well)
Bake: comptime (debatable -- I could definitely see this going the other way)
Invariant: (unreachable stays -- it's a thing, not a property)
Asm: volatile (I've proposed replacing this with impure (#5241) -- I suspect volatile was reused to avoid introducing another keyword, even though a dedicated keyword would make more sense)
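For comparison, here is a minimal status quo sketch of several modifiers from the list above that would become pragmas under this rule (packed, align, threadlocal, noinline). It assumes Zig 0.11 or later (for @intFromPtr). The pragma spellings in the comments are the proposal's hypothetical syntax and do not compile today, and the names Flags, buffer, counter and add are purely illustrative.

```zig
const std = @import("std");

// Status quo: `packed` is a keyword in front of `struct`.
// Proposed (hypothetical syntax): `struct @{packed} { ... }`
const Flags = packed struct {
    a: bool,
    b: bool,
    rest: u6,
};

// Status quo: `align` is part of the declaration.
// Proposed (hypothetical syntax): an `align(16)` pragma field on the variable.
var buffer: [64]u8 align(16) = undefined;

// Status quo: `threadlocal` keyword.
// Proposed (hypothetical syntax): `@{threadlocal} var counter ...`
threadlocal var counter: u32 = 0;

// Status quo: `noinline` keyword on the function.
// Proposed (hypothetical syntax): `@{noinline}` attached to the function.
noinline fn add(a: i32, b: i32) i32 {
    return a + b;
}

test "status quo modifiers" {
    // Two bools plus a u6 fit in exactly one byte when packed.
    try std.testing.expectEqual(@as(usize, 1), @sizeOf(Flags));
    try std.testing.expect(@intFromPtr(&buffer) % 16 == 0);
    counter += 1;
    try std.testing.expectEqual(@as(u32, 1), counter);
    try std.testing.expectEqual(@as(i32, 48), add(17, 31));
}
```

Each modifier here could be deleted and the declarations would still type-check, with the same statements in the same order, which is what the rule above asks of a pragma candidate.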

Everything else, I agree, we should be able to get that absolutely rock-solid and stable by 1.0.0. We're just about there already, even.

Under this proposal, @{} is a comptime tuple -- syntactically, a comma-separated list of comptime values of type (pragmus or key or some other name, to be decided). You can use whatever you like in there, and you don't need another @{} because you're already in a pragma field and just need the value. You still need braces around the if branches because that's just how if works. I included the null pragmus because the if expression still needs to evaluate to a value of type (pragmus|key|whatever), as it is in the place of a pragma field -- void is invalid here. You do bring up a good point about confusion, so maybe overloading null isn't the best idea -- I've amended the proposal to use none instead.
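The typing argument can be modelled in today's Zig with a plain enum standing in for the pragma field type. This is a minimal sketch under that assumption: Pragmus and isCold are hypothetical names, and the actual proposal would use a compiler-defined type rather than a user-declared enum.

```zig
const std = @import("std");

// Hypothetical stand-in for the pragma field type ("pragmus" / "key");
// an illustrative enum, not an existing Zig type.
const Pragmus = enum { cold, none };

// Hypothetical comptime condition, mirroring the proposal's `isCold()`.
fn isCold() bool {
    return true;
}

// Container-level initializers are evaluated at comptime. Both branches must
// produce a `Pragmus`; an empty `{}` (void) branch would be a type error,
// which is why an explicit `none` value is needed.
const chosen: Pragmus = if (isCold()) .cold else .none;

test "an if-expression in pragma position yields a pragma value" {
    try std.testing.expect(chosen == .cold);
}
```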

I understand what you were trying to say, and it is definitely a worthwhile goal, and definitely more attainable than it would be in any other language, I just don't think we should count on it. We don't want to have to bump the major version every time there's a use case for another keyword -- that's how valid features get delayed until enough other deficiencies are accumulated to justify another version, or if we don't go that route, how we rapidly reach version 50.0.0. Yes, maybe that's catastrophising, but the only unrecoverable mistake in software is failure to plan for the future.

(Also, when I say "new features", I don't mean new language features. I agree it would be against our principles to chase after every shiny new methodology, and in fact the rule proposed above for pragma inclusion would not allow that. I mean new machine features. We're supposed to be a language for the next 50 years, and as long as new machines are built according to von Neumann architecture, our assumptions will be valid -- but new optimisations and new architecture details could emerge in that time, and if we can adapt to them, we should. Maybe our biggest killer feature over C, the thing we can do that no one else has been able to yet, is staying power. Programs written today could be compiled for machines built 200 years from now. Aiming high? Sure. So, in the words of our benevolent dictator, let's channel our "I'll fucking show them" energy.)

Also, yes, I've been harsh. Sorry, my biggest trigger is not being heard, and having my words misinterpreted can set that off, even when the fault is mine for not communicating clearly. I'm working on that.

ghost commented 4 years ago

I'm in favor of this, and I think it's indeed a better version of #4285 just as claimed. The syntax looks cleaner while being just as expressive.

Pros:

Logical. (Pragmas are always used to configure something else)
- Shows intent as per the above. "Now I'm configuring something", vs only "this is a keyword that has an effect on language semantics".
Does not pollute the global namespace
- Scales better than keywords. 100 pragma config members would not pose a problem in the same way 100 keywords would.
Language defined, not user extendible. (On par with keywords here)
Versatile and expressive. The same syntax can be reused everywhere:
- configure functions
- configure struct/enum/union
- configure types
- configure blocks
- configure ..anything
- (so could keywords, but pragmas would scale better, show intent better, and not change the language grammar. See below for the pragma cons)
Could make "flag" type keywords comptime configurable
- struct @{packed(boolExpr)} would be equivalent to struct @{packed} if boolExpr is true, or to struct @{} if it is false.
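Status quo Zig can already approximate a comptime-configurable flag by choosing between two struct types with a comptime bool. A minimal sketch, assuming Zig 0.11 or later; Nibbles is an illustrative name, and the pragma spellings in the comments are hypothetical.

```zig
const std = @import("std");

// Approximates the hypothetical `struct @{packed(boolExpr)}` by selecting
// one of two struct types with a comptime bool.
fn Nibbles(comptime pack: bool) type {
    return if (pack)
        packed struct { hi: u4, lo: u4 } // like `struct @{packed}`
    else
        struct { hi: u4, lo: u4 }; // like `struct @{}`
}

test "comptime bool selects the layout" {
    // Packed: two 4-bit fields share one byte.
    try std.testing.expectEqual(@as(usize, 1), @sizeOf(Nibbles(true)));
    // Not packed: each u4 field occupies its own byte.
    try std.testing.expectEqual(@as(usize, 2), @sizeOf(Nibbles(false)));
}
```

The pragma form would avoid writing the field list twice, which is much of the appeal of packed(boolExpr).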

Cons:

Semantic overlap with both normal keywords and builtin "functions"
- The main con as I see it.
Worse ergonomics and readability
- This applies to the most commonly used syntax (e.g. imagine u32 @{array(4)} or var @{const}), while struct @{packed} isn't really any worse than packed struct, imo.

While these cons are not trivial, I think the pros listed above are valid as well, and there are still features under consideration that will require new keywords or something like pragmas:

function: prefer-comptime flag, cleanup function/strategy (#782)
struct: copyok (#3804)
block: timeconst (#1776)
fields: pinmem (#3803)
type: address space (#653)

Those listed above are concrete proposals. Could easily imagine others:

block: scope limited operator overloading
function: comptime known execution "time" (num of operations)
struct: another struct to embed, type predicate
anytype/var: type predicate
variables: encapsulation

Workarounds for the cons:

Syntax highlighting should be the same for pragma "members" as for keywords.
Accept the overlapping semantics, and use pragmas for the less common cases. Some heuristics:
- If the keyword takes a parameter, make it a pragma (align, callconv)
- If it's dual use (comptime is for example used both to annotate function parameters and create comptime expressions), keep it as a keyword.
- If it's very commonly used (const vs var @{immutable} being a good hypothetical example discussed earlier), then keep it as a keyword.
- The later something was introduced into the language, the more likely it's a niche feature that can be a pragma.
- Features that would naturally belong in a higher-level language than Zig should use keywords. Features that are more low-level (memory-related, etc.) can use pragmas.
- Sadly, there is no clean, straightforward heuristic I can think of, but if there aren't many keywords (which is possible if most corner cases are covered by pragmas), it becomes easier to know or infer what "isn't a keyword" by exclusion.

Leaving some more hypothetical "workarounds" for discussion:

Tools (editors/IDEs) can work around ergonomics and readability, but making the language expressive and extendible is a language design consideration. Thus, prioritize expressiveness.
Embrace syntax sugar:
- Example: const x = 123; as syntax sugar for var @{immutable} x = 123;, with Zig allowing both.
- Let zig fmt always convert from the expanded "pragma form" to the appropriate syntax sugar, if it exists.
- Con: Have to introduce another compilation layer that has to preserve error messages
- Pro: Clearly define what is a core language construct vs what are the associated configuration options
Consider that Zig 1.0.0 is not released yet. Zig could use a pragma like syntax for flexibility, and then settle on a final set of keywords in the first official release.

TLDR: Trade-off between improved expressiveness/scaling/flexibility and reduced ergonomics/readability/consistency.

Snektron commented 4 years ago

I would like to propose a slightly different syntax: introduce a special keyword (instead of a sigil, as in the original proposal) to designate a pragma (although I prefer the term 'attribute', which seems more in line with how such features are configured in C/C++). This keyword would simply accept a struct literal with configuration options for the entity it is applied to. For example:

```
fn a() attribute(.{.linksection = .text}) void {
}

const A = struct attribute(.{.packed = true}) {
};
```
I believe this has a few improvements over the proposed syntax:

I personally believe that attribute(.{.option = value}) looks cleaner than @{option(value)} or @{option}
A more technical point is that this accepts regular struct literals, so it doesn't require any special syntax.
- This allows more flexibility over attribute features, as one could simply make a function which returns a struct containing the desired values: fn a() attribute(calculateAttributes()) void {
- This also eliminates the short-hand option (where @{option} is equal to @{option = true}), which violates the 'one way to do things' principle.
- Eliminates syntax that looks like function calls, which @user00e00 listed as a con in their previous comment:
  
  Semantic overlap with both normal keywords and builtin "functions"
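The struct-literal idea can be approximated in status quo Zig: an attribute set is just a struct with defaulted fields, and an ordinary function can compute one at comptime. This is a sketch only, assuming Zig 0.11 or later; FnAttributes, its fields and calculateAttributes are hypothetical, and nothing here applies the values to a real declaration.

```zig
const std = @import("std");

// Hypothetical attribute set for a function. The field names mirror keywords,
// so they need @"..." quoting; this is not an existing Zig API.
const FnAttributes = struct {
    @"linksection": ?[]const u8 = null,
    @"noinline": bool = false,
};

// A plain function can compute the attribute struct, as with the
// comment's `calculateAttributes()`.
fn calculateAttributes(comptime hot: bool) FnAttributes {
    return .{ .@"linksection" = if (hot) ".text.hot" else null };
}

test "attributes are ordinary struct literals" {
    // Unspecified fields fall back to their defaults; there is no separate
    // shorthand form, every option is written as `.field = value`.
    const explicit: FnAttributes = .{ .@"noinline" = true };
    try std.testing.expect(explicit.@"noinline");
    try std.testing.expect(explicit.@"linksection" == null);

    const computed = comptime calculateAttributes(true);
    try std.testing.expectEqualStrings(".text.hot", computed.@"linksection".?);
    try std.testing.expect(!computed.@"noinline");
}
```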

I can still see a few drawbacks to this syntax, though:

Configuring attributes is long (which is why the @{option} shorthand was introduced). I believe that the correct way to handle this is separating configuration options into commonly used and more rarely used stuff, which was also concluded by @user00e00:

If it's very commonly used (const vs var @{immutable} being a good hypothetical example discussed earlier), then keep it as a keyword.
Allowing any tuple in attribute(...) directly violates @EleanorNB's original proposal:

Pragmas can only ever be used as literals -- only those fields that are explicitly exposed are configurable. This way, no more magic is possible than in status quo.

I think whether this is warranted depends on how attributes are handled in different contexts. For example, I think a good application for this type of configuration is architecture-specific options: think of something like .binding = 123 for a hypothetical GLSL/SPIR-V backend. If such options are simply ignored on the wrong architecture, it makes sense to disallow computing them. However, I think something like extern const vertex attribute(.{.binding = 1}): Vec2F; should yield a compile error when compiled for x64, in which case computation is required to keep code portable across architectures. (This is probably a bad example. It feels intuitively right to me, but I struggle to come up with more situations where this is required. Probably useful for some architecture-specific things?)

I also have some generic opinions and remarks for pragmas/attributes:

In my opinion, the set of configurations which can be applied via attributes must have a hard separation from that which can be applied by other means (so no @{mutable}, but have const be the only way to make a variable const). Allowing this would also violate the 'one way to do things' principle, although one could argue that it must be allowed for the sake of allowing computation of attributes like const.
One argument for this kind of configuration, specifically for struct layout, is that it makes clear that the following syntax is invalid:
```
const A = struct { ... };
const B = packed A;
```
I've stumbled over details like this before (see #5640, although I'd say that align warrants a dedicated keyword), and with struct attribute(.{.packed = true}) { this is no longer a problem.
I would also propose using an enum for layout, as packed is mutually exclusive with extern. (I'd also argue for renaming extern to something like c.) This would also allow easy addition of more layouts in the future, for example attribute(.{.layout = .glsl_std_450}).
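A minimal sketch of the layout-enum suggestion in status quo Zig, assuming Zig 0.11 or later. Layout and Header are hypothetical names; the enum merely selects among today's struct, extern struct and packed struct, so exactly one layout can be chosen.

```zig
const std = @import("std");

// Hypothetical layout enum: `packed` and `extern` are mutually exclusive, so
// a single enum value expresses the choice. Illustrative only, not a Zig API.
const Layout = enum { auto, @"extern", @"packed" };

fn Header(comptime layout: Layout) type {
    return switch (layout) {
        .auto => struct { tag: u8, len: u16 },
        .@"extern" => extern struct { tag: u8, len: u16 },
        .@"packed" => packed struct { tag: u8, len: u16 },
    };
}

test "one enum selects exactly one layout" {
    // C ABI layout pads after `tag` so that `len` is 2-byte aligned.
    try std.testing.expectEqual(@as(usize, 4), @sizeOf(Header(.@"extern")));
    // Packed layout stores the fields back to back: 8 + 16 bits.
    try std.testing.expectEqual(@as(usize, 24), @bitSizeOf(Header(.@"packed")));
}
```

Because a switch over the enum must be exhaustive, adding a future tag such as glsl_std_450 would make the compiler flag every switch that does not yet handle it.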

Mouvedia commented 3 years ago

Just out of curiosity, based on this list, how many non-mutually exclusive pragmas may a function have at most?

ghost commented 3 years ago

Under status quo, how many keywords can annotate a function at most? That many.

ghost commented 3 years ago

On further consideration, this would encourage feature creep and language heterogeneity. We have formatting tools, we don't necessarily need to maintain legacy to infinity -- and as long as we provide good enough facilities to interface directly with machine code (cough cough #5241 cough cough), this has no advantages.

ghost commented 3 years ago

Actually, there is a case for this: independent language extensions. Say someone wants to create a graphics API akin to CUDA/OpenCL, but using Zig instead of C/C++ -- adding all the necessary features to Zig proper would be impractical and bloating, but implementing them in language-space in the fork would be laughably incompatible. We may define this as a standard sequence, and either ignore or error on it in the base implementation, giving extension implementors free rein.

This may be construed as encouraging gratuitous forks, creating confusion around compiler versions -- however, the way I see it, people who want something unique from the language will fork it, and unless we make it easier for these forks to track upstream, they'll diverge.

ghost commented 3 years ago

As pointed out by Felix on Discord, if it doesn't make sense to extend Zig proper, it will make more sense to create a completely different language. We want to encourage the right tool for the job -- either this might be Zig, or it definitely is not. There is no in between, and we don't want to fool people into thinking there is.