andrewrk commented 1 year ago

Extracted from #14265.

Zig Object Notation

Instead of...

[package]
name=libffmpeg
version=5.1.2

[dependency]
name=libz
url=https://github.com/andrewrk/libz/archive/f0e53cc2391741034b144a2c2076ed8a9937b29b.tar.gz
hash=c9b30cffc40999d2c078ff350cbcee642970a224fe123c756d0892f876cf1aae

[dependency]
name=libmp3lame
url=https://github.com/andrewrk/libmp3lame/archive/497568e670bfeb14ab6ef47fb6459a2251358e43.tar.gz
hash=9ba4f49895b174a3f918d489238acbc146bd393575062b2e3be33488b688e36f

It would look like this:

.{
    .name = "libffmpeg",
    .version = "5.1.2",
    .dependencies = .{
        .libz = .{
            .url = "https://github.com/andrewrk/libz/archive/f0e53cc2391741034b144a2c2076ed8a9937b29b.tar.gz",
            .hash = "c9b30cffc40999d2c078ff350cbcee642970a224fe123c756d0892f876cf1aae",
        },
        .libmp3lame = .{
            .url = "https://github.com/andrewrk/libmp3lame/archive/497568e670bfeb14ab6ef47fb6459a2251358e43.tar.gz",
            .hash = "9ba4f49895b174a3f918d489238acbc146bd393575062b2e3be33488b688e36f",
        },
    },
}

It also should be renamed with the .zon extension, and furthermore, [package] should be renamed to [project] to match the correct terminology.

Perhaps the . in Zig language itself should be removed, and then it could be like this:

{
    .name = "libffmpeg",
    .version = "5.1.2",
    .dependencies = {
        .libz = {
            .url = "https://github.com/andrewrk/libz/archive/f0e53cc2391741034b144a2c2076ed8a9937b29b.tar.gz",
            .hash = "c9b30cffc40999d2c078ff350cbcee642970a224fe123c756d0892f876cf1aae",
        },
        .libmp3lame = {
            .url = "https://github.com/andrewrk/libmp3lame/archive/497568e670bfeb14ab6ef47fb6459a2251358e43.tar.gz",
            .hash = "9ba4f49895b174a3f918d489238acbc146bd393575062b2e3be33488b688e36f",
        },
    },
}

which is pretty unoffensive if you ask me. Lots of people hate that dot anyway. I don't know if the parsing would be problematic or not though, would need to investigate.

See competing proposal:

14289

ghost commented 1 year ago

Ok, this isn't quite as bad as I was expecting. Though of course we'd have to provide a reference parser to be taken seriously, and if we write that in Zig, we'd have to make sure it itself has no dependencies.

Also I think removing the dot from dotbrace really would cause parsing problems. Is {} an empty tuple or a void constant? Is { key = "value" } a struct-like tuple or an assignment?

deflock commented 1 year ago

Maybe I missed something but what is the main profit of using this own DSL (not supported by any tool)?

In yaml:

name: libffmpeg
version: 5.1.2
dependencies:
  libz:
    url: https://github.com/andrewrk/libz/archive/f0e53cc2391741034b144a2c2076ed8a9937b29b.tar.gz
    hash: c9b30cffc40999d2c078ff350cbcee642970a224fe123c756d0892f876cf1aae
  libmp3lame:
    url: https://github.com/andrewrk/libmp3lame/archive/497568e670bfeb14ab6ef47fb6459a2251358e43.tar.gz
    hash: 9ba4f49895b174a3f918d489238acbc146bd393575062b2e3be33488b688e36f

ghost commented 1 year ago

@deflock In short, no existing tool satisfies all our requirements. For more detail, see the linked PR.

andrewrk commented 1 year ago

YAML is too complicated and has unacceptable footguns.

deflock commented 1 year ago

@EleanorNB I've scrolled one more time and I see why json/csv do not satisfy, s-expressions in imperative language is a bit weird :) So remaining candidates are toml, yaml, own format. I see only one profit of using own format - if it will be used directly within build.zig so you don't need additional file for declaring dependencies, but it's not possible bc of declarative reasons.

batiati commented 1 year ago

andrewrk commented 1 year ago

@deflock that's a nice point in favor of this proposal that I hadn't considered - similar to how in node.js you can require("foo.json"), it could make sense to support @import("foo.zon") in Zig which would give you comptime access to the data.

andrewrk commented 1 year ago

Another point in favor of this proposal over the TOML one: #14291. Note how line diffs against mirrors field (an array) would be annoying in the TOML case due to both elements being on the same line.

kuon commented 1 year ago

I think this is much better than TOML subset.

Upsides:

can be @import as it is valid zig
no new format that is TOML but not really
allow for reuse of editor plugins
could itself support @import, like this

.{
            .url = "https://github.com/andrewrk/libz/archive/f0e53cc2391741034b144a2c2076ed8a9937b29b.tar.gz",
            .hash = "c9b30cffc40999d2c078ff350cbcee642970a224fe123c756d0892f876cf1aae",
        }

.{
    .name = "libffmpeg",
    .version = "5.1.2",
    .dependencies = .{
        .libz = @import("libz.lulz"),
        .libmp3lame = .{
            .url = "https://github.com/andrewrk/libmp3lame/archive/497568e670bfeb14ab6ef47fb6459a2251358e43.tar.gz",
            .hash = "9ba4f49895b174a3f918d489238acbc146bd393575062b2e3be33488b688e36f",
        },
    },
}

perhaps a few functions could be whitelisted (but this needs to be investigated)
can be extended without having to make the format or the parser more complicated
build.zig could have a package function which would return a struct like this, this can be used to generate the .lulz file with zig code, which leads to the next point
this would put some effort into an object notation, which is really a nice supporting tool for a language, it can provide tooling for things like configurations, object format (game state...), rust has ron. I think this would be better than working on a pseudo TOML parser. To say it more plainly, work put into that would be directly put into zig.

Downsides:

a bit harder to write
further from the left margin
people don't like .{ but maybe that dot could be removed, even if I personally like it

Please, don't use YAML, every time I work with that, I lose my sanity over some weird thing

thejoshwolfe commented 1 year ago

why do we have this

    .dependencies = .{
        .libz = .{
            .url = "https://github.com/andrewrk/libz/archive/f0e53cc2391741034b144a2c2076ed8a9937b29b.tar.gz",
            .hash = "c9b30cffc40999d2c078ff350cbcee642970a224fe123c756d0892f876cf1aae",
        },

instead of this?

    .dependencies = .{
        .{
            .name = "libz",
            .url = "https://github.com/andrewrk/libz/archive/f0e53cc2391741034b144a2c2076ed8a9937b29b.tar.gz",
            .hash = "c9b30cffc40999d2c078ff350cbcee642970a224fe123c756d0892f876cf1aae",
        },

does the former actually work in Zig if there's no struct with a libz: Dependency field defined in it? We don't have hashmap literals in Zig, right?

I would also like to vote against the name "lulz", as that word is often associated with mob harassment and racism.

andrewrk commented 1 year ago

I would also like to vote against the name "lulz", as that word is often associated with mob harassment and racism.

Alright fine, I changed the proposal to the much more conservative and obvious choice of .zon.

andrewrk commented 1 year ago

Alright, enough deliberation. Moving forward with this one; rejecting #14289.

suhr commented 1 year ago

This reminds me of https://github.com/ron-rs/ron.

ikskuh commented 1 year ago

I don't know if the parsing would be problematic or not though, would need to investigate.

That wouldn't be a problem as ZON is a data format, and we only need .{} (a struct/tuple) to be distinguished from a {} (block) in Zig, as we can execute code that might break data from the block.

We don't need to execute that here. But imho, for the sake of consistency, i'd prefer .{ over {, so i can copy-paste the code into Zig.

@thejoshwolfe: I actually prefer .libz = .{ over .{ .name = "libz", as the key syntax implies that the names have to be unique, due to fields have to be unique. The second syntax doesn't convey that information.

likern commented 1 year ago

I proposed https://github.com/ziglang/zig/pull/14265#issuecomment-1380915434, this proposal alignes to my needs, more or less idea I envisioned. I upvote for them.

The only thing I dislike is build.zig.zon. If we already have custom format called Zig Object Notation just build.zon is better.

build.zig and build.zon ❤️ perfect match.

It looks and feels very much like improved JSON format. Very structered and understandable, but with comments.

cryptocode commented 1 year ago

Zon of a gun, this is brilliant ❤️

Will comments be // ?

17dec commented 1 year ago

Another point in favor of this proposal over the TOML one: https://github.com/ziglang/zig/issues/14291. Note how line diffs against mirrors field (an array) would be annoying in the TOML case due to both elements being on the same line.

With the ini format you could have duplicate url keys though, which would achieve the same.

@thejoshwolfe: I actually prefer .libz = .{ over .{ .name = "libz", as the key syntax implies that the names have to be unique, due to fields have to be unique. The second syntax doesn't convey that information.

While I agree that the syntax is nicer, I also believe that if a Zig data format is chosen, it makes sense that the schema can be described by a Zig type and that the data can be loaded directly into said type. Using the key syntax to describe a dynamic map kinda loses both of those properties

wooster0 commented 1 year ago

At first I thought this was a new format with a subset of features of normal Zig but as far as I understand it it's just a regular Zig file with a single struct expression that can be imported, which I like very much. But why do we have to call it build.zig.zon or build.zon? It looks kind of convoluted when we could just call it packages.zig? This could be the standard name for the file containing a struct expression with the packages listed. build.zig.zon or more recently build.zon indicates to me it is a new separate format (Zig Object Notation) where I can't expect all features from Zig to work, even though that's not the case because it is in fact a normal Zig file (or did I get that wrong?).

build.zon sounds like build.zig; a file that builds my project, but written in "Zig Object Notation"...? Seems rather confusing.

IMO build.zig and packages.zig would be a much clearer and simpler naming.

If however this is actually supposed to be a new format with only a subset of features of normal Zig, why? Why do we need to create a new format when a normal Zig file could also contain this struct expression listing the packages? It seems like this way there will always be new requests to port features from normal Zig over to this format, duplicating a lot of work.

cryptocode commented 1 year ago

@r00ster91 I assume it's because zig build --build-file <path> which means you can have multiple build files in the same directory.

Edit: to clarify, commenting on the presumed reason for the naming standard

wooster0 commented 1 year ago

To clarify, this already works today:

packages.zig

.{
    .name = "libffmpeg",
    .version = "5.1.2",
    .dependencies = .{
        .libz = .{
            .url = "https://github.com/andrewrk/libz/archive/f0e53cc2391741034b144a2c2076ed8a9937b29b.tar.gz",
            .hash = "c9b30cffc40999d2c078ff350cbcee642970a224fe123c756d0892f876cf1aae",
        },
        .libmp3lame = .{
            .url = "https://github.com/andrewrk/libmp3lame/archive/497568e670bfeb14ab6ef47fb6459a2251358e43.tar.gz",
            .hash = "9ba4f49895b174a3f918d489238acbc146bd393575062b2e3be33488b688e36f",
        },
    },
}

build.zig

const packages = @import("packages.zig");

// ...

So can't we go from here?

So, yes, you can give the packages.zig a fancy name such as "Zig Object Notation" but it's still simply a struct and works in regular Zig so I don't see the point of giving it a different file extension.

I assume it's because zig build --build-file which means you can have multiple build files in the same directory.

I'm not quite sure how this relates to the need of a new object notation.

cryptocode commented 1 year ago

I'm not quite sure how this relates to the need of a new object notation.

It doesn't, was just commenting on chosen file naming standard, which allows the zon file to be found automatically even with the build file flag + multiple build files in the same directory.

likern commented 1 year ago

To clarify, this already works today:

packages.zig
.{
    .name = "libffmpeg",
    .version = "5.1.2",
    .dependencies = .{
        .libz = .{
            .url = "https://github.com/andrewrk/libz/archive/f0e53cc2391741034b144a2c2076ed8a9937b29b.tar.gz",
            .hash = "c9b30cffc40999d2c078ff350cbcee642970a224fe123c756d0892f876cf1aae",
        },
        .libmp3lame = .{
            .url = "https://github.com/andrewrk/libmp3lame/archive/497568e670bfeb14ab6ef47fb6459a2251358e43.tar.gz",
            .hash = "9ba4f49895b174a3f918d489238acbc146bd393575062b2e3be33488b688e36f",
        },
    },
}
build.zig
const packages = @import("packages.zig");

// ...
So can't we go from here?

So, yes, you can give the packages.zig a fancy name such as "Zig Object Notation" but it's still simply a struct and works in regular Zig so I don't see the point of giving it a different file extension.

I assume it's because zig build --build-file which means you can have multiple build files in the same directory.

I'm not quite sure how this relates to the need of a new object notation.

Why not having full-blown language has been answered in original post, see this https://github.com/ziglang/zig/pull/14265, and my proposal https://github.com/ziglang/zig/pull/14265#issuecomment-1380915434

cryptocode commented 1 year ago

Antlr4 grammar for status quo: https://github.com/cryptocode/zon-grammar (assuming initial dot, will adjust grammar as the proposal progresses)

deflock commented 1 year ago

I actually prefer .libz = .{ over .{ .name = "libz", as the key syntax implies that the names have to be unique, due to fields have to be unique. The second syntax doesn't convey that information.

Do keys support "unusual" characters?

.lib-z@2.x = .{
  .url = "",
}

.{ 
  .name = "lib-z@2.x",
  .url = "",
}

const libz2_dep = b.dependency("lib-z@2.x", .{});

InKryption commented 1 year ago

See arbitrary identifier syntax @"": https://ziglang.org/documentation/0.10.0/#Identifiers

Assuming this feature is part of the subset, "unusual" identifiers shouldn't be an issue.

andrewrk commented 1 year ago

@r00ster91

To clarify, this already works today:

const packages = @import("packages.zig");
const std = @import("std");

pub fn main() !void {
    std.debug.print("name: {s}\n", .{packages.name});
}

$ stage3/bin/zig run test-zon.zig 
test-zon.zig:5:46: error: root struct of file 'packages' has no member named 'name'
    std.debug.print("name: {s}\n", .{packages.name});
                                     ~~~~~~~~^~~~~

Looks like a parsing bug to me. I expected not even zig fmt to succeed on packages.zig.

wooster0 commented 1 year ago

Okay, I suppose it does need to be a pub const, at which point I'm thinking why do we even need a separate file (format) for that when we could just list our dependencies within build.zig as pub const packages = .{}; or similar (and from there the user can still choose to put only the package list in a separate Zig file and then import that, maybe as pub const packages = @import("packages.zig").list;), so build.zig just has to contain a public constant called packages and then zig build will use that name to find the list of packages to install. It seems like a flexible solution where the user can choose to list their dependencies in their build.zig or in a separate file (if maybe the community decides that would be the standard way to do it).

kubkon commented 1 year ago

YAML is too complicated and has unacceptable footguns.

Take it from me, I implemented a basic YAML parser just for the sake of parsing Apple's TBD files and the parser got quite complicated while only caring about machine-generated dylib stubs. FWIW I am in favour of this proposal.

Vexu commented 1 year ago

Looks like a parsing bug to me. I expected not even zig fmt to succeed on packages.zig.

It's being parsed as the type expression of a top-level tuple field.

kuon commented 1 year ago

For reference and why this is better than YAML or TOML, you can read those if you are interested:

I hope we can learn from all this to ensure zon will not have those problems.

mlugg commented 1 year ago

Okay, I suppose it does need to be a pub const, at which point I'm thinking why do we even need a separate file (format) for that when we could just list our dependencies within build.zig as pub const packages = .{}; or similar [...]

This has been discussed a couple of times: the issue is that we want to be able to resolve transitive dependencies without actually performing semantic analysis on build.zig. This is for performance reasons: if we depend on package foo, and foo depends on bar, we want to start fetching bar as soon as possible. If we use zon, we can simply grab this file, parse it, and begin any necessary downloads; if we use a decl in build.zig, we need to actually run the compiler on that file, which could get complex.

With this being said, I'm not exactly sure how major of an issue that would be in practice. Semantic analysis is fast, especially with build.zig files generally being pretty small, and we obviously don't need to go through the actual codegen stage for this. The only case where this seems like it would matter is really big dependency hierarchies, and those will probably saturate the network connection with downloads fairly quickly anyway. However, I could be missing some other advantage of knowing the data without running Zig.

andrewrk commented 1 year ago

https://github.com/madmurphy/libconfini/wiki/An-INI-critique-of-TOML

what a magnificent first paragraph :star_struck:

andrewrk commented 1 year ago

@r00ster91

quoting myself:

There is a lot of value to be gained in declarative information for certain things. In particular, it's much faster to fetch all transitive dependencies if you can learn what network URLs to go download, or the file system paths to check the cache for, without running a compiler.

Furthermore, third party tools such as package indexes want to understand information about packages without executing untrusted code. Even first-party tools such as zig itself will use this declarative information to provide package management tooling.

The manifest file describes the set of potential dependencies, and the build.zig logic can still pick and choose which ones to use depending on build configuration.

mlugg commented 1 year ago

@andrewrk

Regarding speed of fetching transitive dependencies, would that really be a major time sink in any practical use case? build.zig scripts are small and quick to compile, and we only need to go through semantic analysis before we have the info required from that decl. A moment extra before fetching dependencies only really matters for huge dep trees, where you'll quickly end up bottlenecked by the network anyway.

Regarding sandboxing, wouldn't that be solved by #14286?

andrewrk commented 1 year ago

Consider the following two scenarios:

Scenario 1

14309. Approximately limited by the user's network bandwidth.

Scenario 2

compile & run build.zig
find out about a small set of dependencies and make network requests. wait for them all to complete.
compile & run build.zig layer 2
find out about more dependencies and make more network requests. wait for them all to complete.

Repeat until all dependecies fetched.

Scenario 2 is unacceptable, not only from a performance perspective, but also because of:

third parties such as package indexes which should not need a dependency on the zig compiler for basic operations.
more complicated logic for fetching

It's more complicated to do it without a declarative file, and gives worse results. There's no reason to do it whatsoever.

loic-sharma commented 1 year ago

I used to work on NuGet, .NET's package manager. NuGet originally put the entire dependency tree in its packages.config file. This was a huge mistake. We later migrated to a new format, PackageReference, where you only list your project's direct dependencies. This migration made a significant difference in developer satisfaction (measured through HaTS).

more complicated logic for fetching

Very true, but, you're offloading this complexity to the developer. .NET developers complained that packages.config was unmanageable. It was difficult to tell if a dependency could be safely removed. Also, you'd need to understand your dependency tree if you wanted to update a single dependency. Excellent IDE tooling was not enough to offset this pain.

not only from a performance perspective

Performance was a top concern for NuGet's migration: NuGet runs in every .NET build, including Visual Studio's design time builds that power Intellisense. We were able to make this fast by:

Use minimum version selection. This makes your dependency graph stable for a given set of direct dependencies. In other words, you only need to download packages if the developer adds/updates their direct dependencies.
Optimize for the "no-op" case. NuGet is blazing fast if your direct dependencies haven't changed.

andrewrk commented 1 year ago

@loic-sharma, perhaps your comment was intended for this issue: #14309

coderonion commented 1 year ago

At first I thought this was a new format with a subset of features of normal Zig but as far as I understand it it's just a regular Zig file with a single struct expression that can be imported, which I like very much. But why do we have to call it build.zig.zon or build.zon? It looks kind of convoluted when we could just call it packages.zig? This could be the standard name for the file containing a struct expression with the packages listed. build.zig.zon or more recently build.zon indicates to me it is a new separate format (Zig Object Notation) where I can't expect all features from Zig to work, even though that's not the case because it is in fact a normal Zig file (or did I get that wrong?).

build.zon sounds like build.zig; a file that builds my project, but written in "Zig Object Notation"...? Seems rather confusing.

IMO build.zig and packages.zig would be a much clearer and simpler naming.

If however this is actually supposed to be a new format with only a subset of features of normal Zig, why? Why do we need to create a new format when a normal Zig file could also contain this struct expression listing the packages? It seems like this way there will always be new requests to port features from normal Zig over to this format, duplicating a lot of work.

great！

asilvas commented 1 year ago

YAML is too complicated and has unacceptable footguns.

json5 is the best.

silversquirl commented 1 year ago

From https://github.com/ziglang/zig/pull/14265:

json5 exists but a lot of tooling does not expect it. and what's the extension? .json5 or .json? either one is a bit problematic.

kuon commented 1 year ago

I think it is pretty clear/decided that we will prefer to use some form of zig to define package informations (and not toml, yaml...), but there are a few way to do that, and I think this is what has to be decided now.

For the extension, I prefer just .zon because in my mind, dual extensions is for templates. foo.html.heex means an html template written in elixir templating language. build.zig.zon would mean a build.zig file will be generated when running the zon file. But if we use zon we have to justify the differences with normal zig. I mentioned rust object notation (ron) which is quite different from rust. But zig being simple, maybe we do not need to subset it.

The other possibility we did not discuss is to simply have package function in the build.zig file, which would return a datastructure like the one proposed. Of course this requires the sandboxing to be implemented. While I mention this, I am not in favor of it, as it prevents automation like zig package add <name> which would edit the .zon file.

I'll try to summarize the options:

a package() function in build.zig that return the package information (approach used by elixir mix.exs)
a .zon file, which can be .zig.zon or just .zon, but we have to "justify" the use of a subset of thing and define it (if we have a sandbox, there is little reason to use a subset).
a normal .zig file which would be executed on its own returning the datastructure

Also, the definitive format of this data structure has to be decided, there is #14309 and https://github.com/ziglang/zig/issues/14290#issuecomment-1381285930

maln0ir commented 1 year ago

Hello. I'm just a random traveller passing through, but I dare to wonder: are you trying to reinvent setup.py? If this is indeed your plan, then please make it as restrictive, boring and to-the-ground experience as possible, because if sky is the limit during build then you can be sure some guy will reach the orbit.

silversquirl commented 1 year ago

@kuon this file is for declarative information that needs to be accessible without building and running Zig code. That's why it's a new format and not just a Zig file.

I can't find the comment where andrew explains why this is needed, but the summary is that we want to be able to fetch all package dependencies without requiring Zig execution, both because building and running Zig code is slow and would slow down the recursive fetch significantly, and because other tools may want to deal with package dependencies without needing a dependency on the Zig compiler.

EDIT: oh, found the comment, it was in this issue :) https://github.com/ziglang/zig/issues/14290#issuecomment-1382604540

kuon commented 1 year ago

@silversquirl Yes you are right. But I mentioned this for completeness. I am also in favor of zon but we have to be sure that it is justified.

wooster0 commented 1 year ago

Now that I think we have found a good reason for introducing this new file format Zig Object Notation and not using the compiler for this, we can probably talk more about the file naming. As for the current zig.build.zon, I agree we should shorten it to just .zon. Currently to me it means it's a complement to build.zig.zon that is written in ZON, which makes sense, but I think we can improve this and make it clear what actually is in this file.

I suggest packages.zon which just makes more sense to me as it is not the Zig Build Object Notation or such but simply ZON meaning it can be used for multiple purposes. So packages.zon means it lists our packages in an object notation called ZON.

I think that's better than status quo build.zig.zon which can be ambiguous. Does it mean it's build.zig but written in ZON? No. Is it a complement to build.zig? Yes, but what it contains is still not clear. I think this stack-up of file extensions is prone to confusion.

So I think packages.zon would be the most obvious naming and I think it matches the new build system terminology proposed in #14307, too. Also, it's one character shorter.

tato commented 1 year ago

An advantage of build.zig.zon is that the file would always show up next to build.zig in listings sorted by file name.

likern commented 1 year ago

package.zon for defining package and direct (top-level) dependencies, package.lock.zon for fixed transitive (non-direct) dependencies, generated and maintained by package manager only.

Let's reuse knowledge of other package managers (npm as most advanced) which already faced with all kind of subtle bugs and errors.

silversquirl commented 1 year ago

@likern we shouldn't blindly copy other package managers. Looking at their decisions and why they were made is a good idea, but package managers like cargo, npm, etc. were not designed for Zig and differ quite a lot from Zig's package manager.

michaelbartnett commented 1 year ago

Possibly cursed question:

If .zon has the leading .{, should that imply that it could support a leading type annotation? Maybe in a .zton file? (Zig Typed Object Notation). Nice implications for if you need to do polymorphic serialization (thinking about use cases totally unrelated to packaging).

silversquirl commented 1 year ago

@michaelbartnett I think for that you'd use a union-style .{ .foo = .{ ... } } or .{ .bar = .{ ... } }

ziglang / zig

proposal: instead of build.zig.ini, use build.zig.zon #14290

14289

Scenario 1

14309. Approximately limited by the user's network bandwidth.

Scenario 2