Open andrewrk opened 1 month ago
`-` bytes not allowed
this is very common in the existing ecosystem and I'd recommend using `_` or `--` for the path instead
re the 16-byte name limit, https://github.com/nektro/zig-iso-3166-countrys and https://github.com/nektro/zig-iso-639-languages use package names that are both 17 in length
I edited the proposal with these changes:
`-` bytes not allowed
this is very common in the existing ecosystem and I'd recommend using `_` or `--` for the path instead
I would suggest separating components with `|` (vertical bar or pipe) instead, since all three components (name, SemVer if I understand the spec correctly, and sized-hash) already disallow it. That way `-` can be allowed in names. Example:
`openssl-lib|3.3.1-1|KLdkAMs-vt5n`
IMHO it's also easier for both humans and machines to read.
`|` is not allowed in Windows file names. Please see "filesystem-safe name required" in the list above.
That's what I'm reading. If the `|` character is disallowed inside all three components on all platforms (IIUC), then surely it can safely act as the separator between the components themselves? Did I miss something here?
the separator needs to be filesystem safe too because this scheme ends up as the name of a folder
Did I miss something here?
The fact that the directory name would be `$name$sep$semver$sep$hash`. If `$sep` is defined to be `|`, you now have an invalid directory name on Windows.
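To make the objection concrete, here is a minimal Python sketch that checks whether a candidate separator can appear in a Windows directory name. The helper names `is_windows_safe` and `cache_dir_name` are hypothetical, and the check covers only the punctuation Windows reserves plus ASCII control characters; it ignores other rules such as reserved device names.

```python
# Characters Windows does not allow in a file or directory name.
WINDOWS_FORBIDDEN = set('<>:"/\\|?*')


def is_windows_safe(text: str) -> bool:
    """Return True if no character in `text` is forbidden in a Windows file name."""
    return all(ch not in WINDOWS_FORBIDDEN and ord(ch) >= 32 for ch in text)


def cache_dir_name(name: str, semver: str, sizedhash: str, sep: str) -> str:
    """Build the directory name $name$sep$semver$sep$hash (illustrative only)."""
    return sep.join([name, semver, sizedhash])


if __name__ == "__main__":
    for sep in ["|", "-", "~", "_"]:
        dir_name = cache_dir_name("openssl-lib", "3.3.1-1", "KLdkAMs-vt5n", sep)
        print(repr(sep), dir_name, is_windows_safe(dir_name))
```

Because the separator ends up inside the directory name, it has to pass the same check as the components themselves.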
Of course! Thanks to you all! How ignorant of me 🤦, to read and immediately forget such basics. I apologise for the nonsensical message.
Note that `-` is an allowed character within semver, and a version can technically have an arbitrary number of `-` characters:
A pre-release version MAY be denoted by appending a hyphen and a series of dot separated identifiers immediately following the patch version. Identifiers MUST comprise only ASCII alphanumerics and hyphens [0-9A-Za-z-]. Identifiers MUST NOT be empty. Numeric identifiers MUST NOT include leading zeroes. Pre-release versions have a lower precedence than the associated normal version. A pre-release version indicates that the version is unstable and might not satisfy the intended compatibility requirements as denoted by its associated normal version. Examples: 1.0.0-alpha, 1.0.0-alpha.1, 1.0.0-0.3.7, 1.0.0-x.7.z.92, 1.0.0-x-y-z.--.
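A small Python sketch of why this matters for parsing: if the same character separates the components and may also appear inside them, a directory name can be cut into three parts in more than one way. The function `possible_splits` and the sample directory names are made up for illustration.

```python
from itertools import combinations


def possible_splits(dir_name: str, sep: str) -> list[tuple[str, str, str]]:
    """List every way to cut `dir_name` into three non-empty parts at `sep`.

    `sep` is assumed to be a single character.
    """
    positions = [i for i, ch in enumerate(dir_name) if ch == sep]
    splits = []
    for i, j in combinations(positions, 2):
        parts = (dir_name[:i], dir_name[i + 1:j], dir_name[j + 1:])
        if all(parts):
            splits.append(parts)
    return splits


if __name__ == "__main__":
    # '-' appears in the name and the version, so several readings exist.
    for parts in possible_splits("zig-wlroots-0.17.0-dev-THISISAHASH0", "-"):
        print(parts)
    # A separator that no component may contain leaves exactly one reading.
    print(possible_splits("zig-wlroots~0.17.0-dev~THISISAHASH0", "~"))
```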
I'll throw `~` into the mix as a possible separator.
I disagree with allowing `-` even if a workaround is used to fix the issue that squeek describes. Only one obvious separator should be allowed in package names.
Regarding the name length limit, thanks @shadeops for doing a bit of legwork:
PyPI has all their package metadata in BigQuery, so [here is the] number of bytes for their package names.
x axis is number of bytes; y axis is number of packages where the name has that number of bytes. Source is Google Cloud's BigQuery:
SELECT BYTE_LENGTH(name) as bytes_per_name, COUNT(*) as name_count FROM (SELECT DISTINCT name FROM `bigquery-public-data.pypi.distribution_metadata`) GROUP BY bytes_per_name
I like this proposal. I have a few miscellaneous thoughts:
- Since the `$sizedhash` input is always 9 bytes, it will always encode to 12 bytes of base64, so there shouldn't be any ambiguity in recovering it even if the name and version contain `-`s. Relying on this does make the format less flexible, though.
- For packages without a `build.zig.zon`, the current proposal of a 2-byte header and 31-byte truncated SHA-256 means that the package size information wouldn't be available in the hash. To more closely unify the hash formats for these cases, what if the `-$sizedhash` part is kept as-is, and the `$name-$semver` part is replaced with a base64 encoding of the remaining 27 bytes of the SHA-256 digest (36 bytes encoded)?
I currently use `-` in both the package name and version of `zig-wayland`, `zig-wlroots`, `zig-xkbcommon`, and `zig-pixman`. The semantic version for an untagged git master commit of these packages has the form `0.1.0-dev` or similar. Only commits that have a git tag get a version without the `-dev` suffix.
I also think that using `_` for the separator would be preferable to `-` due to the fact that semver allows `-` and due to the subjectively higher prevalence of `-` in existing package names in the Zig ecosystem compared to `_`.
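The earlier point that a 9-byte `$sizedhash` input always encodes to 12 base64 characters can be sketched in Python as follows. This assumes the URL-safe base64 alphabet (`-` and `_`) and treats the 9 input bytes as opaque; the helper names are hypothetical. It only recovers the trailing hash; splitting the remaining `$name-$semver` prefix still needs its own rule.

```python
import base64

SIZEDHASH_RAW_LEN = 9       # bytes before encoding, as stated in the thread
SIZEDHASH_ENCODED_LEN = 12  # 9 bytes -> 12 base64 characters, no padding needed


def encode_sizedhash(raw: bytes) -> str:
    """Encode exactly 9 bytes with the URL-safe base64 alphabet."""
    if len(raw) != SIZEDHASH_RAW_LEN:
        raise ValueError("sizedhash input must be exactly 9 bytes")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def split_from_right(dir_name: str) -> tuple[str, str]:
    """Split '<name>-<semver>-<sizedhash>' into ('<name>-<semver>', sizedhash).

    Works even if the hash itself contains '-', because its length is fixed.
    """
    prefix = dir_name[:-SIZEDHASH_ENCODED_LEN]
    sizedhash = dir_name[-SIZEDHASH_ENCODED_LEN:]
    if not prefix.endswith("-"):
        raise ValueError("expected '-' before the sizedhash")
    return prefix[:-1], sizedhash


if __name__ == "__main__":
    h = encode_sizedhash(b"\xfb" * 9)  # an input whose encoding contains '-'
    print(h, len(h))
    print(split_from_right("zig-wlroots-0.1.0-dev-" + h))
```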
- Filesystem-safe name required.
The rules for legal tokens in paths on Windows are (unfortunately) more complicated than just those characters:
Do not use the following reserved names for the name of a file:
CON, PRN, AUX, NUL, COM0, COM1, COM2, COM3, COM4, COM5, COM6, COM7, COM8, COM9, COM¹, COM², COM³, LPT0, LPT1, LPT2, LPT3, LPT4, LPT5, LPT6, LPT7, LPT8, LPT9, LPT¹, LPT², and LPT³. Also avoid these names followed immediately by an extension; for example, NUL.txt and NUL.tar.gz are both equivalent to NUL.
Paraphrasing, this means that path components starting with any of the above names immediately followed by a dot are forbidden on Windows, which means that package names like `con.zig` (console library?) and `aux.zig` (audio library?) will fail to create their corresponding directories on Windows. Whether Zig should encode these edge cases in the rules for legal package names or consider this an unfixable Windows quirk/bug I don't have any strong opinions on, but it's at least good to be aware of these limitations.
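As a rough illustration of that rule, the Python sketch below flags path components whose part before the first dot is one of the reserved names quoted above. The function name `is_windows_reserved` is hypothetical, and the check deliberately ignores other Windows quirks such as trailing spaces or dots.

```python
# Reserved device names, taken from the list quoted above.
_DIGITS = "0123456789¹²³"
RESERVED_NAMES = (
    {"CON", "PRN", "AUX", "NUL"}
    | {f"COM{d}" for d in _DIGITS}
    | {f"LPT{d}" for d in _DIGITS}
)


def is_windows_reserved(component: str) -> bool:
    """Return True if `component` is a reserved name, with or without extensions."""
    stem = component.split(".", 1)[0]
    return stem.upper() in RESERVED_NAMES


if __name__ == "__main__":
    for candidate in ["con.zig", "aux.zig", "NUL.tar.gz", "console.zig", "wlroots"]:
        print(candidate, is_windows_reserved(candidate))
```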
Related, I wonder if there's much of a point in allowing such a large set of characters in package names. Maybe it would be easier for both the package manager implementation and the users if the set of allowed names was restricted to just "legal unquoted identifier in Zig source code (approximately `/[A-Za-z_][A-Za-z0-9_]*/`) no longer than 32 characters" or something similarly restrictive.
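A minimal sketch of that restriction in Python, using the approximate pattern from the comment and a 32-byte cap. The function name `is_valid_package_name` is hypothetical, and the check does not reject Zig keywords, which a real "legal unquoted identifier" rule would presumably also need to do.

```python
import re

IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
MAX_NAME_BYTES = 32


def is_valid_package_name(name: str) -> bool:
    """Check the approximate identifier pattern and the length limit."""
    if len(name.encode("utf-8")) > MAX_NAME_BYTES:
        return False
    return IDENTIFIER.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["zig_wlroots", "zig-wlroots", "wlroots", "9lives", "a" * 33]:
        print(candidate, is_valid_package_name(candidate))
```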
I also want to mention ` ` (space) as a potential separator which won't collide with semver components, though it would also mean that paths like `.cache/zig/p/StaticHttpFileServer 0.0.0 ozYAAOnhf9Zq/build.zig` would need to be quoted in shell scripts and the terminal if that is a concern.
Limited to 32 bytes
Another datapoint taken from my projects:
- `parser-toolkit` is 14 bytes
- `disk-image-step` is 15 bytes
- `microzig/bsp/raspberrypi-rp2040` is 31 bytes
so I guess the bsp here is borderline, but it would still fit
Why not just use directories, i.e. `$name/$semver/$sizedhash`?
That would avoid adding new restrictions (assuming most package names are already valid file names) and provide a much cleaner and easier to browse `~/.cache/zig/p` directory.
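For comparison with the single-directory scheme, here is a small Python sketch of the nested layout. The helpers `package_dir` and `parse_package_dir` are hypothetical; the point is only that each component gets its own path segment, so `-` inside a name or version causes no parsing ambiguity.

```python
from pathlib import PurePosixPath


def package_dir(cache_root: str, name: str, semver: str, sizedhash: str) -> PurePosixPath:
    """Build $cache_root/$name/$semver/$sizedhash."""
    return PurePosixPath(cache_root) / name / semver / sizedhash


def parse_package_dir(path: PurePosixPath) -> tuple[str, str, str]:
    """Recover (name, semver, sizedhash) from the last three path segments."""
    name, semver, sizedhash = path.parts[-3:]
    return name, semver, sizedhash


if __name__ == "__main__":
    p = package_dir(".cache/zig/p", "zig-wlroots", "0.17.0-dev", "THISISAHASH0")
    print(p)
    print(parse_package_dir(p))
```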
I edited the proposal with these changes:
`build.zig.zon` file. I like this because it's a restriction which could be relaxed in the future. It's more difficult to unrelax in the future. I don't want spaces in the filenames however because I think it is generally nicer to not have spaces in filenames.
I currently use `-` in both the package name and version of `zig-wayland`, `zig-wlroots`, `zig-xkbcommon`, and `zig-pixman`. The semantic version for an untagged git master commit of these packages has the form `0.1.0-dev` or similar. Only commits that have a git tag get a version without the `-dev` suffix.
@ifreund I think you should remove "zig-" from those package names. It's redundant information. No Zig package should have "zig" in the name.
I do not see `zig-` as redundant information for the project name of, for example, `zig-wlroots`. The project provides idiomatic Zig bindings for wlroots and Zig is a critical enough part of its identity to be in the project's name. The same basic naming scheme is used for all projects providing wlroots bindings for other languages and I see no reason to deviate (`go-wlroots`, `wlroots-ocaml`, `chicken-wlroots`, `wlroots-rs`, `hsroots`, `clwlroots`, ...).
I also have no plans to change the name of my git repositories on online code forges to something other than `zig-wlroots`. The repository name should match the project name.
My intuition tells me that it is least confusing if the package's name matches the name of the repository and the name of the project. Perhaps I am wrong about this but I don't think the decision is as obvious as "It's redundant information."
I do use the plain `wlroots` name for the module exposed by the `zig-wlroots` package. This means there is no redundancy in consuming Zig code. Users write `@import("wlroots")` as one would expect.
In any case, I don't see any technical benefit to disallowing `-` in package names. I see such a change as unnecessary and undesirable ecosystem churn.
Aesthetically, I personally quite like @neurocyte's proposal of using subdirectories instead, i.e. `.cache/zig/p/zig-wlroots/0.17.0/$HASH/`.
That proposal does have complexity tradeoffs though. I also quite like @squeek502's proposal of `~` as a separator and think that forbidding it in package names would cause significantly less churn than forbidding `-`. I subjectively find the format pleasing as well: `zig-wlroots~0.17.0~THISISAHASH`.
@ifreund
The repository name should match the project name.
Is there a compelling reason for this? It seems to me that `zig-wlroots` as repository name and `wlroots` as project name in `build.zig.zon` would be reasonable. The repository name has to disambiguate itself from other `wlroots`-related projects, but that isn't an issue once we get down to the Zig project level.
I edited the proposal with these changes:
- Incorporate @castholm's suggestion to make package names required to be valid Zig identifiers...
This doesn't yet handle the presence of `-` in semver versions as allowed by the semver spec and used in practice by existing Zig projects.
I do agree that requiring package names to be valid Zig identifiers is a nice property despite the fact that I'm not excited about dealing with the churn of renaming `zig-wlroots`, `zig-wayland`, `zig-xkbcommon`, and `zig-pixman` to `zig_wlroots`, `zig_wayland`, `zig_xkbcommon`, and `zig_pixman`.
The part of this proposal that feels a bit strange to me is using enum literal syntax in the `build.zig.zon` but disallowing identifiers created with the (valid) `.@"zig-wlroots"` syntax. This is unexpected behavior IMO given knowledge of the Zig language's semantics.
@alexrp I think you have confused "project name" with "package name" in my comment.
@ifreund Just replace "project name" in my comment with "package name". To be clear, I'm suggesting that naming the repository `zig-wlroots` and the package just `wlroots` (matching the module name) would make sense to me. It's worth noting that e.g. `wlroots-rs` is just `wlroots` on crates.io, so there is at least some precedent there.
TLDR: maybe we don't want the names of the "wlroots original project" package and the "wlroots bindings" package to clash or be confusing.
I think it still makes sense to name the package "zig-wlroots" and not "wlroots": AFAIK, unlike cargo and other language-specific package managers, having C projects with no Zig code use Zig's build system and package manager is one of the main priorities. Put another way, projects like wlroots hypothetically have a much higher chance of adopting build.zig(.zon) than build.rs etc.
If at some point in the future SDL or wlroots (or another library) is brought to the Zig package manager, IMHO it would be much less awkward to have the "wlroots" package name for the upstream project and "zig-wlroots" for the bindings, rather than both of them being the "wlroots" package.
Using enum literals for zig package names would go along well with another proposal I recall from some time ago (but can't find a link for):
// Imports of zig files use a string literal argument to `@import()`.
const foo = @import("foo.zig");
const bar = @import("foo/bar.zig");
// Imports of packages use an enum literal argument to `@import()`.
const std = @import(.std);
const wlroots = @import(.wlroots);
This would have the advantage of removing some current ambiguity. What if there is both a file called `wlroots` and a package called `wlroots`, or a file called `foo.zig` and a package called `foo.zig`?
This? https://github.com/ziglang/zig/issues/6279#issuecomment-688524037 https://github.com/ziglang/zig/issues/2206#issuecomment-692607482
I think you are getting the project name mixed up with the Zig package name. I am not suggesting to rename your source code repository. I think zig-wlroots is the best name for the source repository.
The prefix "zig-" in the `name` field of build.zig.zon is, however, entirely redundant and should be omitted. This is so blindingly obvious to me that I'm finding it difficult to even express any reasoning for it.
Can you give a single example for when "zig-" in the zig package name would disambiguate anything?
To be honest, I would also consider `zig-wlroots` to be a better package name than `wlroots`. The name makes it clear that this is a set of bindings to an existing library, rather than the library itself being implemented in Zig. This difference is key enough that I think it's worth making obvious. Naming the package `wlroots`, in my eyes, implies it is in some sense "authoritative", i.e. that it is the upstream wlroots implementation.
It also better handles the case of multiple sets of competing bindings existing for the same library. e.g. if I create a competing set of wlroots bindings exposing a different API, I perhaps name it something like `zlroots` (the name being different to avoid any potential confusion with the existing bindings); but if the existing bindings package were named `wlroots`, that wrongly implies it to be "more official" than mine.
I don't think any value comes from having the project name differ from the Zig package name; this seems to me like nothing but a potential avenue for confusion. (Indeed, our original proposed terminology surrounding the package manager called a "package" a "project"; while we changed this nomenclature for good reason, I think the idea it communicates is still valid, that the project and what we now call the package are the same thing.)
repo name / package name / import name
these three are also independent from what a consumer chooses as the dependency name
the package name isn't used for much afaik (this proposal is the first explicit use I'm aware of) so it makes sense to me that there's some differences in what people align it with, and I can totally understand why someone might go either way
The name makes it clear that this is a set of bindings to an existing library, rather than the library itself being implemented in Zig.
If you're trying to indicate that the package has to do with bindings, then put the word "bindings" in the name.
Or, just keep it bare, to leave room for the fact that you might choose to expose both bindings, and a method of building the library from source with a future version.
In node.js land the convention is to use "node-foo" for the repo name and "foo" for the npm package name. Lots of people redundantly put "node-" also in the npm package name and it was redundant then, too, also, as well.
source: https://docs.npmjs.com/cli/v10/configuring-npm/package-json
Don't put "js" or "node" in the name. It's assumed that it's js, since you're writing a package.json file
- Limited to total file bytes of 4 GiB or less
- ...or, should the size field saturate for packages bigger than this?
I would be in favour of saturating the size field for larger packages. It's conceivable that Zig packages may need to ship large binaries to avoid compiling things that take a long time (LLVM, Dawn, etc.), or to provide access to libraries that are closed source.
The Zig package manager can also be used to fetch non-code artifacts, such as texture or model data for a game for instance, which obviously can be quite large.
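To illustrate what "saturating" would mean here, a minimal Python sketch follows. It assumes the size field is an unsigned 32-bit integer (inferred from the 4 GiB limit) and stored little-endian; both are assumptions for illustration, not something the proposal text confirms, and the helper names are hypothetical.

```python
import struct

SIZE_FIELD_MAX = 0xFFFF_FFFF  # largest value an unsigned 32-bit field can hold


def saturated_size(total_bytes: int) -> int:
    """Clamp the package size to the field's maximum instead of rejecting it."""
    if total_bytes < 0:
        raise ValueError("size cannot be negative")
    return min(total_bytes, SIZE_FIELD_MAX)


def pack_size_field(total_bytes: int) -> bytes:
    """Serialize the (possibly saturated) size as 4 little-endian bytes."""
    return struct.pack("<I", saturated_size(total_bytes))


if __name__ == "__main__":
    print(saturated_size(100))           # small packages are unchanged
    print(saturated_size(5 * 2**30))     # a 5 GiB package clamps to the maximum
    print(pack_size_field(5 * 2**30).hex())
```

With saturation, a size equal to the maximum would mean "this large or larger", so the field stops being exact for the biggest packages.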
But you don't put the entire wlroots library into the npm ecosystem, like you (potentially) do in Zig's ecosystem. There, one can safely assume that a "muhaha" package contains only JS bindings for the "muhaha" library, again unlike the Zig package manager, where both a "muhaha" library package and a bindings package can coexist. Most of the time the node ecosystem can't have this name conflict.
I agree that it's redundant to add a "zig-" prefix for Zig projects (like river or libxev), since it's not ambiguous on this level. My disagreement is only about supplementary Zig code for C projects.
Make package hashes generally more user-friendly, so that it is more practical to interact with package directories on the file system, as well as interact with stack traces, debuggers, and other tooling that uses source code paths.
The current hash format is a hex-encoded multihash SHA-256.
It looks like this:
After this proposal, it would look like this instead:
This proposal is to change the hash format to `$name-$semver-$sizedhash` where:
- `name` field from `build.zig.zon`, limited to 32 bytes based on new rules outlined below
- `version` field from `build.zig.zon`, limited to 32 bytes based on new rules outlined below
- `-_` to make it filesystem safe
Package names gain new rules:
- `/[A-Za-z_][A-Za-z0-9_]*/`)
The version field gains new rules:
Packages gain new rules:
Packages which lack a `build.zig.zon` file will have a `$hashiname-P-$sizedhash` scheme instead:
- `[5..][0..24]` bytes of the SHA-256, fss-base64-encoded, for a total of 32 bytes encoded
- `P` which stands for "Pristine Tarball" or whatever you want, really. It acts as a version number so that any future updates to the hash format can tell this hash format apart. Note that `"P"` is an invalid semver.
The hash is broken up this way so that "sizedhash" can be calculated exactly the same way in both cases, and so that "name" and "hashiname" can be used interchangeably in both cases.
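The "hashiname" arithmetic can be checked with a short Python sketch. It assumes "fss-base64" means the URL-safe base64 alphabet (`-` and `_`), reads `[5..][0..24]` as bytes 5 through 28 of the digest, and takes `$sizedhash` as an already computed 12-character string; the function names are hypothetical and the way package contents are fed into SHA-256 is not modelled.

```python
import base64
import hashlib


def hashiname(digest: bytes) -> str:
    """Encode bytes [5..][0..24] of a SHA-256 digest: 24 bytes -> 32 characters."""
    if len(digest) != 32:
        raise ValueError("expected a 32-byte SHA-256 digest")
    return base64.urlsafe_b64encode(digest[5:29]).decode("ascii")


def pristine_dir_name(digest: bytes, sizedhash: str) -> str:
    """Build $hashiname-P-$sizedhash for a package without build.zig.zon."""
    return f"{hashiname(digest)}-P-{sizedhash}"


if __name__ == "__main__":
    digest = hashlib.sha256(b"example package contents").digest()
    name = hashiname(digest)
    print(name, len(name))
    print(pristine_dir_name(digest, "A" * 12))
```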
Related Future Work
20180
20183
Compatibility
Let's try to keep compatibility with the old hash format for at least 1 release cycle, so that there is 1 release cycle that supports both the old and new format at the same time.