ziglang / zig

General-purpose programming language and toolchain for maintaining robust, optimal, and reusable software.
https://ziglang.org
MIT License

package manager #943

Closed: andrewrk closed this issue 1 year ago

andrewrk commented 6 years ago

Latest Proposal


Zig needs to make it so that people can effortlessly and confidently depend on each other's code.

~Depends on #89~

andrewrk commented 5 years ago

Brilliant. Thank you for explaining these use cases. I've thought about how to model these problems and although this comment does not contain my responses, here's the plan:

Getting through this dependency chain of issues will be my primary focus in 0.5.0.

binary132 commented 5 years ago

This has probably been touched on by someone else, but I find that in Go, I like having a copy of my dependencies, including versioning metadata, and in C/C++, I like using git submodules with forks of third-party dependencies in my account, so that I don't depend on the stability of the third-party interface.

In both cases, this approach a) avoids availability issues with remote sources, b) facilitates minimally-complicated redistribution of my source, and c) in the case of Go, allows me to perform a single source clone without multiple Git operations. This is important to me because currently, in nearly all cases, I am forced to use either a very slow and spotty personal network connection or a very slow and spotty connection to my remote Git server, which is on the other side of the planet and likes to time out and freeze.

For this reason, I want to propose that Git not be the primary means of source redistribution, instead being used primarily for source versioning and development.

I think that the primary means of dependency distribution should be single-file compressed archive downloads over HTTP, or even some other means, such as BitTorrent! Even if internally they are implemented using Git or another DVCS, they should be distributed over a much more reliable and performant transport. Git is not a good protocol for distributing archives; it is a good protocol for manipulating and exchanging versioned source trees.

Sure, use Git for interacting with the primary repo as a developer, but go back to the old ways for distributing dependencies. Even support the distribution of dependencies as binaries with headers only.

On that note, I think the Xcode "Framework" concept is kind of neat. I don't like hiding details of things, but perhaps some idea of a library being packaged with metadata and related dependencies could make life nicer for most users.

You should also force all packages to be versioned, even if the version is "checksum of all its contents because the user never versioned it."

hah commented 4 years ago

My thoughts on Package Managers...

In addition to these, it would be nice if the package name were not used as a unique identifier (as it is in cargo/Rust).

momumi commented 4 years ago

For *nix systems the zig package manager and tooling should respect the XDG base directory specification, and use whatever the corresponding locations are for Windows. Getting this right from the start will save people a lot of pain later.

Schroedingers-Hat commented 4 years ago
// in main package
exe.addGitPackage("fancypantsjson", "https://github.com/mrfancypants/zig-fancypantsjson",
    "1.0.2", "dea956b9f5f44e38342ee1dff85fb5fc8c7a604a7143521f3130a6337ed90708");

// in a nested package
exe.addGitPackage("fancypantsjson", "https://bitbucket.org/mirrors-r-us/zig-fancypants.git",
    "1.0.1", "76c50794004b5300a620ed71ef58e4444455fd72e7f7e8f70b7d930a040210ff");

Because this is decentralized, the name "fancypantsjson" does not uniquely identify the package. It's just a name mapped to code so that you can do @import("fancypantsjson") inside the package that depends on it.

But we want to know if this situation occurs. Here's my proposal for how this will work:


comptime {
    // these are random bytes to uniquely identify this package
    // developers compute these once when they create a new package and then
    // never change it
    const package_id = "\xfb\xaf\x7f\x45\x86\x08\x10\xec\xdb\x3c\xea\xb4\xb3\x66\xf9\x47";
    // ... (rest of the quoted proposal not included here)
}

Could the random bytes not instead be a secret, and then the current version be signed? This will head off some typosquatting-adjacent attacks wherein the random bytes are used in a second source not actually controlled by the author, or a domain is hijacked and hosts a malicious payload (at least for users of the code that already have the signature).

It will require manually re-verifying on ownership-change, but I feel that's more of a feature than a bug.

I also think one of the biggest problems with node (and something that will soon be seen with cargo) is that less trustworthy packages (in code quality, safety, and maliciousness) look very similar to highly trustworthy ones. The cargo site has some metrics which help judge this, but they are not very meaningful to beginners.

Some way of delegating some or all trust (or distrust) to the community (especially highly trusted members) would be nice for beginners who may not necessarily trust their own judgement. I have some thoughts, but do not wish to further clutter the comment unless they are considered relevant.

ikskuh commented 4 years ago

Could the random bytes not instead be a secret, and then the current version be signed? This will head off some typosquatting-adjacent attacks wherein the random bytes are used in a second source not actually controlled by the author, or a domain is hijacked and hosts a malicious payload (at least for users of the code that already have the signature).

If I understood Andrew correctly, a specific package version has two things: the package_id that identifies the package, and a cryptographic hash that verifies that the package is unchanged.

This allows hosting the same package on multiple sources without the problem of "is this source trustworthy?", because you can check the hash of the package. It also allows the simple creation of forks: when the original maintainer abandons the package, you can fork it, change things, and publish it together with a new hash. The package manager will still recognize it as the same package even though the hash differs. You do have to change the hash in the reference, though.
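A minimal sketch of the two roles described above, in Zig (assuming a recent std, roughly 0.13; std APIs shift between releases). `DependencyRef`, `relate`, and `verifyDownload` are hypothetical names, not the proposed build API: the random `package_id` decides whether two references are the same package, while the SHA-256 pins the exact contents, so any mirror can serve the bytes.

```zig
const std = @import("std");
const Sha256 = std.crypto.hash.sha2.Sha256;

/// Hypothetical dependency record. `package_id` is the author-chosen random
/// identity; `sha256` pins the exact contents of one version.
pub const DependencyRef = struct {
    import_name: []const u8, // local alias used with @import; not an identity
    package_id: [16]u8,
    sha256: [Sha256.digest_length]u8,
};

pub const Relation = enum {
    unrelated, // different package_id: distinct packages even if names match
    identical, // same package_id and same contents
    same_package_different_contents, // e.g. another version or a fork
};

/// Compares two references found anywhere in the dependency tree.
pub fn relate(a: DependencyRef, b: DependencyRef) Relation {
    if (!std.mem.eql(u8, &a.package_id, &b.package_id)) return .unrelated;
    if (std.mem.eql(u8, &a.sha256, &b.sha256)) return .identical;
    return .same_package_different_contents;
}

/// Checks downloaded bytes against the pinned hash, regardless of which
/// mirror (GitHub, Bitbucket, ...) they came from.
pub fn verifyDownload(ref: DependencyRef, archive: []const u8) bool {
    var digest: [Sha256.digest_length]u8 = undefined;
    Sha256.hash(archive, &digest, .{});
    return std.mem.eql(u8, &digest, &ref.sha256);
}
```

A tool could call `relate` on every pair of references that share a `package_id` and warn on `same_package_different_contents`, which is exactly the "we want to know if this situation occurs" case.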

fengb commented 4 years ago

Some way of delegating some or all trust (or distrust) to the community (especially highly trusted members) would be nice for beginners who may not necessarily trust their own judgement

I agree with this 1000%. One of the biggest problems with npm is that packages are all "peers" and maintained separately. This leads to an explosion of mostly unverifiable content and duplicate functionality — e.g. underscore vs lodash vs ramda vs ES6 native vs ES6 shims.

What if we had a concept of trustworthiness embedded into packages? Stealing from the Arch model: pacman has a split between "core", "extra", and "community", as well as the "AUR", and finally "decentralized" sources. We could somehow have groups that are more trustworthy than others. My big question is: can we improve trust and unify functionality while simultaneously keeping all packages reasonably decentralized?

andrewrk commented 4 years ago

The beauty of decentralization is that any third party can step up and become an entity that curates trusted packages. As far as I'm concerned, this task is (currently) out of scope for the Zig project itself. Zig's role will be to provide helpful tools that provide insight when choosing what packages to depend on, and when doing the chore of upgrading / maintaining. For example, the package manager could provide various ways to visualize a dependency tree for easy auditing of trust. It could provide a way to integrate with third party services that provide extra metadata, such as public keys of known authors.

In summary, Zig's package manager will be a glorified downloading tool, that provides analysis and details to help the human perform the social process of choosing what set of other people's code to rely on, but without any central appeal to authority.

Schroedingers-Hat commented 4 years ago

Would it be in scope to provide a canonical mechanism to assist that social process? My thought would be to sign assertions like 'reviewed', 'problem', 'supports', 'trusts', and 'distrusts'. I.e., if andrewrk 'reviewed' a version of a package, that gives you a data point that the code is not obviously malicious or egregiously low quality. 'trusts' could be transitive up to a limit. So if no one in your network 'trusts' or 'reviewed' a package, installing it would require manually verifying that this is what you wanted to do. If even the author does not 'support' a package, then you know including it in production is probably a bad idea. If someone you 'trust' 'distrusts' a package and no one else trusts it, it won't install, etc.
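The bounded, transitive 'trusts' rule can be sketched as a breadth-first search with a hop limit. This is a hypothetical model in Zig (recent std, roughly 0.13): users are plain indices into a small fixed-size matrix, and `withinTrustRadius`/`installAllowed` are illustrative names, not a proposed API.

```zig
const std = @import("std");

/// Hypothetical: trusts[a][b] == true means user `a` signed a 'trusts'
/// assertion for user `b`. Returns true if `target` is reachable from `me`
/// in at most `max_hops` trust edges (breadth-first, level by level).
pub fn withinTrustRadius(comptime n: usize, trusts: [n][n]bool, me: usize, target: usize, max_hops: usize) bool {
    if (me == target) return true;
    var visited = [_]bool{false} ** n;
    var frontier = [_]bool{false} ** n;
    visited[me] = true;
    frontier[me] = true;
    var hops: usize = 0;
    while (hops < max_hops) : (hops += 1) {
        var next = [_]bool{false} ** n;
        var grew = false;
        for (0..n) |u| {
            if (!frontier[u]) continue;
            for (0..n) |v| {
                if (trusts[u][v] and !visited[v]) {
                    if (v == target) return true;
                    visited[v] = true;
                    next[v] = true;
                    grew = true;
                }
            }
        }
        if (!grew) return false; // nobody new is reachable
        frontier = next;
    }
    return false;
}

/// Hypothetical policy: install without a manual prompt only if someone
/// within the trust radius has signed a 'reviewed' assertion for the package.
pub fn installAllowed(comptime n: usize, trusts: [n][n]bool, reviewed_by: [n]bool, me: usize, max_hops: usize) bool {
    for (0..n) |user| {
        if (reviewed_by[user] and withinTrustRadius(n, trusts, me, user, max_hops)) return true;
    }
    return false;
}
```

'distrusts' assertions could be layered on top as a second matrix that vetoes an otherwise allowed install; the hop limit keeps a long chain of strangers from vouching for a package.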

Assertions could be hosted anywhere (including in the packages themselves), and could also be produced by bots (fuzzbot 'reviewed', semverbot 'reviewed').

I think something (vaguely) like this is the minimum for avoiding a bunch of attacks that have been used on the pip and node ecosystems. It also decouples curation and hosting in a way I quite like.

I realise this could all be done externally, but I don't think it would catch on without the package manager nagging you about low trust sources.

@MasterQ32 I think that automatically recognizing a package as 'the same package' if it has no involvement from the same author(s) is a bug though. Recognizing it if it claims to be the same package and there is a signature by someone whose authorship claim you have verified (either manually or by having it signed by another author) is the minimum. As such I see the benefit of the random 'canonical name' but still think not having a signature is a really bad idea.

adontz commented 4 years ago
  1. It's not a language package manager, but I would like to reference https://theupdateframework.io/security/ Malware was found in published packages in the past, so security is important.
  2. I do not see https://github.com/mrfancypants/zig-fancypantsjson and https://bitbucket.org/mirrors-r-us/zig-fancypants.git as different packages. We should consider them alternative download sources of the same package. I am not sure about the condition. I mean are packages the same if signatures match, or do we need anything else?
  3. I really do not like the idea of dependency on any version control system. It will make life of Windows users pure pain. Downloading a zip/tar.gz file over HTTP is easy, fast, firewall friendly, may be encrypted and signed. Downloading from random git repositories will trigger enterprise InfoSec guys, if allowed at all.
  4. I think there should be one central repository and the possibility to easily add alternative repositories at the project level, like apt/dnf do. But DEB/RPM publishing is a pain, and package publishing should not be. Otherwise, any typo will lead to installing rogue packages (https://www.bleepingcomputer.com/news/security/ten-malicious-libraries-found-on-pypi-python-package-index/). In other words, this process must intentionally be two-step: first an explicit expression of trust in the author, and only after that a reference to a package.
  5. What are reasonable answers to the left-pad story? What if an author wants to remove a package which is a dependency of another package? How exactly should everything break, or how exactly should everything not break?
  6. What are specific closed-source requirements? I'd say "don't even need https because of the sha-256" will not apply anymore.
  7. What about binary packages? What about environment-specific packages? These are the problems Python wheels try to solve. What if I want to vendor curl on Windows, but reference the system one on Linux?
  8. What kind of reports a developer needs? Tree of dependencies, directed graph of dependencies?
  9. I don't think "zig build" should download anything. That's pretty much "hidden control flow".

FlyingWombat commented 4 years ago

Hello. As a passer-by who just stumbled upon Zig a few days ago, I'd like to add my two cents here.

TL;DR Please, let the package manager be independent and optional. At least please provide first-class support for offline builds.

IMO, a package manager is out of scope for a programming language (Zig, or any other). Yes, they can be very helpful for encouraging adoption and sharing code, but I think they can perform that role just fine as separate, and optionally bundled tools.

Take Python's pip for example (my main language). I consider pip to be barely usable. It works well enough for pulling down packages into a virtualenv during development, but I sure am glad I don't have to rely on it. Pip's saving grace is that it is optional for Python development.

Take Rust's Cargo for example. Cargo is a great tool. It has been a positive binding force for the Rust community, it's easy to use, and it just works. But Cargo's strength is also its weakness, because it's an all-or-nothing deal. Cargo is great ... when it works. And Rust is useless without Cargo. I gave up on offline builds for Rust, and resorted to bringing my offline work laptop home to update the Rust software I use.

@andrewrk In summary, Zig's package manager will be a glorified downloading tool, that provides analysis and details to help the human perform the social process of choosing what set of other people's code to rely on, but without any central appeal to authority.

I'm glad this is your stance on it.
In my opinion, I see little reason for it to be more than merely a convenient wrapper around git, wget, and gpg.

Edit

Maybe my point wasn't clear. I am not against a package manager for Zig, nor against having it as an official Zig project. I just don't see why it should be directly integrated into the language/compiler.

judofyr commented 4 years ago

How about adding SHA256 as a parameter somewhere to addUrlPackage so it's possible to migrate to a different hash function in the future?
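One way to keep the hash function swappable, sketched in Zig (recent std, roughly 0.13) under the assumption that pinned hashes are written as `<algorithm>:<hex>` strings. That multihash-like convention and the `PinnedHash` type are chosen here for illustration; they are not part of `addUrlPackage` or any existing API.

```zig
const std = @import("std");
const sha2 = std.crypto.hash.sha2;

/// Algorithms this sketch understands. Adding one later does not
/// invalidate existing pins that name an older algorithm.
pub const HashAlgorithm = enum { sha256, sha512 };

/// Hypothetical pinned hash of the form "<algorithm>:<hex digest>".
pub const PinnedHash = struct {
    algorithm: HashAlgorithm,
    hex_digest: []const u8,

    /// Parses e.g. "sha256:ba78..." into an algorithm tag and a hex digest.
    pub fn parse(text: []const u8) error{ MissingPrefix, UnknownAlgorithm }!PinnedHash {
        const colon = std.mem.indexOfScalar(u8, text, ':') orelse return error.MissingPrefix;
        const algorithm = std.meta.stringToEnum(HashAlgorithm, text[0..colon]) orelse
            return error.UnknownAlgorithm;
        return .{ .algorithm = algorithm, .hex_digest = text[colon + 1 ..] };
    }

    /// True if `content` hashes to the pinned digest under the pinned algorithm.
    pub fn matches(self: PinnedHash, content: []const u8) bool {
        return switch (self.algorithm) {
            .sha256 => digestMatches(sha2.Sha256, content, self.hex_digest),
            .sha512 => digestMatches(sha2.Sha512, content, self.hex_digest),
        };
    }
};

/// Hashes `content` with `H` and compares the hex form (case-insensitively).
fn digestMatches(comptime H: type, content: []const u8, expected_hex: []const u8) bool {
    var digest: [H.digest_length]u8 = undefined;
    H.hash(content, &digest, .{});
    const actual_hex = std.fmt.bytesToHex(digest, .lower);
    return std.ascii.eqlIgnoreCase(&actual_hex, expected_hex);
}
```

Because the algorithm travels with the digest, a later migration only adds an enum member and a switch prong; old pins keep verifying.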

cyruseuros commented 4 years ago

Just as a note, though I really have no preferences on the matter, easy offline builds and decentralized code distribution are absolutely essential when it comes to the adoption of Zig as a systems language. It's mostly ideological, but this weakness is really slowing down the penetration of Rust into hardcore libre software (and a lot of systems stuff tends to be on that end of the spectrum).

To this end, even if private package repositories are supported from the get-go, a "main" one would likely emerge pretty quickly. Building any kind of software without it will become effectively impossible.

Go's approach, though somewhat more verbose, seems preferable in this regard, and is a smaller development burden. It should be easy enough to compose a searchable index for convenience. And working offline is as simple as fetching once, and vendoring from then on.

marcthe12 commented 4 years ago

@FlyingWombat I agree with you on this. Personally, I think the solution is to have a few paths to search for dependencies: some managed by Zig's package manager, some managed by the user or distro. Then we can choose between using the package manager only, the system only, or one as a fallback for the other. Distros definitely do not like packaging Rust, Go, Haskell, and npm stuff, as it is a PITA. On the other hand, C/C++/sh without relying on the system package manager is a PITA in places like Windows, where there is no package manager. A good approach would be a directory relative to the zig binary marked as "system", another marked for the Zig package manager, and a vendor directory for package-specific dependencies.
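The "package manager only / system only / fallback" choice could be modelled as a small policy function. This is a hypothetical Zig sketch (`Policy`, `Source`, and `pickSource` are illustrative names); the rule that a project's vendored copy always wins is an assumption, not something the comment specifies.

```zig
const std = @import("std");

/// Hypothetical lookup policy for where a dependency may come from.
pub const Policy = enum { manager_only, system_only, prefer_system, prefer_manager };
pub const Source = enum { vendor, manager, system };

/// Which locations currently hold a usable copy of the dependency.
pub const Availability = struct { vendor: bool = false, manager: bool = false, system: bool = false };

/// Assumption: a vendored copy always wins; otherwise the policy decides
/// between the package-manager directory and the system/distro directory.
pub fn pickSource(policy: Policy, have: Availability) ?Source {
    if (have.vendor) return .vendor;
    switch (policy) {
        .manager_only => if (have.manager) return .manager,
        .system_only => if (have.system) return .system,
        .prefer_system => {
            if (have.system) return .system;
            if (have.manager) return .manager;
        },
        .prefer_manager => {
            if (have.manager) return .manager;
            if (have.system) return .system;
        },
    }
    return null; // not found under this policy: fetch it or report an error
}
```

A distro packager would pick `system_only`, while a Windows user without a system package manager would pick `manager_only` or `prefer_manager`.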

FlyingWombat commented 4 years ago

Way I see it, all the compiler needs is a way to resolve source imports in the filesystem. Then the package manager just does its work to get the dependencies, and simply reports the import paths to the build manager. A few suggestions:

Fetching dependencies and building could be as simple as zig-pkg fetch && zig build (if that's too much typing, just make a shell alias 😛 )

data-man commented 4 years ago

https://github.com/zigtools/zpm - Unofficial Package Manager for Zig

Viacheslav-Romanov commented 4 years ago

Protection from circular dependencies should be implemented.
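Detecting a cycle is a standard depth-first search that reports a back edge to a package still on the current path. A hypothetical Zig sketch (recent std, roughly 0.13) over a small fixed-size adjacency matrix; a real tool would build the graph from resolved package IDs instead.

```zig
const std = @import("std");

const Mark = enum { unvisited, in_progress, done };

/// deps[a][b] == true means package `a` depends on package `b`.
/// Returns true if the dependency graph contains a cycle.
pub fn hasCycle(comptime n: usize, deps: [n][n]bool) bool {
    var state = [_]Mark{.unvisited} ** n;
    for (0..n) |start| {
        if (state[start] == .unvisited and visit(n, &deps, &state, start)) return true;
    }
    return false;
}

/// Recursive DFS with three-state marking.
fn visit(comptime n: usize, deps: *const [n][n]bool, state: *[n]Mark, node: usize) bool {
    state[node] = .in_progress;
    for (0..n) |next| {
        if (!deps[node][next]) continue;
        switch (state[next]) {
            // back edge: `next` is still on the current DFS path
            .in_progress => return true,
            .unvisited => if (visit(n, deps, state, next)) return true,
            .done => {},
        }
    }
    state[node] = .done;
    return false;
}
```

Recording the path while recursing would let the tool print the offending chain (e.g. `a -> b -> c -> a`) instead of a bare yes/no.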

sam0x17 commented 4 years ago

I second the notion that this maybe shouldn't specifically involve the compiler in any way. IMO just do a simple flat text file format (YAML/TOML/JSON) but build in all these features discussed above regarding decentralization, hashes, etc.

Packages could be called zags and the package manager binary could be zag. Then you could do things like zag install, zag update etc...

In terms of decentralization, I think git is always the best place to start. If you pair that with cryptographic hashes of the tar of the repo, you can get all the properties you want in terms of preventing force pushes or repo changes that alter the public version of a package without changing a version number.

Bonus points for automation of changes to the YAML/TOML/JSON config file, such as things like zag add dep package-name, zag update --save (writes to config file), zag update package1 package2.

Additional bonus points for a --offline mode. I envision it working by running zag export deps [file] on a machine with internet and then zag import deps [file] on the machine without internet. Then as long as you add --offline to all your zag commands it won't try to hit the internet and will instead use the local repository.

NightMachinery commented 3 years ago

Having a hashless, fetch-latest mode à la golang would also be good for small projects. There are tradeoffs between the two, and I think making the tradeoff should be delegated to the user, not the package manager or the compiler. The user knows their own context best.

manast commented 3 years ago

No one has brought it up yet, but I think Deno's package handling could be a good source of inspiration, if we want a decentralized manager: https://deno.land/manual@v1.4.4/linking_to_external_code

FlyingWombat commented 3 years ago

From the above link to Deno's docs:

import { assertEquals } from "https://deno.land/std@0.73.0/testing/asserts.ts";

I don't think it's a good idea to bake URLs into Zig source code.

Deno is a JS/TS runtime -- a web technology. Allowing URLs in source imports makes sense for it.

Zig, on the other hand is a general purpose programming language, and needs 1st-class support for offline builds.

manast commented 3 years ago

@FlyingWombat I am not saying it should mimic Deno, just use as a source of inspiration since they have solved similar problems.

I am not sure what you mean with offline builds. Deno is just a runtime for server side code, it has the exact same requirements for building offline as Zig. If you read the documentation you will see that all dependencies are cached the first time you run (build) your application.

FlyingWombat commented 3 years ago

... just use as a source of inspiration ...

This I take no issue with. But I felt that many would just look at Deno's import syntax as the primary feature (as I did), and I disagree with that syntax for Zig. How Deno implemented its package management could, yes, offer some valuable insight. If you have a specific feature or implementation detail from Deno's package management that you would like to highlight, please mention it.

What I mean by 1st-class support for offline builds is this: it should be just as easy to build a Zig project on an isolated system as it is on a connected one.

It must be straightforward to recursively download all dependencies on a connected machine. And likewise to transfer and vendor them locally on the isolated machine -- which will be the one performing the build. This is one reason why I've been advocating to keep the package manager and compiler as separate entities. All the connected machine would need in order to collect build dependencies is a small static-linked executable.

jayschwa commented 3 years ago

Offline builds and URL-namespaced dependencies are not mutually exclusive, but it would require intelligent caching or a vendoring mechanism. The Go toolchain is an example of this.

CantrellD commented 3 years ago

I agree with everything in this comment: https://github.com/ziglang/zig/issues/943#issuecomment-395914504

An anecdote: Part of my job involves maintaining a .Net project. We use NuGet for that project. One of our direct dependencies had a dependency on some specific version of a library. Another one of our direct dependencies had a dependency on some other version of the same library. The standard "solution" for this problem in the .Net ecosystem is a binding redirect, which (I believe) basically means choosing one version of the library, linking it to an assembly that expects a different version, and then praying that nothing goes wrong at runtime. Using binding redirects wasn't an option for us (it's complicated) so we had to refactor the project.

The rest of this comment is basically an attempt to apply the dependency inversion principle. The dependency inversion principle says that software components should depend on abstractions, not on concrete implementations.

In object-oriented programming, the dependency inversion principle gives us dependency injection: Any IO class that has dependencies should express those dependencies with some set of interface types in the constructor parameter list. Before anything else happens in the application, the concrete dependencies for a given IO class are instantiated, and then the IO class itself is instantiated. IO classes do not choose their own concrete dependencies. The type system for the programming language is able to validate this process because each IO class declares which interfaces it implements, and any invalid declaration will cause an error.

In package-management, I think, the dependency inversion principle gives us an analogous goal: Any library that has dependencies should express those dependencies with some set of abstract specifications in the package import list. Before anything else happens in the build process, the concrete dependencies for a given library are resolved, and then the library itself is resolved. Libraries do not choose their own concrete dependencies. The type system for the package manager is able to validate this process because each library declares which abstract specifications it implements, and any invalid declaration will cause an error.

Obviously you would need some way to express an "abstract specification" that covers everything important about the API. In a language like C, you would probably want to use header files. And header files would probably be good enough, if you don't need the minor version distinction that SemVer tries to encode. But if you do need the minor version distinction, then you probably need some concept of subtyping in the type system for the package manager, and I'm not sure how viable that is outside of an object-oriented language.

Some notes: I talk about "IO classes" because I don't really use dependency injection for datatypes or utility classes. This raises some questions, and I'm not sure what the answers are. I'm also not sure if this comment really makes sense, but I figured I might as well share my perspective.

ikskuh commented 3 years ago

I think zig can already do "dependency injection" for your use case: Just provide a different file under the package identifier which needs to fulfil the same public API ("header file") and it would work, as long as the API is truly compatible. The package manager should be able to allow such overrides.
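A sketch of how such an override could be checked at comptime in Zig (recent std, roughly 0.13): the "header file" is modelled as a hypothetical list of required declaration names and types, and `@hasDecl`/`@field` confirm the substitute provides each one with exactly that type. `Requirement`, `logging_api`, and `implements` are illustrative names, not an existing API.

```zig
const std = @import("std");

/// One required public declaration of an abstract package API.
pub const Requirement = struct { name: []const u8, T: type };

/// Hypothetical "logging" API that any substitute package must provide.
pub const logging_api = [_]Requirement{
    .{ .name = "log", .T = fn ([]const u8) void },
};

/// Comptime check that `Impl` (e.g. the struct an @import of the substituted
/// file evaluates to) declares every required name with exactly the required type.
pub fn implements(comptime Impl: type, comptime reqs: []const Requirement) bool {
    inline for (reqs) |req| {
        if (@hasDecl(Impl, req.name)) {
            if (@TypeOf(@field(Impl, req.name)) != req.T) return false;
        } else {
            return false;
        }
    }
    return true;
}
```

A build step could evaluate `implements(@import("logging_impl.zig"), &logging_api)` (file name hypothetical) and turn a `false` into a `@compileError`, so an incompatible override fails before any other code is compiled.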

andrewrothman commented 3 years ago

Hi. I just learned about Zig a few days ago and have been learning about it for a bit. It looks like a really awesome language.

I'd like to voice support for decentralized Deno-style package management:

Pros:

Cons:

Ultimately I think the arguments for / against decentralized package management come down to trust. With a centralized package manager, you trust its maintainers to keep packages immutable and mostly always available. With a decentralized approach, you trust your chosen package hosts and proxy, and you get the advantage of being able to choose who you trust to fill those roles.

Additionally, Deno optionally allows for import maps to let the user decide which alias they want to use in their code. Theoretically, I suppose this could allow the user to depend on two different versions of the same package under separate aliases ("my-package", "my-package-v2") which is a pretty cool plus.

I like the approach proposed in https://github.com/ziglang/zig/issues/943#issuecomment-383610569 as it seems to map closely to what I mentioned. But I'm curious: What's the disadvantage of putting the SHAs in a separate file like a lockfile? Maybe one could optionally omit those values in the addUrlPackage and in that case they will be automatically added to an optional lockfile or replaced in the build.zig inline? While working, I'd like to be able to specify a URL and have the language tooling calculate and store the SHA for me. Another small benefit of this would be the ability to specify URLs directly in import statements, without adding them to the build.zig file, which would make testing out new packages incredibly easy. Any thoughts?
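A hypothetical sketch of the "let the tooling calculate and store the SHA" workflow in Zig (recent std, roughly 0.13): after downloading, the tool formats a lockfile line from the bytes it actually received. The `<name> <url> sha256:<hex>` line format and `formatLockEntry` are assumptions for illustration, not an existing Zig lockfile.

```zig
const std = @import("std");
const Sha256 = std.crypto.hash.sha2.Sha256;

/// Hypothetical: formats one lockfile line "<name> <url> sha256:<hex>\n"
/// into `buf`, hashing the downloaded archive so the user only writes the URL.
pub fn formatLockEntry(buf: []u8, name: []const u8, url: []const u8, archive: []const u8) ![]u8 {
    var digest: [Sha256.digest_length]u8 = undefined;
    Sha256.hash(archive, &digest, .{});
    const hex = std.fmt.bytesToHex(digest, .lower);
    return std.fmt.bufPrint(buf, "{s} {s} sha256:{s}\n", .{ name, url, &hex });
}
```

On later builds the tool would re-hash the cached archive and compare it with the stored line, so the first download is trust-on-first-use and every later one is verified.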

Thanks for the amazing work!

Meai commented 3 years ago

I agree with everything in this comment: #943 (comment)

Great comment to point out. All current package managers are terrible in this regard: they let me upgrade packages, and then later at build time I get errors that the package manager could technically have known about. It knows which functions are exported. It knows which functions I use. It could tell me whether an upgrade will let me build or not. I think I can guess why nobody ever implemented this, though: it's a lot of work for a quality-of-life improvement that is perceived as tiny or irrelevant by most people who don't work in very large projects with interconnected dependent libraries.

andreialecu commented 3 years ago

Hey guys. Just wanted to mention that a nodejs package manager: Yarn 2 (codenamed berry) might be worth looking into.

It has a plugin-centric architecture, and there has already been experimentation for making it work with C/C++: https://github.com/yarnpkg/berry/pull/1697

Might be pretty easy to get an MVP ziglang package-manager quickly going, at least as an interim solution.

mcandre commented 3 years ago

Honestly, an official, language-wide standard for managing dependencies would be a big win over C/C++ and their hodgepodge of warring factions.

I request that a section be devoted to development-time dependencies, such as linters and test frameworks, and an equivalent to bundle exec / npm bin for managing Zig utilities on a per-project basis.

nektro commented 3 years ago

https://github.com/nektro/zigmod

annymosse commented 3 years ago

@andrewchambers

  • Packages should be immutable in the package repository (so the NPM problem doesn't arise).

I would like to remind you of the node_modules dependency hell created by npm (yes, there are some solutions to that hell, but why should we work around it instead of avoiding it from the beginning?). The best solution is the one Go and Deno use.

  • Packages should only depend on packages that are in the package repository (no GitHub urls like Go uses).

Same as my first note. Also, what about my private libraries? Should we have to pay for private repos?! I suggest using the Deno and Go approach: it saves disk space by eliminating dependency folders such as node_modules, and saves bandwidth for both clients and the repository server (less traffic and higher availability).

Thoughts that might not be great for Zig:

  • Enforcing Semver is an interesting concept, and projects like Elm have done it.

Why does SemVer exist? Isn't it to remove dependency-compatibility hell? As a coder, it is easy for me to know whether I should upgrade a library just by reading the x.y.z (BreakingChanges.Features.PatchBugs), without reading the changelog at all.
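Zig's standard library already includes a SemVer parser, `std.SemanticVersion`, so the "read x.y.z and decide whether to upgrade" rule can be sketched directly (recent std, roughly 0.13). `isCompatibleUpgrade` is a hypothetical helper, and the treatment of 0.y.z releases is a conservative assumption, since SemVer makes no stability promise before 1.0.0.

```zig
const std = @import("std");

/// Hypothetical helper: true if moving from `current` to `candidate` should be
/// safe under SemVer, i.e. same major version and not a downgrade. For 0.y.z
/// versions this sketch also requires the minor version to match.
pub fn isCompatibleUpgrade(current: []const u8, candidate: []const u8) !bool {
    const cur = try std.SemanticVersion.parse(current);
    const cand = try std.SemanticVersion.parse(candidate);
    if (cand.major != cur.major) return false;
    if (cur.major == 0 and cand.minor != cur.minor) return false;
    return cand.order(cur) != .lt;
}
```

This only trusts what the version number claims; a comptime API check like the `@hasDecl`-style one discussed elsewhere in this thread would be needed to confirm the claim.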

Additional ideas :

Thoughts that might be so horrible for Zig:

ghost commented 3 years ago

I think zig can already do "dependency injection" for your use case: Just provide a different file under the package identifier which needs to fulfil the same public API ("header file") and it would work, as long as the API is truly compatible. The package manager should be able to allow such overrides

This doesn't work when you want one package to be able to access another one. Like if you have a "logging" package and "auth" package, auth package cannot access the logging package, because the addPackagePath resolution only applies to the root source file.

(I am using a workaround in one of my projects: instead of using addPackagePath in build.zig, have the main (root) file import specific implementations of each "package" and make them pub, and then having other files access them via @import("root"). But this doesn't work in tests.)

akavel commented 3 years ago

Hm; @CantrellD's comment, and @419928194516's comment it builds upon, reminded me somewhat of the approaches explored in:

As such, I'm wondering if it could make some sense to explore some similar API<->implementation decoupling in the package manager. Notably, if yes, I would imagine:

Just some ideas that came to my mind after reading a huge chunk of the thread above (though I can't 100% guarantee if I managed to internalize all of it). I'm not even sure if that's at all doable, notwithstanding whether this should be done at all; but I'm interpreting that this is still mostly a brainstorm phase, so throwing in my brainstorm contribution. Cheers and good luck!

dralley commented 3 years ago

Running zig build on dependencies is desirable because it gives a package the ability to query the system, depend on installed system libraries, and potentially run the C/C++ compiler. This would allow us to create Zig package wrappers for C projects, such as ffmpeg. You could even use this feature for a purely C project: a build tool that downloads and builds dependencies for you.

For managing C dependencies (and maybe Zig dependencies too, eventually?) it might be useful to be able to request the system-provided version of a given library. At least on Linux.

Let's say you have a library that uses Curl and OpenSSL and you want to link it against the system-provided versions of those libraries. It'd be nice if the Zig package manager could be configured to skip downloading them if a compatible version of the development library is already installed on your system in the traditional way e.g. dnf install openssl-devel libcurl-devel, and to make sure that those copies are used.

Of course, that can get tricky with the differences between families (Debian, Fedora, Arch) and within families (Debian/Ubuntu, Fedora/RHEL/CentOS). You might need to list the package names for a couple of different distros and have some way of constraining the acceptable versions. I can see how the cost/benefit could be questioned, it just strikes me as an interesting idea, especially since Zig is so nicely suited to being used with dynamic linking.

faraazahmad commented 3 years ago

What are your thoughts on PNPM? It makes sure (among other things) that the same package (with same version) only has 1 copy of it in the system instead of the same package being used in multiple projects on the system in multiple node_modules folders.

While this is great for JS, I wonder if it is possible for a compiled language. (Apologies in advance if this has been mentioned before in this thread; I couldn't go through all of it.)

jayschwa commented 3 years ago

Go announced a security vulnerability today regarding malicious code execution when running go get: https://blog.golang.org/path-security

This is something to keep in mind for the package manager, and perhaps more broadly, the build system.

andrewrk commented 3 years ago

Yep, that's one of the main reasons build scripts use a declarative API. The idea is that the package manager and build system could still run the build.zig script in a sandbox and collect a serialized description of how to build things. Packages that use the standard build API and do not try to run arbitrary code would be eligible for this kind of security, which would make it practical to put native code execution behind a permissions flag: you would have to explicitly allow native code execution from a given dependency's build script.

Until we have such an advanced feature, however, it should be known that zig build runs arbitrary code from the build.zig scripts in the dependency tree. It is also planned for build.zig logic to affect package management. However, it is also planned that the set of possible dependencies will be purely declarative, so it will be practical to have a "fetch" step of package management that does not execute arbitrary code. That said, it is also planned for the package manager to support "plugin packages" that enable fetching content from new and exotic places, for example IPFS, and that feature, again, relies on executing arbitrary code.

Ultimately, I think what we will end up with is that some projects will want to execute arbitrary code as part of the build script, but for most packages it will not be necessary, so it will be no big deal to explicitly enable an "arbitrary code execution" flag on the dependencies that need it.
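A rough sketch of the split described above, assuming a hypothetical JSON manifest with a per-dependency needs_native_build field and a root-project allowlist. None of these names reflect Zig's actual build API. The point is that dependencies are read as plain data, so computing what to fetch never runs dependency code, and native build-script execution is refused unless the root project allowed it for that specific dependency. The URLs are placeholders.

```python
import json

# Hypothetical declarative manifest: pure data, readable during a "fetch"
# step without executing any dependency code. URLs are placeholders.
MANIFEST = """
{
  "dependencies": {
    "zlib":   {"url": "https://example.com/zlib.tar.gz",   "needs_native_build": false},
    "ffmpeg": {"url": "https://example.com/ffmpeg.tar.gz", "needs_native_build": true}
  }
}
"""


def fetch_plan(manifest_text: str) -> list[str]:
    """List the URLs to download; parsing JSON never runs dependency code."""
    deps = json.loads(manifest_text)["dependencies"]
    return [info["url"] for info in deps.values()]


def denied_native_builds(manifest_text: str, allow_native: set[str]) -> list[str]:
    """Return dependencies whose build scripts need native code execution
    but were not explicitly allowed by the root project."""
    deps = json.loads(manifest_text)["dependencies"]
    return [
        name
        for name, info in deps.items()
        if info["needs_native_build"] and name not in allow_native
    ]


print(fetch_plan(MANIFEST))
print(denied_native_builds(MANIFEST, allow_native=set()))       # ['ffmpeg']
print(denied_native_builds(MANIFEST, allow_native={"ffmpeg"}))  # []
```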

daurnimator commented 3 years ago

Idea being that the package manager and build system could still run the build.zig script in a sandbox and collect a serialized description of how to build things.

What if build.zig had to work completely at comptime? That way it would already be sandboxed to whatever is possible at comptime (i.e. side-effect free).

antartica commented 3 years ago

Is it really necessary to have the dependencies inside build.zig? Wouldn't it be better to have them in a separate dependencies.zig?

I would expect building a package to be a step unrelated to downloading the sources and their dependencies.

This is similar to what one does when generating binary packages from sources in a distribution. For example, in Debian, after you have downloaded some Debian-aware sources for a program, you use "dpkg-checkbuilddeps" to check whether the dependencies are satisfied and "apt-get build-dep" to fetch and install them. Then you can start the build/test/fix/rebuild cycle (be it with make, fakeroot debian/rules binary, dpkg-buildpackage or debuild).

I say this because it would make me nervous if running a compilation could download and update a dependency. That would be bad when there are issues I want to replicate and debug, since the production code and my code would then be using different versions of that dependency.

Note that I say this as an outsider who has recently discovered Zig and is evaluating it for a project, so I may be missing something or have misunderstood the proposal.

Meai commented 3 years ago

In many situations it's useful to see the umbrella organization a dependency comes from. Made-up Java examples:

  1. group: org.apache, name: httpclient.
  2. group: org.apache, name: regexutils.

Otherwise, people overload the name to include the group, each in their own way (apache-httpclient, regexutils-apache), or they leave it out and you end up with super generic names (httpclient).

It also prevents or minimizes "name squatting", i.e. early comers grabbing the best names and then abandoning them.

Requiring people who publish packages to own the domain they publish under is a basic security check that is worth it in my opinion. For example, it would stop anyone but Microsoft from publishing a Zig package called "com.microsoft.uriparser". I've seen this suggested here, as something Maven apparently already does: https://www.reddit.com/r/programming/comments/lhu44g/researcher_hacks_over_35_tech_firms_by_creating/gn11fwj?utm_source=share&utm_medium=web2x&context=3
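A minimal sketch of that check. It assumes the registry already knows which domains a publisher has verified out of band, for example via a DNS record. It also assumes package ids look like "group.name" with a reverse-domain group. The function names are hypothetical.

```python
def group_to_domain(group: str) -> str:
    """Map a reverse-domain group like 'com.microsoft' to 'microsoft.com'."""
    return ".".join(reversed(group.split(".")))


def may_publish(package_id: str, verified_domains: set[str]) -> bool:
    """Allow publishing only if the publisher verified the domain behind the group.

    package_id is 'group.name', e.g. 'com.microsoft.uriparser': the last
    component is the package name and everything before it is the group.
    """
    group, _, _name = package_id.rpartition(".")
    if not group:
        return False  # no group at all: reject rather than invite squatting
    domain = group_to_domain(group)
    # A verified parent domain also covers its subdomains (dev.example.org).
    return any(domain == d or domain.endswith("." + d) for d in verified_domains)


print(may_publish("com.microsoft.uriparser", {"microsoft.com"}))  # True
print(may_publish("com.microsoft.uriparser", {"evil.com"}))       # False
```

Matching on "." + domain rather than a bare suffix keeps someone who owns notmicrosoft.com from claiming the com.microsoft group.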

kidandcat commented 3 years ago

If you want inspiration, Dart Pub https://dart.dev/guides/packages is the best package manager I have worked with, and I work daily with npm, go, maven, pods and some others. I have not dug into how it works internally, but it just works seamlessly: I'm constantly switching Flutter and Dart versions, and pub reinstalls all dependencies quickly with zero trouble every time I switch between Flutter channels.

https://dart.dev/tools/pub/dependencies

DrSensor commented 3 years ago

@kidandcat, would you elaborate on why Dart Pub is the best package manager in your experience compared to the others?

ElectricCoffee commented 3 years ago

Maybe I'm just stupid, but wouldn't immutable packages make it really hard to push bug fixes and updates? Or would each version just be its own frozen entity?

nektro commented 3 years ago

the latter @ElectricCoffee
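In other words, a bug fix ships as a new version rather than by changing an existing one. A minimal sketch of a registry that enforces this rule. The Registry class and its publish method are hypothetical.

```python
import hashlib


class Registry:
    """Toy registry where each (name, version) is frozen once published."""

    def __init__(self) -> None:
        self.releases: dict[tuple[str, str], str] = {}  # (name, version) -> sha256

    def publish(self, name: str, version: str, contents: bytes) -> str:
        key = (name, version)
        if key in self.releases:
            raise ValueError(f"{name} {version} is already published and immutable")
        digest = hashlib.sha256(contents).hexdigest()
        self.releases[key] = digest
        return digest


registry = Registry()
registry.publish("uriparser", "1.0.0", b"original code")
try:
    registry.publish("uriparser", "1.0.0", b"patched code")
except ValueError as err:
    print(err)  # uriparser 1.0.0 is already published and immutable
registry.publish("uriparser", "1.0.1", b"patched code")  # the fix ships as a new version
```

Because 1.0.0 can never change, anyone pinned to it (or to its hash) gets exactly the same bytes forever, while users who want the fix opt into 1.0.1.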

gphilipp commented 3 years ago

I really recommend this talk by Rich Hickey (the author of Clojure) on dependency management: https://github.com/matthiasn/talk-transcripts/blob/master/Hickey_Rich/Spec_ulation.md. You'll learn that semantic versioning is no panacea. Btw, Clojure has a tool called tools.deps that can resolve dependencies specified as a git repository plus a commit SHA (see https://clojure.org/guides/deps_and_cli#_using_git_libraries).

jackdbd commented 3 years ago

Related to what @gphilipp wrote, Leiningen, the most popular build tool for Clojure, can be extended with plugins. One of these plugins, lein-v, handles versioning with a "1 git commit = 1 version" approach: https://github.com/roomkey/lein-v

Lein-v uses git metadata to build a unique, reproducible and meaningful version for every commit. Along the way, it adds useful metadata to your project and artifacts (jar and war files) to tie them back to a specific commit. Consequently, it helps ensure that you never release an irreproducible artifact.
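Tools in this style typically derive the version from the output of git describe, which gives the nearest tag, the number of commits since that tag, and an abbreviated commit hash. A minimal sketch of that derivation. It assumes tags of the form v1.2.3 and parses a describe string instead of running git. The output format is an illustration, not lein-v's exact scheme.

```python
import re

# Matches `git describe --long --tags` output such as "v1.2.3-4-gabc1234".
DESCRIBE_RE = re.compile(r"^v(\d+)\.(\d+)\.(\d+)-(\d+)-g([0-9a-f]+)$")


def version_from_describe(describe: str) -> str:
    """Derive a unique, reproducible version string for a commit."""
    match = DESCRIBE_RE.match(describe)
    if match is None:
        raise ValueError(f"unexpected describe output: {describe!r}")
    major, minor, patch, distance, sha = match.groups()
    if distance == "0":
        return f"{major}.{minor}.{patch}"  # the commit is exactly the tagged release
    # Commits after a tag get a suffix tying the version to that exact commit.
    return f"{major}.{minor}.{patch}+{distance}.{sha}"


print(version_from_describe("v1.2.3-0-gabc1234"))  # 1.2.3
print(version_from_describe("v1.2.3-4-gabc1234"))  # 1.2.3+4.abc1234
```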

suptejas commented 3 years ago

A massive issue we face with external libs, however, is the rabbit hole of dependencies that have to be added recursively. Take one package that has only two dependencies: each of those dependencies has further dependencies, which have even more dependencies. As you might guess, this leads to a massive number of dependencies being added to your project just to import one library.

We commonly see this issue when using Rust crates. I would suggest that we make the user aware of the size and implications of importing a library. The developer should be told that by importing the library they are pulling in x dependencies (not only the top-level ones, but all the dependencies of those packages too). This allows developers to be more conscious of what they import. Additionally, I highly recommend a metric that measures the negative impact, or size increase, on the final executable. I'm not exactly sure how this metric could be implemented, but it's a suggestion for users who care about their executable sizes and the cost of their imports.
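Counting the full set is a traversal of the dependency graph that visits each package once, even when several packages depend on it. A minimal sketch, assuming the resolved graph is available as a mapping from each package to its direct dependencies. The package names are made up.

```python
def transitive_deps(graph: dict[str, list[str]], root: str) -> set[str]:
    """Return every package reachable from root, excluding root itself."""
    seen: set[str] = set()
    stack = list(graph.get(root, []))
    while stack:
        pkg = stack.pop()
        if pkg in seen:
            continue  # shared dependencies are counted only once
        seen.add(pkg)
        stack.extend(graph.get(pkg, []))
    seen.discard(root)  # in case a cycle leads back to the root
    return seen


graph = {
    "my-app": ["http"],
    "http": ["tls", "uri"],
    "tls": ["crypto"],
    "uri": ["ascii"],
    "crypto": ["ascii"],
}
deps = transitive_deps(graph, "my-app")
print(f"{len(graph['my-app'])} direct dependency, {len(deps)} in total")
# 1 direct dependency, 5 in total
```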

sam0x17 commented 3 years ago

I would suggest that we make the user aware about the size and the implications of importing the library

It would be cool if the compiler emitted a warning when your package's dependency tree becomes more than n levels deep, where n is at the border between reasonable and egregious.
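Here depth means the longest chain of dependencies below the root. A minimal sketch of such a check, assuming an acyclic resolved graph and a hypothetical MAX_DEPTH threshold. This is an illustration, not an existing compiler feature.

```python
from functools import lru_cache

MAX_DEPTH = 3  # hypothetical cutoff between "reasonable" and "egregious"


def max_depth(graph: dict[str, list[str]], root: str) -> int:
    """Length of the longest dependency chain below root (0 = no dependencies).

    Assumes the graph is acyclic, which a resolver would check separately.
    """
    @lru_cache(maxsize=None)
    def depth(pkg: str) -> int:
        children = graph.get(pkg, [])
        return 0 if not children else 1 + max(depth(c) for c in children)

    return depth(root)


graph = {
    "my-app": ["http"],
    "http": ["tls", "uri"],
    "tls": ["crypto"],
    "crypto": ["bigint"],
}
d = max_depth(graph, "my-app")
if d > MAX_DEPTH:
    print(f"warning: dependency tree is {d} levels deep (limit {MAX_DEPTH})")
# warning: dependency tree is 4 levels deep (limit 3)
```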

moosichu commented 3 years ago

Whilst I'm new to Zig and still learning it (so do take what I say with a grain of salt), hopefully I can write down some things that are useful to this discussion. I've skimmed all the comments on this issue, but apologies in advance if some of this has already been covered.

So, just to establish the basics: package managers can solve an important problem, namely having to track down the dependencies of all the libraries you use (especially transitive ones) and get them building easily. It's one of the biggest pain points in C/C++ and no one wants that.

However, they can introduce their own problems, which can be seen especially in the "transitive micro-dependency hell" of the Rust/Node.js ecosystems: libraries are rarely standalone, and when you think you are using something to solve a simple problem, you are in fact (unawares) pulling in loads of code. That code is a massive surface for potential errors and issues that you have little understanding of, and that even the authors of the libraries you depend on directly might understand poorly because of the large transitive network of packages. Whilst this is arguably a cultural issue rather than a technical one, the precedents set by technology can affect it.

It's an extreme version of what XKCD highlights, and it can cause very real problems. It's worth pointing out that issues like this even affect the Rust compiler, where the self-hosted compiler itself now depends on some pretty deep transitive dependency hierarchies, iirc (a friend who contributes to the Rust compiler told me this, but I could be wrong). EDIT: just opening up a few of the Cargo.toml files in the repo for the Rust compiler itself reveals dependencies on external crates. Even the driver that powers the update loop of the main compiler depends on a micro 200-line package that is somehow still not even at version 1.0.

It's also worth pointing out that there are fundamentally two different problems to deal with: package dependencies for libraries vs. package dependencies for standalone projects (at the end of the stream). With the former, you don't know what context your library will be used in, so you need flexibility in your dependency versions to cope with other libraries in the same build sharing some of those dependencies. In the latter case you do have full context, and being able to build a standalone project consistently and reliably is so important that I want to check in all of my packages to my repository by default (at least as an option, but I think it should be the default, especially if package hosting is federated). I want to know that even in 50 years, if I have the right version of the Zig compiler and a copy of my source tree, my project will build without needing to grab any dependencies from the internet. Sometimes you might even have different dependencies for different branches of a project (because package upgrades are often implemented and tested in isolation from the main development branch), and if you are switching back and forth a lot, triggering additional downloads from the internet each time is also no fun.

However, zig build could shout at me if, for example, I have an outdated version of a dependency that has recently received a minor update, and that could be an error by default to encourage installing security patches.
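A minimal sketch of that check, assuming semantic versioning. Under that assumption, a newer release with the same major version is expected to be a compatible upgrade. The function name and the strict "error by default" switch are hypothetical.

```python
from typing import Optional


def parse(version: str) -> tuple[int, int, int]:
    """Parse 'MAJOR.MINOR.PATCH' into a tuple of ints."""
    major, minor, patch = (int(x) for x in version.split("."))
    return major, minor, patch


def check_outdated(name: str, pinned: str, latest: str, strict: bool = True) -> Optional[str]:
    """Report a compatible (same-major, newer minor or patch) upgrade.

    Returns None when up to date or when only a breaking major upgrade exists.
    """
    cur, new = parse(pinned), parse(latest)
    if new[0] != cur[0] or new <= cur:
        return None
    level = "error" if strict else "warning"
    return f"{level}: {name} {pinned} is outdated; {latest} is available"


print(check_outdated("tls", "1.4.2", "1.5.0"))
# error: tls 1.4.2 is outdated; 1.5.0 is available
print(check_outdated("tls", "1.4.2", "2.0.0"))  # None
```

Major upgrades are deliberately not flagged as errors, since under semver they may break the build and should be a conscious decision.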

Another thing worth considering is that, especially in performance-sensitive contexts like games, library modification/customisation can become a shipping necessity (unless a library has been really well designed). Therefore you might want to create custom forks of packages with your changes, but still want to know when the upstream package has been updated (so you know to update your fork).

As a library writer, I might want to make certain dependencies optional (especially if I'm designing it as a progressive API). For example, if I'm writing some kind of graphics library, the default high-level API might require a graphics back-end package as a dependency so that it can be easily used and tested, but the lower-level API might let the end user provide their own back-end, allowing that dependency to be removed. It's a minor thing, but I thought it was a use case worth mentioning.
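One way to express this is an optional marker on a dependency. The dependency is then pulled in only when the consumer enables it, loosely in the spirit of optional dependencies in other package managers such as Cargo. A minimal sketch with a hypothetical manifest shape and made-up package names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dependency:
    name: str
    optional: bool = False


# Hypothetical graphics library: the high-level API wants a default back-end,
# while the low-level API lets the user bring their own.
GFX_DEPS = [
    Dependency("math"),
    Dependency("default-backend", optional=True),
]


def resolve(deps: list[Dependency], enabled: set[str]) -> list[str]:
    """Keep required dependencies plus any optional ones the consumer enabled."""
    return [d.name for d in deps if not d.optional or d.name in enabled]


print(resolve(GFX_DEPS, enabled={"default-backend"}))  # ['math', 'default-backend']
print(resolve(GFX_DEPS, enabled=set()))                # ['math']
```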

The one thing I can't stress enough is how important it is to allow (or even default to!) downstream end users checking in all of their dependencies to version control. I've had to write tools for this in the past, because otherwise you end up in situations where servers are offline, someone somehow invalidates their cache, and all of a sudden they are unable to work until the dependency servers are back up. I've been in situations where engineering effort had to be spent implementing features like this for package managers that didn't support it, just because of how much of a problem that unreliability could become.
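With dependencies checked in, a build only needs to confirm that the vendored copies match what the project expects, and that works entirely offline. This is a minimal sketch assuming a hypothetical lockfile that maps each dependency name to the sha256 of its contents. Vendored packages are modeled as in-memory bytes rather than files on disk.

```python
import hashlib


def verify_vendored(lockfile: dict[str, str], vendored: dict[str, bytes]) -> list[str]:
    """Check checked-in dependency copies against the lockfile, fully offline.

    lockfile maps dependency name -> expected sha256 hex digest;
    vendored maps dependency name -> contents checked into the repository.
    """
    problems = []
    for name, expected in lockfile.items():
        if name not in vendored:
            problems.append(f"{name}: missing from vendor directory")
        elif hashlib.sha256(vendored[name]).hexdigest() != expected:
            problems.append(f"{name}: contents do not match lockfile")
    return problems


zlib_src = b"pretend this is the zlib source archive"
lockfile = {"zlib": hashlib.sha256(zlib_src).hexdigest(), "tls": "0" * 64}
print(verify_vendored(lockfile, {"zlib": zlib_src}))
# ['tls: missing from vendor directory']
```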

P.S. In fact, allowing arbitrary build artefacts to be put somewhere they can be checked in to source control could be useful. E.g. you might have a C++ library that you build from source, but you know it rarely changes and each rebuild takes a good amount of time. Being able to declaratively say "I want this compiled artefact to be checked in" and have it put in a separate folder could be useful, without requiring the infrastructure of a shared networked build cache. I often do that anyway with third-party libraries that I have built from source.

EDIT: Another thing that needs to just work is cross-compilation support. It would be great to have an ecosystem of "wrapper packages" that simply wrap C/C++ projects, with everything set up so that they respect Zig cross-compilation targets. In some cases this might involve linking against the correct pre-built binary libs.