warpfork commented 5 years ago

Facilitating the Correct Abstractions

(Acknowledging, of course, the hubris of the title -- we can only hope and try!)

To contribute meaningfully to advancing the state of package management, we must first understand package management.

First in understanding package management, we should identify and understand the stages of package management. These are stages I would identify:

[human authorship phase is ready to produce a package]
pack content
write release metadata (version name, etc)
upload content and release metadata
[-- switch from producer to consumer --]
fetch release metadata
transitive dependency resolution
lockfile creation
[-- possible switch to even further downstream consumer --]
lockfile read and content fetch
content unpack
[a new human authorship phase takes over!]

[image: cycle]

(Image is an earlier visualization of roughly the same concepts, but pictured with authorship cycle closed. Also note this image contains an "install" phase which is elided in the list above or perhaps equates to "content unpack" depending on your POV; and several other steps were combined rather than enumerated clearly.)

Understanding these phases of package management, we can begin to identify what might be the key concepts of APIs that haul data between each of the steps. And understanding what key concepts and data need to be hauled between each step gives us a roadmap to how IPFS/IPLD can help haul that data!
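As a rough sketch of the data that gets hauled between stages, assuming a toy in-memory model: the names `ReleaseMetadata`, `Lockfile`, `pack`, and `lock` are invented for illustration and are not taken from any real package manager.

```python
from dataclasses import dataclass
import hashlib


@dataclass(frozen=True)
class ReleaseMetadata:
    """What the producer publishes next to the packed content (hypothetical shape)."""
    name: str
    version: str
    content_hash: str         # identifies the packed content
    dependencies: tuple = ()  # names of packages this release needs


@dataclass(frozen=True)
class Lockfile:
    """What resolution leaves behind: sorted (name, content_hash) pairs."""
    pins: tuple


def pack(name, version, content, dependencies=()):
    """Producer side: 'pack content' plus 'write release metadata'."""
    digest = hashlib.sha256(content).hexdigest()
    return ReleaseMetadata(name, version, digest, tuple(dependencies)), content


def lock(releases):
    """Consumer side: 'lockfile creation' from an already-resolved set of releases."""
    return Lockfile(tuple(sorted((r.name, r.content_hash) for r in releases)))


# "libfoo" is a made-up package name.
meta, blob = pack("libfoo", "1.0.0", b"module contents")
lockfile = lock([meta])
```

The point of the sketch is only that each boundary carries a small, well-defined value: metadata going from producer to consumer, and a lockfile going from resolution to content fetch.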

Now. There are many interesting things in the above:

Never forget that rather than a list, there is actually a cycle when creation gets involved. I won't talk about this more in this issue, but in the longest runs, it's incredibly important to mind how we can close this loop.
Some of these phases are particularly clear in how they can relate to IPFS! For example, uploading of packages and fetching of packages: clearly, these operations can benefit from IPFS by treating it as a simple content bucket that happens to be particularly well decentralized. Since this is already clear, I also won't talk any more about this in this issue.
You might have noticed I injected some Opinions into a few of the steps. In particular, that ordering of transitive resolution vs lockfile creation vs metadata fetch is not entirely universally adopted! Some systems skip the lockfile concept entirely and re-do dependency resolve every time they're used! Some systems vary in what the lockfile contains (version numbers that are still technically somewhat vague and need centralized/online translation into content, versus content-identifiers/hashes, etc). Of course, systems vary wildly in terms of what information they actually act on and what exact logic they use for transitive dependency resolution. And alarmingly, most systems don't clearly separate metadata fetch from resolution processes at all.
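To make the difference in lockfile contents concrete, here is a minimal sketch assuming a made-up lockfile layout (plain dicts, a hypothetical package `libfoo`, and SHA-256 as the hash). No real tool's format is implied.

```python
import hashlib

# Bytes standing in for a packed release of the hypothetical package "libfoo".
content = b"print('hello from libfoo')\n"
digest = hashlib.sha256(content).hexdigest()

# Style A: pins a version name; something online must still map it to bytes.
lock_by_version = {"libfoo": {"version": "1.2.3"}}

# Style B: pins the content itself by hash; any source of bytes will do.
lock_by_hash = {"libfoo": {"version": "1.2.3", "sha256": digest}}


def verify(entry, fetched):
    """Return True only if the entry carries a hash and the fetched bytes match it."""
    expected = entry.get("sha256")
    if expected is None:
        return False  # nothing to check against: the fetch simply has to be trusted
    return hashlib.sha256(fetched).hexdigest() == expected
```

With style B, `verify(lock_by_hash["libfoo"], content)` succeeds no matter where the bytes came from, while style A gives the consumer nothing to check.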

That last set of things I really want to focus in on.

I think my biggest takeaway by far from the last couple years of thinking about this whole domain is that segmenting resolve from all other operations is absolutely Of The Essence. It's the point that never ceases to be contended, and for fundamental rather than incidental reasons: it is correct for different situations and packagers and user stories to use different resolution strategies.

It's also (what a coincidence) the key API concept that lets IPFS help other systems while keeping clear boundaries that let them get on with whatever locally contendable (e.g. language-specific) logic they need to.

But here we've got a bummer. Essentially no modern package managers I can think of actually intentionally designed their resolve stages to be separate and pluggable.
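For illustration only, a separate and pluggable resolve stage could look something like the sketch below. It assumes the metadata has already been fetched into a plain dict and that versions are integer tuples; `newest`, `oldest`, and `resolve` are hypothetical names.

```python
from typing import Callable, Dict, List

# Already-fetched metadata: package name -> available versions (as int tuples).
Metadata = Dict[str, List[tuple]]
# A strategy picks one version out of the candidates.
Strategy = Callable[[List[tuple]], tuple]


def newest(versions):
    """Prefer the highest version, as many language-centric tools do."""
    return max(versions)


def oldest(versions):
    """Prefer the lowest available version."""
    return min(versions)


def resolve(wanted, metadata, strategy):
    """Pick one version per wanted package; the strategy is the only policy here."""
    return {name: strategy(metadata[name]) for name in wanted}


metadata = {"libfoo": [(1, 0, 0), (1, 2, 0), (2, 0, 0)]}
picked_new = resolve(["libfoo"], metadata, newest)
picked_old = resolve(["libfoo"], metadata, oldest)
```

Everything before and after `resolve` stays the same whichever strategy is plugged in, which is the property being argued for.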

The more we encourage separation of resolve from the steps that follow it, the more clear it becomes for every system to have lockfiles; and the more things have lockfiles, the happier we are, because the jump from lockfile to content-addressable distribution system gets more incremental and becomes more obviously a right choice. But this is already widely clear and quite popular!

More interesting is what happens when we encourage separation of resolve from the steps that precede it -- namely, from "metadata fetch".

If we can encourage a world of package managers which have clearly delineated boundaries between metadata fetch and the evaluation of transitive dependency resolution upon that metadata, we both get clearer points for integrating IPFS/IPLD in the metadata distribution, AND we provide a huge boost to enabling "reproducible resolve" -- an issue I've written more about here in the Timeless Stack docs -- which sets up the whole world nicely for bigger and better ecosystems of reproducible builds.
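A minimal sketch of what "reproducible resolve" could mean in practice, under some loud assumptions: the metadata snapshot is a plain dict of `name -> {version: [dependency names]}`, the snapshot is identified by hashing its canonical JSON encoding, and the selection rule is a deliberately naive "highest version string wins". None of this is how Timeless Stack or any other tool actually does it.

```python
import hashlib
import json


def snapshot_id(metadata):
    """Identify a metadata snapshot by hashing a canonical JSON encoding of it."""
    encoded = json.dumps(metadata, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def resolve(roots, metadata):
    """Transitive resolution with no network access: only `metadata` is consulted."""
    pins = {}
    queue = list(roots)
    while queue:
        name = queue.pop()
        if name in pins:
            continue
        # Naive rule: lexicographic max of the version strings.
        version = max(metadata[name])
        pins[name] = version
        queue.extend(metadata[name][version])
    return pins


# Hypothetical snapshot: package -> version -> dependency names.
metadata = {
    "app": {"1.0": ["libfoo"]},
    "libfoo": {"1.0": [], "1.1": ["libbar"]},
    "libbar": {"0.9": []},
}

# Recording the snapshot id next to the pins lets anyone re-run the resolve later.
lock = {"metadata": snapshot_id(metadata), "pins": resolve(["app"], metadata)}
```

Because `resolve` only reads its arguments, the same snapshot id plus the same roots always gives the same pins, and the snapshot itself is exactly the kind of blob a content-addressed system can distribute.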

Thank you for coming to my Github issue / thinkpiece.

Where can we go from here? No idea: I just want to put all these thoughts out there to cook. We'll probably want to consider these in roadmapping anything beyond the most short-term basic content-bucket integrations; and perhaps start circulating concepts like separating resolve from metadata transport sooner rather than later to prepare the ground for future work in that direction.

andrew commented 5 years ago

I think that fits quite nicely with the feeling that "executable package manifests" are bad (setup.py, for example, where the dependencies of a given package are calculated by executing some code, which may conditionally decide "you're using Python 2.x, so you get an extra dependency").

If you can’t get the metadata for a package without first downloading and executing the package (or part of it), separating the resolve from the metadata collection is going to be tricky.
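By way of contrast, here is a sketch of the declarative alternative, assuming a made-up manifest format in which conditions are stored as data (a hypothetical `when` clause) instead of being computed by running package code. The package names are invented.

```python
# Hypothetical declarative manifest: conditions are data, not code to execute.
manifest = {
    "name": "libfoo",
    "dependencies": [
        {"name": "libbar"},
        {"name": "backports", "when": {"python_major": 2}},
    ],
}


def dependencies_for(manifest, environment):
    """Select the dependencies whose `when` clause matches the given environment."""
    selected = []
    for dep in manifest["dependencies"]:
        condition = dep.get("when", {})
        if all(environment.get(key) == value for key, value in condition.items()):
            selected.append(dep["name"])
    return selected


deps_py2 = dependencies_for(manifest, {"python_major": 2})
deps_py3 = dependencies_for(manifest, {"python_major": 3})
```

The manifest can be fetched and stored as plain metadata, and the conditional part is only evaluated at resolve time, for whatever environment the resolver is asked about.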

mikeal commented 5 years ago

You might have noticed I injected some Opinions into a few of the steps. In particular, that ordering of transitive resolution vs lockfile creation vs metadata fetch is not entirely universally adopted!

This got me thinking a bit. Maybe we should be injecting some opinions into package management. Maybe the value-add we offer is tied up in some of those opinions.

For instance, instead of thinking we are better aligned with package managers with some immutability guarantees and locking mechanisms, maybe we should focus on adding those kinds of features to the package systems that lack them. We’re already going to get an immutable reference when we convert the package to IPFS, why not lean into that as where we build our added value to package management.

lanzafame commented 5 years ago

Maybe we should be injecting some opinions into package management.

I know lots of people that don't like opinions...

Partly joke, partly serious. I have found package management to be one of those areas that is still prone to flame wars. I guess this is sliding more into the marketing side of things, but I guess you could sell most people on IPFS making checksums useful without relying on users to actually use them. I dunno, my 2 cents.

andrew commented 5 years ago

Related: Yarn v2 looks like it provides a nice interface to inject IPFS support into, the built-in http, npm and GitHub plugins do separate the resolver and the fetcher to a certain extent.

Found via: https://github.com/yarnpkg/yarn/issues/6953

We'll add support for plugins, which will be able to alter various things - from adding new commands to hooking into the resolution / fetching / linking steps to add support for new package sources or install targets.

Related to the plugin system, Yarn will become an API as much as a CLI. You can expect to be able to require it and start using its components in your script - no need to parse your package.json anymore, no need to run the resolution .. Yarn will abstract all those tedious tasks away.

warpfork commented 5 years ago

why not lean into that as where we build our added value

:heavy_plus_sign: :100:

Speaking from a POV that's closer to distro-scale/style packaging, I can also say that having things move closer to un-networked resolve will also be a massive (massive, massive) source of interest and excitement for distro people. Distros generally have a different take on versioning and version selection than the more language-centric package management tools tend to... and getting resolve stuff more clearly delineated in package managers we interact with has the potential to open up tons of avenues for closing gaps and building more bridges between the language-centric and distro-style approaches. Which could overall save tons of work for many different communities.

andrew commented 5 years ago

Both CocoaPods and Bundler have a reasonable separation of metadata collection and resolve (although, interestingly, RubyGems itself does not; Bundler avoids using gem install under the hood for that reason), and they also both share a resolver: https://github.com/CocoaPods/Molinillo

All of the package managers in the "Portable Registry" have a reasonable separation between collection and resolve (they literally download a copy of all the metadata in the registry), which is likely also why it appears to be easier to add IPFS support to them.

andrew commented 5 years ago

Going to be moving the content of this issue into the /docs in the repository in https://github.com/ipfs/package-managers/pull/53

ipfs-inactive / package-managers

Facilitating the Correct Abstractions #16
