ipfs-inactive / package-managers

[ARCHIVED] 📦 IPFS Package Managers Task Force

What would decentralized publishing on IPFS for npm actually look like? #58

Open andrew opened 5 years ago

andrew commented 5 years ago

There was some talk of decentralized publishing for npm-on-ipfs on the IPFS GUI call yesterday as a potentially interesting area to explore.

After exploring some decentralized package manager ideas I thought I'd try to apply some of them to an npm implementation.

Assuming ipfs client support can be added to npm or yarn and that IPNS isn't slow (or DNSLink is used), how does one go about publishing a package directly to IPFS and then having someone else consume it?

One key aim here is to not require hosted infrastructure: can we just use standard IPFS tooling and keep the logic in the clients?

Also of note, none of these solve the discovery problem of "where do I find the decentrally published packages", which would likely be an opt-in search engine where publishing tools announce the CID of newly published packages to be indexed.

Level 1

The simplest option would be to add a tarball directly to ipfs:

ipfs add big-number-1.1.0.tgz

Then any user could specify it as a dependency in package.json:

{
  "name": "my-module",
  "dependencies": {
    "big-number": "ipfs://Qmfoo"
  }
}

Downsides to this:

Level 2

To add an update mechanism to level one, you could use ipns:

ipfs name publish $(ipfs add -Q big-number-1.1.0.tgz)

Then any user could specify it as a dependency in package.json:

{
  "name": "my-module",
  "dependencies": {
    "big-number": "ipns://Qmfoo"
  }
}

Then when publishing a new version, just update ipns to point to the new one:

ipfs name publish $(ipfs add -Q big-number-1.1.1.tgz)

Downsides to this:

Level 3

To allow end users to control which version they are installing, publish a modified index file (e.g. bignumber) alongside the tarball.

Rather than adding a new field cid, like npm-on-ipfs does, instead make the tarball field the cid of that version's tarball:

"dist": {
  "shasum": "e6ab0a743da5f3ea018e5c17597d121f7868c159",
  "tarball": "ipfs://bafybeihmelfwcg664jeznipvrx2qzc6acrme5ztea2r4rljdqzrj72bx6u"
}
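A rough sketch of that rewrite, assuming the packument has the usual `versions` map and that you already know the CID of each tarball. The function name `withIpfsTarballs` and the `cidsByVersion` map are made up for illustration; nothing here is part of npm or npm-on-ipfs.

```javascript
// Sketch: point each version's dist.tarball at an ipfs:// URL.
// `cidsByVersion` is a hypothetical map of version -> tarball CID.
function withIpfsTarballs(packument, cidsByVersion) {
  const versions = {};
  for (const [version, manifest] of Object.entries(packument.versions || {})) {
    const cid = cidsByVersion[version];
    versions[version] = cid
      ? { ...manifest, dist: { ...manifest.dist, tarball: `ipfs://${cid}` } }
      : manifest; // leave versions we have no CID for untouched
  }
  // Return a copy so the original packument is not mutated.
  return { ...packument, versions };
}
```

The existing shasum is kept as-is, since the tarball bytes are the same and only the location changes.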

As in level 2, using an IPNS name for the index will allow for an update mechanism:

ipfs name publish $(ipfs add -Q bignumber)

Then any user could specify it as a dependency in package.json:

{
  "name": "my-module",
  "dependencies": {
    "big-number": "ipns://Qmfoo^1.1.0"
  }
}
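To make the "address plus version range" idea concrete, here is a minimal sketch of how a client might split such a spec. The syntax is only the one floated in this thread, npm does not understand it today, and `parseIpfsSpec` is a hypothetical name. It also accepts a `#` separator between the address and the range.

```javascript
// Hypothetical helper: split an IPFS-style dependency spec into its parts.
// The address ends at the first character that can start a semver range
// (or at an optional "#" separator).
function parseIpfsSpec(spec) {
  const match = /^(ipfs|ipns):\/\/([^#^~<>=*\s]+)#?(.*)$/.exec(spec);
  if (!match) return null; // not an ipfs:// or ipns:// spec
  const [, protocol, address, range] = match;
  return { protocol, address, range: range || null };
}
```

One caveat with this format: an exact version with no operator (for example `ipns://Qmfoo1.1.0`) cannot be told apart from the address, so a separator such as `#` would probably be needed in practice.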

Missing parts:

Downsides:

Level 4

Publishing a small registry directly to IPFS.

npm-on-ipfs has already shown how you can set up an npm registry on ipfs by publishing multiple version indexes (packuments) and adding them to a folder (or mfs). An IPFS address can then be used as a registry rather than the location of one particular package.

Small registries could be collections of useful modules for individual personal use, but the most likely use case is per-project registries that contain all the dependencies for that application.

Command to point your local npm client at an ipfs registry:

npm config set registry ipns://QmT8wUy9CV3FHbjQeDG49ZL7at24u5fynBrkjum4MznNA7

The setting can also be added to a .npmrc file which can be committed in a repository so that all users of that project use the same registry:

registry=ipns://QmT8wUy9CV3FHbjQeDG49ZL7at24u5fynBrkjum4MznNA7

Then packages can be referenced in your package.json using regular names:

{
  "name": "my-module",
  "dependencies": {
    "big-number": "^1.1.0"
  }
}

This has the added benefit that all sub-dependencies will also be pulled from IPFS automatically, but will require the publisher of that registry to ensure they have added all required versions of every sub-dependency to that IPFS folder before using it.

Missing parts:

Downsides:

Note:

andrew commented 5 years ago

I was going to open an issue on ipfs-desktop about the possibility of bundling a fork of npm with ipfs support added via pacote that would enable some of these ideas via the flick of a switch in IPFS desktop.

But thinking about it, npm-on-ipfs and the pacote patch come at the problem from two different angles and I suspect the current implementation of npm-on-ipfs would not directly benefit from the pacote patch being merged (or available in a fork).

This is because npm-on-ipfs acts as a transparent IPFS proxy for registry.npmjs.org, allowing individual users to opt-in to loading data from registry.npmjs.org over IPFS. It doesn't force other contributors to the same project to also use npm-on-ipfs.

The pacote patch, on the other hand, does force any other contributors to use IPFS as well. Take the following package.json for example:

{
  "name": "my-module",
  "dependencies": {
    "big-number": "ipns://Qmfoo^1.1.0"
  }
}

ipns://Qmfoo doesn't have an equivalent http address, so the only way to resolve the address of big-number is to load it over IPFS.

I'd like to propose a slightly different approach inspired by ipfs-companion.

ipfs-companion-lib (or http-or-ipfs)

ipfs-companion, the browser extension, does a neat thing: if it sees the browser loading an http url where the path starts with /ipfs/ or /ipns/ (for example: https://ipfs.io/ipfs/Qmc5gCcjYypU7y28oCALwfSvxCBskLuPKWpK4qpterKC7z), it makes the request using IPFS rather than http. It'll also detect DNSLinks on the domain name and x-ipfs-path http headers: https://github.com/ipfs-shipyard/ipfs-companion#automagical-detection-of-ipfs-resources

It also only upgrades requests if it detects that the user has an active IPFS node, either via an embedded one, IPFS desktop or a go-ipfs daemon.

What if there was a low level javascript http library (perhaps called ipfs-companion-lib or http-or-ipfs) that had a similar ability to "upgrade" certain http requests to IPFS using a similar set of conditions to companion?

This would then enable users with IPFS support to enjoy loading modules from their neighbors without directly adding IPFS hashes to their package.json files. I suspect DNSLink or x-ipfs-path would be very useful here, allowing a registry mirror/proxy to host packuments and tarballs on IPFS as well as providing a http fallback without localhost:8080 ending up in package-lock.json files (one of the things npm-on-ipfs has to manually rewrite after installation).
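As a rough illustration, the path-based part of that detection could look something like the sketch below. The function names `toIpfsPath` and `upgradeRequest` are hypothetical, and DNSLink and x-ipfs-path detection are deliberately left out, as is the question of how to tell whether a local node is running.

```javascript
// Hypothetical core of an "http-or-ipfs" style library: decide whether an
// HTTP(S) URL is a gateway-style URL that could be fetched over IPFS instead.
function toIpfsPath(url) {
  let parsed;
  try {
    parsed = new URL(url);
  } catch (err) {
    return null; // not a valid absolute URL
  }
  if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') return null;
  // Gateway-style paths start with /ipfs/<cid> or /ipns/<name>.
  return /^\/(ipfs|ipns)\/[^/]+/.test(parsed.pathname) ? parsed.pathname : null;
}

// Only upgrade when a local node is known to be reachable; otherwise the
// request carries on over plain HTTP as before.
function upgradeRequest(url, hasLocalNode) {
  const ipfsPath = hasLocalNode ? toIpfsPath(url) : null;
  return ipfsPath ? { via: 'ipfs', path: ipfsPath } : { via: 'http', url };
}
```

Because the fallback is always the untouched HTTP request, users without IPFS would see no change in behaviour.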

It doesn't look like registry.js.ipfs.io currently publishes a DNSLink record but that'd be a good thing to add. Then you could set registry.js.ipfs.io as your registry in .npmrc and any http requests to that registry would be upgraded to IPFS for users running a node locally whilst continuing to work over http for other users.

Eventually the hope would be that registry.npmjs.org would also have a DNSLink and registry.js.ipfs.io could be retired.

It'd also mean that for self-published packages, rather than using the pacote patch style ipns://Qmfoo dependencies, the ipfs gateway address could be used, which will work for both existing http clients and users who choose to use IPFS:

{
  "name": "my-module",
  "dependencies": {
    "big-number": "https://ipfs.io/ipns/Qmfoo^1.1.0"
  }
}

One other thing to note here: currently the npm-on-ipfs infrastructure uses js-ipfs and does not connect to the same DHT as go-ipfs, so trying to load tarballs and packuments through the ipfs gateway (http://ipfs.io/ipfs/bafybeihmelfwcg664jeznipvrx2qzc6acrme5ztea2r4rljdqzrj72bx6u) doesn't find anything.

achingbrain commented 5 years ago

There are quite a few things to unpack here..

(Missing parts) A tool for generating an index file (packument) locally for one or more versions of package

This should be trivial to implement - I can't find a detailed spec for packuments but there's an overview on the pacote repo.

Referencing an ipns hash in package.json isn't very human readable

True, but you can have more memorable strings via DNSLink and it's better than having them in the source code. If we get registry.js.ipfs.io to publish DNSLink TXT records it'll make this a bit more bearable.

Not sure how it would look though, maybe:

  "express": "ipns://express.npm.registry.js.ipfs.io#^4.0.0",

Also more unsure of how we'd deal with scoped packages (e.g. @hapi/hapi)?

  "hapi": "ipns://hapi-hapi.github.registry.js.ipfs.io#^18.0.0"

Referencing an ipns hash in package.json provides no information about who the publisher of it is

This is definitely a concern but one I hope we can solve outside of package managers. For example, if the CID an IPNS name resolved to was an intermediate dag-cbor node like:

{
  "cid": "QMfoo...",
  "author": "MIIBCgKCAQEA6...",
  "signature": "ZGFzZHNsa2Rqc2Z..."
}

Where author is a public key, signature is the CID encrypted by the private key the public key corresponds to. Verify the public key of the person you trust out-of-band, then decrypt the signature using the public key and it should match the cid. Something like that, though IANASE (I am not a security engineer).
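A minimal sketch of that idea using Node's built-in crypto module. Strictly speaking this is sign-and-verify rather than encrypt-and-decrypt, but the intent is the same. The Ed25519 key type, the base64 encoding and the helper names `signCid` and `verifyRecord` are all assumptions made for the example, not anything IPFS or npm defines.

```javascript
const crypto = require('crypto');

// Build the intermediate record: the CID, the author's public key and a
// signature over the CID made with the matching private key.
function signCid(cid, privateKey, publicKey) {
  const signature = crypto.sign(null, Buffer.from(cid), privateKey);
  return {
    cid,
    author: publicKey.export({ type: 'spki', format: 'der' }).toString('base64'),
    signature: signature.toString('base64'),
  };
}

// `trustedAuthor` is the public key you verified out-of-band.
function verifyRecord(record, trustedAuthor) {
  if (record.author !== trustedAuthor) return false; // not the key we trust
  const publicKey = crypto.createPublicKey({
    key: Buffer.from(record.author, 'base64'),
    format: 'der',
    type: 'spki',
  });
  return crypto.verify(
    null,
    Buffer.from(record.cid),
    publicKey,
    Buffer.from(record.signature, 'base64')
  );
}

// Example: a throwaway key pair standing in for the publisher's identity.
const { publicKey, privateKey } = crypto.generateKeyPairSync('ed25519');
const record = signCid('Qmfoo', privateKey, publicKey);
```

Calling `verifyRecord(record, record.author)` should succeed, while changing the `cid` field afterwards should make verification fail.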

By default dependencies will still be installed from npmjs.org over http, unless someone goes through and manually republishes all dependencies with their dependencies declared as ipns hashes as well

I think this is ok. We already pull deps from npmjs.org, github, arbitrary file:// & http:// URLs, and it all seems to work.

I'm not sure if @achingbrain's PR to npm would work with pointing whole registries at ipfs addresses, needs further investigation

Not as it stands - it's more aimed at the Level 3 use case above.

My gut feeling is that Level 4 adds a significant amount of complexity - it'd probably be good for teams working on large apps but may be overkill for small projects/libs.

The IPFS Companion approach is interesting - the x-ipfs-path seems to be the only reliable way to detect IPFS versions of the thing you're requesting, parsing it out of URLs seems very fragile. Until npm/github add x-ipfs-path headers you'd depend on our centralised infrastructure which I don't think we should rely on.

You've also made enough of an HTTP request to get the x-ipfs-path header in the response so at that point it feels like you should just let the HTTP request run to completion (or would it just do a HEAD on the resource?).

currently the npm-on-ipfs infrastructure uses js-ipfs and does not connect to the same DHT as go-ipfs

Or any DHT for that matter. It's coming in 0.36.0 though - if it doesn't we can convert registry.js.ipfs.io to use go-ipfs under the hood, though it's been a very useful vector for performance testing js-ipfs up until now.

If there was a low level javascript http library (perhaps called ipfs-companion-lib or http-or-ipfs) that had a similar ability to "upgrade" certain http requests to IPFS using a similar set of conditions to companion.

How do you see this working? We could monkey-patch node internals but you'd have to require it somehow. Also it seems a bit weird to co-opt the http:// protocol if that's what the user has specified.


Ermergerd there's a lot here, sorry for the stream of consciousness.

The way I see npm-on-ipfs working in future is something close to Level 3:

andrew commented 5 years ago

I've been doing some real world testing with this today, although instead of using @achingbrain's pacote patch, I just used the local http gateway provided by the ipfs daemon.

Level 1

This works as expected:

$ npm pack my_module
$ ipfs add my_module-1.0.0.tgz

Install it:

$ npm install http://localhost:8080/ipfs/QmeuiRzbjhnyghoADPCrmPpQYgLj4eG94TWVcedXp6S2mg

It results in the local gateway url to the tarball being added to package.json as the version:

{
  "name": "testing",
  "dependencies": {
    "base62": "http://localhost:8080/ipfs/QmeuiRzbjhnyghoADPCrmPpQYgLj4eG94TWVcedXp6S2mg"
  }
}

and a similar package-lock.json file:

{
  "name": "testing",
  "requires": true,
  "lockfileVersion": 1,
  "dependencies": {
    "base62": {
      "version": "http://localhost:8080/ipfs/QmeuiRzbjhnyghoADPCrmPpQYgLj4eG94TWVcedXp6S2mg",
      "integrity": "sha512-YtkASiZCn90Bd78+tLHSSxtrUekRLUchUhtPCp46vONCJEHJrXOJc97x6Ipdfos0xKIw/9fVBo1adBV5MAkhEA=="
    }
  }
}

Level 2

Similar to level 1 but with an ipns name.

Generate a tarball and add it to IPFS:

$ npm pack my_module
$ ipfs add my_module-1.0.0.tgz

Then publish it under an ipns name:

$ ipfs name publish QmeuiRzbjhnyghoADPCrmPpQYgLj4eG94TWVcedXp6S2mg

Install it:

$ npm install http://localhost:8080/ipns/QmeuiRzbjhnyghoADPCrmPpQYgLj4eG94TWVcedXp6S2mg

I couldn't get that last stage to complete: it timed out on every attempt. IPNS is still too slow to recommend anyone use it. That said, I'd expect to get the same results as with Level 1, just with a slightly different url.

This also has the existing downside of not allowing users to control which version they are installing, and it will cause integrity check failures with lockfiles whenever the name is updated to point at a new tarball, so it's not a good idea to use for demos either.
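The lockfile problem can be illustrated with a small sketch. It assumes a lock entry with a sha512 `integrity` string like the one in the Level 1 example; `integrityOf` and `matchesLock` are hypothetical helper names.

```javascript
const crypto = require('crypto');

// Compute an npm-style integrity string (sha512, base64) for tarball bytes.
function integrityOf(tarballBytes) {
  const digest = crypto.createHash('sha512').update(tarballBytes).digest('base64');
  return `sha512-${digest}`;
}

// A lock entry only stays valid while the bytes behind the URL are unchanged.
function matchesLock(tarballBytes, lockEntry) {
  return integrityOf(tarballBytes) === lockEntry.integrity;
}
```

With an immutable /ipfs/ address the bytes can never change, so the check keeps passing. With a mutable /ipns/ address, publishing a new tarball under the same name changes the bytes while the lockfile still holds the old hash.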

Level 3

Turns out this does not work at all, although not because of IPFS but a misunderstanding in how npm installation arguments work.

The theory was that you could publish a "packument" to IPFS and then pass the address to npm along with a version and it would just work.

Testing with a regular packument from npmjs.org over http reveals the problem:

$ npm install http://registry.npmjs.com/bignumber
npm ERR! code ENOPACKAGEJSON
npm ERR! package.json Non-registry package missing package.json: http://registry.npmjs.com/bignumber.
npm ERR! package.json npm can't find a package.json file in your current directory.

If you pass anything that looks like a url (or any other address that pacote already supports like git://) then npm is expecting to find a single folder of a release with a package.json in the root.

In other words, you can't pass a packument to npm install, which invalidates Level 3 entirely 🙈

Level 3.1 (previously level 4)

So if level 3 doesn't work, what does?

As far as I can tell the only way to pass a packument to npm install is with the --registry flag:

npm install bignumber@1.1.0 --registry=https://registry.js.ipfs.io

One catch at the moment is that the --registry option has to use http(s):

$ npm install bignumber --registry=/Users/andrewnesbitt/code/testing/bignumber
npm WARN invalid config registry="/Users/andrewnesbitt/code/testing/bignumber"
npm WARN invalid config Must be a full url with 'http://'
npm ERR! Only HTTP(S) protocols are supported

fun fact: npm uses a different module for making registry requests than pacote: npm-registry-fetch, so @achingbrain's pacote patch won't automatically add ipfs support here too

But we can use the local http gateway provided by ipfs 🎉

The command npm install bignumber@1.1.0 --registry=https://registry.js.ipfs.io ends up constructing a packument url like https://registry.js.ipfs.io/bignumber in which it then searches for version 1.1.0.
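Roughly, that lookup can be sketched as below. This is a simplification for illustration, not npm's actual code: the helper names are made up, range resolution is ignored, and the `%2f` escaping of scoped names reflects how npm requests them as far as I know.

```javascript
// Sketch of the lookup performed against a registry root.
function packumentUrl(registry, name) {
  // Scoped names such as @hapi/hapi are requested with the slash escaped.
  const escaped = name.replace('/', '%2f');
  return `${registry.replace(/\/+$/, '')}/${escaped}`;
}

// Return the tarball URL for an exact version, or null if it is not listed.
function tarballFor(packument, version) {
  const manifest = (packument.versions || {})[version];
  return manifest && manifest.dist ? manifest.dist.tarball : null;
}
```

The important property is that the registry root is just a URL prefix, so any folder that serves a file named after the package at that prefix can act as a registry.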

If we take the packument from https://registry.js.ipfs.io/bignumber and drop it in an empty folder called my_registry:

wget -P my_registry https://registry.js.ipfs.io/bignumber

Then add that folder to ipfs:

ipfs add -r my_registry

We can then use the folder's CID as the root registry over the local gateway:

$ npm install bignumber@1.1.0 --registry=http://localhost:8080/ipfs/Qmds3SRV8ABBN6zgQAqPZxHLZDGcURJHYyAZ1AGcNvmsPx

+ bignumber@1.1.0
updated 1 package in 0.185s

Which results in the following package.json:

{
  "name": "testing",
  "dependencies": {
    "bignumber": "^1.1.0"
  }
}

and package-lock.json:

{
  "name": "testing",
  "requires": true,
  "lockfileVersion": 1,
  "dependencies": {
    "bignumber": {
      "version": "1.1.0",
      "resolved": "https://registry.js.ipfs.io/bignumber/-/bignumber-1.1.0.tgz",
      "integrity": "sha1-5qsKdD2l8+oBjlwXWX0SH3howVk="
    }
  }
}

But there's an issue there: the resolved url in package-lock.json is an http url to registry.js.ipfs.io, which won't use ipfs to load at all.

Let's update the packument to point to the actual tarball hash on ipfs via the local gateway and re-add that to ipfs:

{
   "_id":"bignumber",
   "name":"bignumber",
   // ...
   "versions":{
      "1.1.0":{
         "name":"bignumber",
         "version":"1.1.0",
         // ...
         "dist":{
            "shasum":"e6ab0a743da5f3ea018e5c17597d121f7868c159",
            "tarball":"http://localhost:8080/ipfs/QmeERhULUu7nwgLvxFrnnvJViKqwUxpmswWKzg41ensSYk"
         }
      }
   },
  // ... 
}

The updated package-lock.json:

{
  "name": "testing",
  "requires": true,
  "lockfileVersion": 1,
  "dependencies": {
    "bignumber": {
      "version": "1.1.0",
      "resolved": "http://localhost:8080/ipfs/QmeERhULUu7nwgLvxFrnnvJViKqwUxpmswWKzg41ensSYk",
      "integrity": "sha1-5qsKdD2l8+oBjlwXWX0SH3howVk="
    }
  }
}

You can now turn off your wifi and successfully run the installation command again and it "just works™️" 🎂

This would be a good place to use ipns: publishing a name for the folder means you can publish new versions (add another version to the packument) or add new packuments to the folder, then re-add the whole folder and update the ipns name.

This is very similar to how npm-on-ipfs actually works, except that it uses the local gateway rather than running its own web server proxy.

In theory if you connect to the npm-on-ipfs swarm and fetch the root of the mfs from https://registry.js.ipfs.io/ you can load packuments from there over the local gateway too:

$ ipfs swarm connect /dns4/registry.js.ipfs.io/tcp/10042/ipfs/QmfKqxieE71QoAchNk5e2MKmvWKjGdUnSifHqq1xZLEzyn
$ npm install bignumber@1.1.0 --registry=http://localhost:8080/ipfs/QmNfmaA4K9G6yWYxJoRhPiYjJvYXMjPMkuuazqGahg2r4S

There are a couple of ugly bits when it comes to this being more widely used:

There also needs to be a tool to generate a packument as writing one by hand is very error prone!
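A rough sketch of what such a generator might produce for a single version. The field set is a guess at the minimum a client needs (name, dist-tags, versions with dist.shasum and dist.tarball), based on the packument excerpts in this thread rather than on a formal spec, and `buildPackument` is a hypothetical name.

```javascript
const crypto = require('crypto');

// Sketch: build a one-version packument from a package.json object and the
// bytes of the tarball produced by `npm pack`. `tarballUrl` is whatever
// address the tarball was published at (gateway URL, ipfs:// URL, ...).
function buildPackument(pkg, tarballBytes, tarballUrl) {
  // dist.shasum is the hex sha1 of the tarball.
  const shasum = crypto.createHash('sha1').update(tarballBytes).digest('hex');
  return {
    _id: pkg.name,
    name: pkg.name,
    'dist-tags': { latest: pkg.version },
    versions: {
      [pkg.version]: {
        ...pkg,
        _id: `${pkg.name}@${pkg.version}`,
        dist: { shasum, tarball: tarballUrl },
      },
    },
  };
}
```

Publishing further versions would then be a matter of adding entries under `versions` and moving the `latest` tag.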

extra ipfs-npm commands

Some potential commands that could be built into the ipfs-npm cli:

$ ipfs-npm packument: a command that generates a packument for a package that's published on ipfs

$ ipfs-npm publish: a command that publishes to IPFS (npm pack, ipfs-npm packument, put them both in a folder, ipfs add -r and optionally ipns/dnslink name publish)

And a couple extra commands that I thought of whilst experimenting:

$ ipfs-npm alias: a command that installs it as the default npm in your shell

$ ipfs-npm pin $package: a command that pins packuments and tarballs for packages you want to keep locally

achingbrain commented 5 years ago

Some thoughts:

$ ipfs-npm publish

This will conflict with npm publish, since at the moment everything following ipfs-npm is passed to npm/yarn. Maybe we could have different commands aliased by the ipfs-npm module? E.g. ipfs-npm-publish or something a bit more memorable?

"tarball":"http://localhost:8080/ipfs/QmeERhULUu7nwgLvxFrnnvJViKqwUxpmswWKzg41ensSYk"

Having this sort of thing end up in your lock file doesn't bode well for repeatable builds as things will break if ports change or the daemon isn't running on the current machine, etc, which is why ipfs-npm rewrites them to be https://registry.js.ipfs.io/... URLs

Maybe if the pacote PR gets merged we can rewrite them to ipfs://Qmfoo URLs instead?

$ npm install bignumber@1.1.0 --registry=http://localhost:8080/ipfs/Qmds3SRV8ABBN6zgQAqPZxHLZDGcURJHYyAZ1AGcNvmsPx

I like the simplicity of using the local gateway as the registry but it means we can't rewrite the lock file post-install. Then again if the pacote PR gets merged, and the registry uses ipfs://Qmfoo URLs in the packument dist.tarball fields we might not need to rewrite it.
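For illustration, the kind of rewrite being discussed might look like the sketch below, which assumes a lockfileVersion 1 structure and that the pacote patch (or something like it) can resolve ipfs:// URLs. The default gateway prefix and the name `rewriteResolved` are assumptions.

```javascript
// Sketch: rewrite local-gateway URLs in a lockfileVersion 1 structure to
// ipfs:// URLs so the lockfile no longer depends on a local port.
function rewriteResolved(lock, gateway = 'http://localhost:8080/ipfs/') {
  const deps = {};
  for (const [name, entry] of Object.entries(lock.dependencies || {})) {
    const next = { ...entry };
    if (typeof entry.resolved === 'string' && entry.resolved.startsWith(gateway)) {
      next.resolved = `ipfs://${entry.resolved.slice(gateway.length)}`;
    }
    if (entry.dependencies) {
      // Nested entries have the same shape, so recurse into them.
      next.dependencies = rewriteResolved(entry, gateway).dependencies;
    }
    deps[name] = next;
  }
  return { ...lock, dependencies: deps };
}
```

Entries that resolve anywhere else (for example to https://registry.js.ipfs.io/...) are left untouched.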

andrew commented 5 years ago

I thought I'd take a stab at making a script that implements a certain amount of Level 3.1 (previously Level 4), here's what I came up with: https://github.com/andrew/ipfs-npm-republish

First npm install it: $ npm install -g ipfs-npm-republish

Then change into a directory with a package-lock.json and run: $ ipfs-npm-republish

This will do the following steps:

1. List dependencies for current directory from package-lock.json
2. Calculate list of packages to be republished
3. create a tmp folder to act as ROOT
4. For each package
  1. Fetch packuments for each package and write to ROOT
  2. For each depended upon version:
    1. download the tarball to ROOT
    2. ipfs add tarball
    3. rewrite the dist.tarball url to a local gateway url with tarball hash
5. ipfs add -r ROOT
6. pin ROOT hash
7. set per-project npm config to use new micro-registry in .npmrc
8. output command to update registry to point to ipfs ROOT hash
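Steps 1 and 2 above can be sketched roughly as follows. This is an illustration of the idea for a lockfileVersion 1 file, not the code ipfs-npm-republish actually uses, and `listLockedDependencies` is a hypothetical name.

```javascript
// Sketch of steps 1-2: collect every name and version pinned in a
// lockfileVersion 1 package-lock.json, including nested dependencies.
function listLockedDependencies(lock, found = new Map()) {
  for (const [name, entry] of Object.entries(lock.dependencies || {})) {
    if (!found.has(name)) found.set(name, new Set());
    found.get(name).add(entry.version);
    listLockedDependencies(entry, found); // nested node_modules entries
  }
  return found; // Map of package name -> Set of versions to republish
}
```

Note that for non-registry dependencies the `version` field can be a URL rather than a semver version, so a real tool would need to decide how to treat those.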

The code probably isn't the best and it doesn't handle unexpected errors very well but when it works you do end up with a valid npm registry hosted on IPFS 🎉

Things it doesn't do:

It's also small steps towards the ideas in https://github.com/ipfs/package-managers/issues/52 around "portable packages", where you could use ipfs-npm-republish to create a micro-registry for a package and one for an application, then have a function that can merge those two micro-registries together, enabling someone to add the package to the application and generate a new package-lock.json without relying on a traditional registry or proxy at all.
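A very rough sketch of what such a merge function could look like, modelling each micro-registry as a plain object that maps package names to packuments. That in-memory shape, the name `mergeRegistries` and the "second registry wins on conflicts" rule are all assumptions for the sake of the example.

```javascript
// Sketch: merge two micro-registries (package name -> packument).
function mergeRegistries(a, b) {
  const merged = { ...a };
  for (const [name, packument] of Object.entries(b)) {
    const existing = merged[name];
    merged[name] = existing
      ? {
          ...existing,
          ...packument,
          // Keep versions and tags from both sides; b wins on conflicts.
          'dist-tags': { ...existing['dist-tags'], ...packument['dist-tags'] },
          versions: { ...existing.versions, ...packument.versions },
        }
      : packument;
  }
  return merged;
}
```

The merged object would then need to be written back out as a folder of packuments and re-added to IPFS to get a new root hash.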

andrew commented 5 years ago

Next step I'm going to look at doing is republishing for a package, which is subtly different than what ipfs-npm-republish currently does, which is to republish only the dependencies of an application.

In this case you'll be able to do: $ ipfs-npm-republish react and it'll do the following:

  1. Download the react packument from an upstream registry
  2. Locally run npm install react in a tmp dir to collect the transitive dependencies for that package
  3. Create a micro-registry for both react and its full dependency list

Then you'll have everything you need to do a command line installation of react from that one micro-registry

andrew commented 5 years ago

As of v1.0.3 you can now republish individual modules directly from npmjs.org:

$ ipfs-npm-republish react@16.8.5
  Added object-assign
  Added loose-envify
  Added prop-types
  Added react-is
  Added react
  Added scheduler
  Added js-tokens

New registry url: http://localhost:8080/ipfs/QmadqAJ9rDUD7zdoyNc12gH4npVGLZN9V6fkybshEHanZz

Use it with the following command

  $ npm install react@16.8.5 --registry=http://localhost:8080/ipfs/QmadqAJ9rDUD7zdoyNc12gH4npVGLZN9V6fkybshEHanZz

andrew commented 5 years ago

As of v1.0.6 of ipfs-npm-republish you can now publish directly to IPFS without needing to first publish to npmjs.org.

Running ipfs-npm-republish publish from the root of a project directory will run npm pack and create a micro-registry for that newly created release along with its runtime dependencies, which then works independently of npmjs.org even if its dependencies originally came from there.

Aside from some performance issues on very large dependency trees, the last major area to cover is "update", which would in effect take an existing IPFS micro-registry, check upstream for dependency updates and then publish a new micro-registry with the updated dependencies.

N.b. "update" in this sense is a little different to using IPNS for a micro-registry, as it'll need to support packages from sources outside of IPFS. Micro-registries published with IPNS should be auto-updating without any extra work.

This will require storing some extra metadata in the packuments, which I'm planning on modelling on some of the ideas from https://github.com/ipfs/package-managers/issues/22: being able to follow the provenance of the packages to get back to their original publish location, either on a traditional registry or hopefully the IPNS name of the micro-registry they were published to.

andrew commented 5 years ago

v1.0.8 of ipfs-npm-republish is out, includes some significant refactoring:

I spoke to @olizilla yesterday about having IPFS desktop install npm-on-ipfs via a microregistry.

This would be pretty neat, IPFS Desktop wouldn't need to include ipfs-npm-republish as a runtime dependency, it can just store a CID (or eventually an IPNS name) to the npm-on-ipfs micro-registry created by ipfs-npm-republish.

There are a couple of bugs before that can work seamlessly:

Any help on those two bugs would be great!