nodejs / NG

Next Generation JavaScript IO Platform

Decentralized Module Resolving w/ proof of concept #29

Closed formula1 closed 6 years ago

formula1 commented 8 years ago

The last conversation is here: https://github.com/nodejs/NG/issues/26. I believe I was part of the problem, and I'd like to get this thread started on the right foot again.

I'm basing most of this on the TC39 process, which has been an effective means of getting features included in the ECMAScript standard. Link here: https://tc39.github.io/process-document/

Purpose

This is a culmination of efforts from @joepie91 @ChALkeR @mbostock @scriptjs @formula1

For a More Detailed Explanation

Overview

Installation

Publishing

Enabling a Client to Have Multiple Registries

https://github.com/formula1/decentralized-package-resolving-example

git clone https://github.com/formula1/decentralized-package-resolving-example
npm install
npm test

It's unoptimized and much of it is synchronous. But hey, I'd like to believe I'm paying my time and effort forward, so that if this thing gets through I'll feel like I was part of the solution. Every journey begins with a step, I suppose.

Pinging those who seemed interested: @scriptjs, @mikeal, @Martii, @ChALkeR, @joshmanders, @jasnell, @ashleygwilliams, @Qard, @bnoordhuis

jasnell commented 8 years ago

@formula1 ... thank you for engaging with a concrete proposal. This is exactly the kind of thing I was hoping for. It might take me a couple days to dig into the details but I promise I'll take a look.

scriptjs commented 8 years ago

@formula1 I'll take a look. I have not had the time to write anything formal since https://github.com/nodejs/NG/issues/26 was closed yesterday (after this discussion went south a second time), but I had made a commitment to @jasnell yesterday to do so. I have a prototype for resolving and fetching from endpoints, with providers and resolvers using semver. I am studying ied's caching at the moment, which is robust.

ChALkeR commented 8 years ago

@scriptjs @formula1

If you are proposing any new schemes, note that they should support moderation. Trust does not work here: even if you trust someone, if malware infects their computer or their accounts somehow become compromised (ref: my post), that access could be used to replace packages with malware.

Copy-pasting myself from https://github.com/nodejs/NG/issues/26#issuecomment-174094994:

Note that moderation of the registry is currently needed, because there could be harmful packages. Also there could be (theoretical) situations when the whole registry must be stopped for moderation (I could describe such a situation a bit later), and that should be achievable. I am not saying that this restriction must be absolute, though.

So, please either make sure that your proposals support moderation (not necessarily by one party, but by a limited number of parties, all of which have adequate response times) or show me (and everyone else) why that would not be an issue.

Qard commented 8 years ago

@othiym23 and I discussed a bit last night, the idea of a stripped down fetch + tar tool that wouldn't include any repository connection or dependency management. It'd just fetch a tar and unpack it, maybe running some very basic lifecycle scripts, like node-gyp, but probably without the configurability you have with npm. If you want more full-featured package management, you could just use the tool to fetch a more complete package manager.

https://twitter.com/stephenbelanger/status/691046744780599296
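
For a sense of how small such a tool could be, here is a rough sketch, assuming a Unix system with tar on the PATH (the URL is a placeholder, and redirects, retries, and lifecycle scripts are all ignored):

'use strict';
const https = require('https');
const { spawn } = require('child_process');

// Fetch a gzipped tarball and pipe it straight into the system tar.
function fetchAndUnpack(url, dest, cb) {
  https.get(url, (res) => {
    if (res.statusCode !== 200) return cb(new Error('HTTP ' + res.statusCode));
    const tar = spawn('tar', ['xz', '-C', dest]);
    res.pipe(tar.stdin);
    tar.on('close', (code) => cb(code === 0 ? null : new Error('tar exited ' + code)));
  }).on('error', cb);
}

// Placeholder URL; a real tool would take this as a CLI argument.
fetchAndUnpack('https://example.com/pkg-1.0.0.tgz', '.', (err) => {
  if (err) throw err;
  console.log('unpacked');
});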

jasnell commented 8 years ago

Yep, I've been stewing in the same direction. It could be very minimal, without the more advanced layout and dependency algorithms, and it wouldn't need much in the way of package.json smarts. Higher-level package managers can be hooked in somehow to provide the higher-level functions. I like it.

scriptjs commented 8 years ago

I am in favour of this approach as well.

joepie91 commented 8 years ago

I've been contemplating a more decentralized implementation for NPM for the past few days. While I'd planned to keep thinking it over for a little while, it seems that now that the discussion is underway, I should probably share my ideas and how I believe they address some of the technical (and 'political') challenges that are posed by the aforementioned proposals.

There are still some 'gaps' in my proposal, given that I've not had enough time to think it over yet. Perhaps those can be filled in here.

The primary requirements

How NPM solves these problems right now

Where alternative solutions fail

My proposal

Core to my proposal is the splitting up of tasks:

The distribution of tarballs could use a mechanism like webseeds, pointing at one or more registry servers that serve tarballs (or even generate such a webseed list on the fly), to decrease the latency for downloading tarballs while still offloading part of the distribution load to other peers.
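
Purely to illustrate the shape such metadata could take (every field name here is invented):

{
  "name": "example-package",
  "version": "1.2.3",
  "infoHash": "<torrent info hash of the tarball>",
  "webseeds": [
    "https://registry-a.example/example-package-1.2.3.tgz",
    "https://registry-b.example/example-package-1.2.3.tgz"
  ]
}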

All metadata and tarballs are cryptographically signed by the 'write servers'. I want to emphasize that UX is very important here, and it cannot become significantly harder for end users to use NPM as a result of architectural changes.
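
To make the signing idea concrete, a minimal sketch using Node's built-in crypto module (Ed25519 is my assumption, not part of the proposal, and it requires a Node version that supports it):

'use strict';
const crypto = require('crypto');

// The write server holds the private key; clients ship with the public key.
const { publicKey, privateKey } = crypto.generateKeyPairSync('ed25519');

const tarball = Buffer.from('...package tarball bytes...');

// On the write server: sign the tarball (Ed25519 takes no digest name, hence null).
const signature = crypto.sign(null, tarball, privateKey);

// On the client: verify before unpacking anything.
console.log(crypto.verify(null, tarball, publicKey, signature)); // true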

Furthermore, for the absolute worst-case scenario, it should be possible for individual end users to install from a Git repository using semantic versioning specifications, referring to the tags on the repository to find compatible versions.
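
A sketch of that worst-case path, assuming git on the PATH and the semver package from npm:

'use strict';
const { execSync } = require('child_process');
const semver = require('semver'); // npm install semver

// List remote tags, normalize them to versions, and pick the best match.
function resolveGitTag(repoUrl, range) {
  const out = execSync('git ls-remote --tags ' + repoUrl, { encoding: 'utf8' });
  const versions = out.split('\n')
    .map((line) => line.split('refs/tags/')[1]) // e.g. 'v0.2.1' or 'v0.2.1^{}'
    .filter(Boolean)
    .map((tag) => semver.valid(semver.coerce(tag)))
    .filter(Boolean);
  return semver.maxSatisfying(versions, range);
}

console.log(resolveGitTag('https://github.com/d3/d3-voronoi', '0.2')); // e.g. '0.2.1'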

Threat models

How my proposal addresses the requirements

Unsolved problems

scriptjs commented 8 years ago

From what I see, there are a few topics to consider if we look at this holistically:

Minimal viable client in node

I think if we were to agree on the notion @jasnell came up with yesterday and @Qard reiterated here today, we could have something tangible soon. Along the lines of solving the broader issue of distributed package management, the rest can fall into place. Someone can start a repo. Most of this code is already available in some form. This is essentially what bower was, for the most part. This resolves the optics of npm's bundling in node and npm as an upstream, and opens node up to developer choice. We should agree on what it will handle, but it should be as little as possible: enough to bring in a client and build it if necessary.

Full featured clients

Let developers choose their full-featured client based on its capabilities and features. npm is what most use today, but there are viable clients up and coming that are showing promise and will be competitive this year. We don't need to solve this here; let the community do its thing and build software that offers these choices.

Vetting/moderation of modules for inclusion in a registry

This is a process question. I think a community process could be established in conjunction with nodesecurity.io, phased in by working through existing packages. Currently there is a lot of cruft, and a large percentage of the current registry is never accessed.

A public publishing flow might work where a module moves through a community vetting process. Vetted modules could be held in a central registry by the Node Foundation. This central registry would not be the one everyone fetches modules from, but a source of updates for public registries. Something suitable like Rackspace Cloud or S3 could be used, with an API run by the Node Foundation, to host the central registry.

Public registries

Public registries should adhere to some basic standards set out by the Node Foundation for trust: for example, they will only host vetted modules and must keep their sources up to date by syncing with the central registry. A public registry should not need to be more than a source of static files, whether that is a CDN or a peer that persists the data in order to seed it. It is possible for a public registry to provide a static endpoint and a peer simultaneously. Dat, as an example of a peer-to-peer system, requires a key for the data to access the content.

Let those interested in hosting registries host freely if they can meet these basic standards. The Node Foundation can create an icon or something similar to verify their status as a registry host.

Private registries

Here again, let the community do its thing, whether you want to use your own git repos, Sinopia, a commercial provider, etc.

Module discovery

Currently there is npm for searching modules, but I would encourage any company to use the corpus of modules to create innovative solutions for search and discovery. The volume of metadata is also not large by the standards of today's search software (Elasticsearch, etc.), so building something interesting is feasible.

package.json

The package.json is something that needs to be under community control with respect to a minimum set of standardized metadata properties. Formal proposals should be required to make changes to this standard, in the best interests of the community.

ChALkeR commented 8 years ago

@scriptjs Sorry, but your post again does not look like a technical proposal to me.

I doubt that manual verification of all modules before publishing will work: who is going to do that? I doubt that «trust» and «standards» would work the way you suppose: either you would have to «trust» a lot of public registries, or your proposal wouldn't be distributed enough. You should, in fact, not trust them, but sign the packages and make sure that those public registries cannot meddle with package contents. I do not see how fast content moderation of already published modules would work in your system if there were a lot of those public registries. I also do not see how the emergency switch (aka «shut the whole thing down») would work in your model. And I don't like the idea of recommending multiple clients or giving the user an early choice there.

@joepie91's proposal (splitting that into three groups: a central/replicated auth, a few read servers, and various distribution nodes) would work better, I think. Perhaps it would be good to add the possibility of using other delivery methods as «distribution nodes», e.g. tarballs from GitHub (given that those are signed by the auth «write» server), etc. The bad thing about directly installing unpublished versions from GitHub is that tags on GitHub are not immutable, so the auth server should store information about which versions (tarballs) are signed.

An emergency switch could be introduced on the client side, so that the client checks the flag with the auth/read servers (@joepie91 has a bit more to say about that), perhaps with an opt-out on the client; but that opt-out should give the user some grave warnings and require some non-trivial actions.

As for the usability — the auth server could offer GitHub auth (as one of the possible login methods) and get public keys directly from the user account on GitHub.

I also think that replacing the bundled npm with a «minimal viable client» would not be a great solution at the moment; it could introduce more problems than it solves. If we replace it with something, let's make sure the new solution is superior.

Private registries and private registry hosting would be on their own, and that is fine.

scriptjs commented 8 years ago

@ChALkeR This is not a technical proposal, you are correct. It is a high level view of the elements of a system beginning with what is packaged with node. It frames responsibilities for a system so we can continue a technical discussion on the same page as to how we see this as a whole and who might be responsible for what.

I think it is well understood that registries need to be read-only and that signed packages are a prerequisite. What we have today, however, is a number of packages and a lot of cruft in NPM that should never have reached a public registry in the first place, yet are there and persist. There needs to be some gatekeeping.

Here, there is the notion of a central authority for the modules. Publishing a module for the first time could place it into a vetting/review queue. Vetted/trusted modules enter a central registry operated by the Node Foundation. The function of the central registry is only to let public registries sync their content. Public registries offer read-only access to the signed tarballs. Every public registry serves the same content.

ChALkeR commented 8 years ago

@scriptjs Are you saying that someone out there should manually review all the diffs between all the versions of all modules in that sparkling registry? That sounds like an idealistic approach, but nowhere near that much reviewer time is available.

joepie91 commented 8 years ago

Regarding the emergency switch, the sanest implementation of that would probably be for the write server to propagate a (signed) 'shutdown signal' to the read servers, and have the clients simply rely on the read servers to tell them whether they can install packages.

This shutdown signal could be used in cases where some kind of wide-spread security issue were to make package installations unsafe. Ideally, it'd never be needed.
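
As a sketch of the client side of that check (the /status endpoint and its fields are hypothetical, and the signature verification on the response is elided):

'use strict';
const https = require('https');

// Ask a read server whether installs are currently allowed. In a real
// client the response would be signed by the write server and verified here.
https.get('https://read-server.example/status', (res) => {
  let body = '';
  res.on('data', (chunk) => { body += chunk; });
  res.on('end', () => {
    const status = JSON.parse(body);
    if (status.shutdown) {
      console.error('Registry is in emergency shutdown:', status.reason);
      process.exit(1);
    }
    // ...otherwise proceed with the installation...
  });
}).on('error', (err) => { throw err; });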

I should note that the above is a result of @ChALkeR explaining to me the need to have a 'killswitch' - perhaps he could elaborate on the kind of scenario he is envisioning.


As for mutability of Git tags - I'm afraid that this is inevitable, and given that Git tags are only meant as a last-resort installation method, this could be an acceptable tradeoff.

While I initially thought that adding a hash to the installation URI would resolve the matter, this would remove the possibility of using semantic versioning ranges, which I personally consider an essential part of NPM's packaging model, and necessary to make Git tags an 'equivalent approach' to regular dependency specifications.

Enterprises that need an absolute guarantee of immutability (i.e. not being able to trust the registry host either) already need to check in node_modules or use something like Sinopia in between, so I'd imagine that the loss of immutability guarantees on a small number of packages would not be a big problem.


@scriptjs I don't feel that a 'walled garden' would be a viable solution. Part of the reason the Node.js ecosystem is so useful is that everybody can publish modules. Reviewing all modules would introduce a significant delay, as well as manpower requirements (further increasing the dependency on the registry operator, which is precisely what we don't want).

A better solution would be to have two after-the-fact review queues - one that every package goes through to filter out the obviously malicious stuff (this may already exist), and one queue that people can request their modules to be placed into, for more in-depth review that can cover matters like security and code quality (as time is available). Neither of these queues need to be visible to the public, nor do they need to happen before package publication.

scriptjs commented 8 years ago

@ChALkeR No. I am not speaking of a form of detailed review here but something that would prevent cruft or malicious code from entering the registry. Currently there is zero barrier to publishing anything even if done in error.

I think at the very least there could be a scan of the package by an automated tool, or an approach that would generate an issue where we let the community examine new modules that will be entering the registry over some days.

This could work using a graduated program, so that new authors are slowed while trust is built. When a module passes, the author/publisher earns some level of trust. We let trusted authors publish freely.

@joepie91 This is not meant to deter publishing by anyone at all. Only to be proactive rather than reactive.

joepie91 commented 8 years ago

This is not meant to deter publishing by anyone at all. Only to be proactive rather than reactive.

Regardless of whether it's meant to do that, it will do that. The core reason why NPM grows as quickly as it does, is that there are absolutely no barriers to publishing things. A delay in publishing modules will discourage people from publishing anything at all.

ashleygwilliams commented 8 years ago

this is absolutely true @joepie91. i would also like to point out the extremely negative effect this will have on the beginner experience. at a moment when Node.js is trying to lower the barrier to entry for new developers, this would be an absolutely devastating move. surely some of it would be addressable by documentation, but that is something Node.js is already struggling with.

i hear the concerns on this thread but don't see how this furthers any of the goals Node.js currently has, especially considering that there is no acute need for this to happen.

ChALkeR commented 8 years ago

@scriptjs

I am not speaking of a form of detailed review here but something that would prevent cruft or something malicious from entering the registry.

Sorry, but those are two mutually exclusive statements. For example, C will publish a package that exports a list of colors, let's say colors.js: module.exports = {red: '#f00', green: '#0f0', blue: '#00f'}, and will call that v1.0.0. This would not be cruft. That package will get moderated and approved. Now C publishes a new 1.0.1 version of that package and includes malware in it. Would your manual review notice that? And note that users would be harmed even more, because once you say «we review packages», they will expect packages to be more secure, but that's not going to happen.

I think at the very least there could be a scan of the package by an automated tool or an approach that would generate an issue where we let the community examine new modules that will entering the registry over some days.

Automatic scans on the server flagging potentially dangerous packages would be good, yes. Delaying new modules (and module updates) by several days would be unacceptable.

This could work using a graduated program so that new authors will be slowed while trust is built. A module passes, the author/publisher earns some level of trust. We let trusted authors publish freely.

I am not convinced that this model would work. No one is going to review even all the new packages by unpopular/new authors. Also, «trust» is not absolute even for very popular packages, so the stuff in the registry would still be potentially insecure, and it would be bad to make it appear as if it were secure.

formula1 commented 8 years ago

I'm not entirely sure I am understanding this.

Questions

Other points

ChALkeR commented 8 years ago

@formula1

Centralized Authentication (@ChALkeR's suggestion)

It's not mine; I'm talking about this proposal by @joepie91:

Write server(s): One or more servers, controlled by a single entity, on a fully open-source software stack. These are the servers that actually authenticate users, accept package uploads, and decide what the registry looks like.

formula1 commented 8 years ago

@ChALkeR Ah, just shows how confused I am! I see he put a lot of thought into it, but I'm having a hard time visualizing this.

Deleted the phone comment

Perhaps this is how I can understand it:

Is this correct?

formula1 commented 8 years ago

Looking into this deeper

npm is far more awesome than I realized. I thought it was good before.

Torrent Tracking can be handled within node (node-gyp not necessary)

Git Server can be handled within node (node-gyp not necessary)

So what we have here are the basics necessary for a test example

npm publish

From the client CLI

From Registry

From the CLI: I would like to change this so that the distributor node connects directly to a git server and creates it independently. However, I am not sure whether that is possible.

npm install

From CLI

From Registry

From CLI

mbostock commented 8 years ago

I’ve been thinking about this a little bit and experimenting with a decentralized package manager:

https://github.com/mbostock/crom

In a nutshell, it’s convenient (essential, even) to use names when I’m installing a new dependency, but subsequently I want my package manager to resolve these names explicitly as URLs. That way, when my users install my code, they can install the right thing without needing a centralized repository. So, this:

crom install d3-voronoi@0.2

Captures all this:

{
  "dependencies": [
    {
      "name": "d3-voronoi",
      "owner": "d3",
      "version": "0.2.1",
      "range": "0.2",
      "url": "https://github.com/d3/d3-voronoi",
      "releaseUrl": "https://github.com/d3/d3-voronoi/releases/tag/v0.2.1",
      "sha": "1eb846e5b81ea7e25dab3184fa777a8db325d01146cdae02aa589b2349d162b8"
    }
  ]
}

Note that Crom supports semantic versioning by capturing the desired version range and being able to query the package URL to discover the current list of releases.
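
For illustration, such a release lookup could be done against the GitHub tags API (this is not Crom's actual code; the endpoint and the semver package are my assumptions):

'use strict';
const https = require('https');
const semver = require('semver'); // npm install semver

function latestSatisfying(owner, repo, range, cb) {
  const opts = {
    host: 'api.github.com',
    path: '/repos/' + owner + '/' + repo + '/tags',
    headers: { 'User-Agent': 'crom-sketch' }, // GitHub's API requires a User-Agent
  };
  https.get(opts, (res) => {
    let body = '';
    res.on('data', (chunk) => { body += chunk; });
    res.on('end', () => {
      const versions = JSON.parse(body)
        .map((tag) => semver.valid(semver.coerce(tag.name)))
        .filter(Boolean);
      cb(null, semver.maxSatisfying(versions, range));
    });
  }).on('error', cb);
}

latestSatisfying('d3', 'd3-voronoi', '0.2', (err, version) => {
  if (err) throw err;
  console.log(version); // e.g. '0.2.1'
});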

Please see the Crom README for more details. I’d love to help with this issue if I can!

Martii commented 8 years ago

@mbostock Would the GitHub retrieval be smart enough to not redownload a dependency, e.g. by checking the current hash? npm currently has this issue, which is why we have been asking maintainers to publish to npmjs.com to speed things up (though that suggestion doesn't always get taken up).

@formula1 I appreciate the ping... seems like some solid info here... I'll probably only interject if there is something that I don't understand or need to add something.

pluma commented 8 years ago

@Martii GitHub seems to use S3 as a storage backend for binaries, and S3 uses ETags (which may or may not be MD5 checksums, depending on a number of factors), so it could use that instead of the sha hash. However, it'd be necessary to follow the redirect to get the ETag.
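
A sketch of reading that ETag with a HEAD request, following the redirect by hand (Node's https module does not follow redirects itself, the URL-first form of https.request needs a reasonably recent Node, and the asset URL is a placeholder):

'use strict';
const https = require('https');

function getEtag(url, cb) {
  https.request(url, { method: 'HEAD' }, (res) => {
    if (res.statusCode >= 300 && res.statusCode < 400 && res.headers.location) {
      return getEtag(res.headers.location, cb); // follow the S3 redirect
    }
    cb(null, res.headers.etag);
  }).on('error', cb).end();
}

getEtag('https://github.com/owner/repo/releases/download/v1.0.0/pkg.tgz', (err, etag) => {
  if (err) throw err;
  console.log(etag); // may or may not be an MD5 checksum
});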

ChALkeR commented 8 years ago

If we want to really improve package management, the replacement should be superior to what we have now. GitHub-based package managers fail to guarantee the immutability of package versions by themselves: either the hash or signature has to be put directly in the deps (which wouldn't work with semver), or there needs to be some server or a distributed system of servers (e.g. p2p) that guarantees it.

mbostock commented 8 years ago

GitHub-based package managers fail to guarantee the immutability of package versions by themselves: either the hash or signature has to be put directly in the deps (which wouldn't work with semver), or there needs to be some server or a distributed system of servers (e.g. p2p) that guarantees it.

I’d rephrase this as “decentralized package managers” or “internet-based package managers” in the sense that the mutability is not specific to GitHub—the internet itself is mutable by default.

What about implementing immutability as a service on top of a decentralized package management system? So, package authors still publish wherever they want, while a third party is responsible for either storing the hashes of the package contents for verification, or the contents themselves for immutable snapshots. (The current npm registry could serve this purpose, for example.)

That way, the package dependencies and the package management system wouldn’t be strongly tied to one centralized service, and there could be several services that compete to provide such functionality, similar to CDNs.
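
The verification half of such a service could be as simple as hashing the tarball as it streams in and comparing against the published hash (every URL and value here is a placeholder):

'use strict';
const crypto = require('crypto');
const https = require('https');

// Download a tarball, compute its sha256, and report whether it matches the
// hash obtained from an (imaginary) third-party verification service.
function verifyTarball(tarballUrl, expectedSha256, cb) {
  https.get(tarballUrl, (res) => {
    const hash = crypto.createHash('sha256');
    res.on('data', (chunk) => hash.update(chunk));
    res.on('end', () => cb(null, hash.digest('hex') === expectedSha256));
  }).on('error', cb);
}

verifyTarball('https://mirror.example/pkg-1.0.0.tgz', '<hash from the service>',
  (err, ok) => console.log(ok ? 'verified' : 'MISMATCH'));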

mbostock commented 8 years ago

@Martii It doesn’t look like GitHub includes any content hashes with release assets, though you can get the commit sha from the associated git tag. If there were demand, I expect GitHub would be receptive to exporting content hashes if it meant a substantial reduction in their traffic.

ChALkeR commented 8 years ago

@mbostock

I’d rephrase this as “decentralized package managers”

I wouldn't. An example was given above in this thread of a decentralized package manager that guarantees immutability.

while a third party is responsible for either storing the hashes of the package contents for verification, or the contents themselves for immutable snapshots. (The current npm registry could serve this purpose, for example.)

Yes, that's what is required. But we have to make it secure, decentralized, and not a single point of failure.

formula1 commented 8 years ago

@mbostock Awesome stuff! A couple of things, though:

If you have any disagreements or additions, feel free to share.

bnoordhuis commented 8 years ago

GitHub-based package managers fail to guarantee the immutability of package versions by themselves: either the hash or signature has to be put directly in the deps (which wouldn't work with semver), or there needs to be some server or a distributed system of servers (e.g. p2p) that guarantees it.

The point about semver is a good one. It could perhaps be solved by having authors sign packages with their GPG key, but that requires that they set up a key first. Probably annoying for Windows and OS X developers, because they won't normally have gpg installed.

It also doesn't solve the issue of efficiently figuring out what the latest semver-compatible release is but one issue at a time, eh? :-)

ChALkeR commented 8 years ago

It could perhaps be solved by having authors sign packages with their gpg key but that requires they set up a key first.

That won't help: authors themselves might replace a version they have already published, and that's not good. I know developers who review the changes in their dependencies, and having non-immutable package versions would nullify that possibility.

bnoordhuis commented 8 years ago

Right, if that is what you mean by 'immutable', then yeah, key signing won't help. Blockchain time!

formula1 commented 8 years ago

Why would they want non-immutable (mutable?) package versions?

OK, let's say, for instance...

Or every install will initially hit the registry despite having distributor credentials. The registry would be the source of truth in these cases.

Not sure how blockchain fits into this, but it sounds like something I don't think I can handle. Figuring out Tor in node is crazy enough as it is.

mbostock commented 8 years ago

I’ve not experimented with it yet, but IPFS looks like a potential candidate; it implements a distribution protocol designed for immutability.

formula1 commented 8 years ago

Looks sweet. That seems decent as one method of distribution, for sure. I'm looking at your repo; do you think inheritance is necessary? A few functions definitely need to be implemented, but I'm not sure state is necessary.

formula1 commented 8 years ago

Pinging

Before we begin, I think it's important for all of us to understand npm's goals for 2016. Here's a link.

Recently Isaac sent a 2016 state-of-npm email, which I am confident all of you have received. Under the Foundations header, he suggests that npm would be best under a foundation umbrella: decentralized away from being a single company's product and toward a cooperative one. To create the best possible CLI tool, feedback from the best in the business would be greatly appreciated. The fact that we have so many package managers out there proves how important package managers are, and that there are requirements that are not met by existing ones. npm, however, is currently focusing on the stability of its own product (which is very important), which makes huge changes much more difficult.

Recently, ftp and magnet URIs (with special thanks to @feross for many awesome torrent tools) have been implemented as download mechanisms in my proof of concept, by means of plugins. I think this is a good spot to show that I am willing to put muscle into this if there can be a consensus on requirements. I am fully aware my request for an audience may be ignored, so I will do my best to show off which features have proven invaluable and what I am planning. I'm sure any efforts you provide will likely be worth much more than any ragtag repo I put together.
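
To give a feel for what "download mechanisms as plugins" means, a sketch of the smallest possible version (the names here are invented for illustration and are not my PoC's actual API):

'use strict';
// Hypothetical plugin registry: URI scheme -> fetch function.
const handlers = new Map();

function registerScheme(scheme, handler) {
  handlers.set(scheme, handler);
}

// Each handler takes (uri, dest) and returns a Promise.
function download(uri, dest) {
  const scheme = uri.split(':')[0];
  const handler = handlers.get(scheme);
  if (!handler) return Promise.reject(new Error('no mechanism for ' + scheme));
  return handler(uri, dest);
}

// Plugins supply the actual transports (module paths are hypothetical).
registerScheme('https', require('./plugins/https'));
registerScheme('ftp', require('./plugins/ftp'));
registerScheme('magnet', require('./plugins/magnet'));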

What seems important from your package managers

Here are a few features the proof of concept is meant to introduce

What I will be working on next

sheerun commented 8 years ago

Bower recently introduced Pluggable Resolvers, which allow third parties to implement such functionality (decentralized resolving) without modifying the core. I suggest @npm implement a similar feature.
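
For context, a Bower pluggable resolver is roughly a factory returning match/releases/fetch hooks. A sketch of that shape based on Bower's docs (details hedged; this is not a tested resolver):

'use strict';
module.exports = function resolver(bower) {
  return {
    // Claim sources that this resolver understands.
    match(source) {
      return source.indexOf('myregistry://') === 0;
    },
    // List available versions so Bower can do semver matching.
    releases(source) {
      return [{ target: 'v1.0.0', version: '1.0.0' }];
    },
    // Produce a directory containing the package for a resolved target.
    fetch(endpoint, cached) {
      if (cached && cached.version === endpoint.target) return; // keep the cache
      return { tempPath: '/tmp/my-package', removeIgnores: true };
    },
  };
};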

formula1 commented 8 years ago

Honestly, I used to see Bower as just client-side npm. Since getting into this deeply, I'm becoming more and more impressed with it.

guybedford commented 8 years ago

An IPFS-based registry for jspm has been on my mind a lot recently as well; coupled with the ability to sign packages (ideally even via a blockchain-style mechanism), it seems like an ideal decentralized package management system. The stance jspm takes is that we can't assume that any specific implementation for distributed transport would work. It would be like betting specifically on Bitcoin and putting all your savings into it... we probably want to let a few systems fight it out before deciding on the "one". Am I right in thinking this is what the discussion here is all about? If so, I'd be very interested to chat further, as I've been dreaming about the mechanisms quite a bit.

formula1 commented 8 years ago

@guybedford Yes. You hit a major point right on the head. The manner in which a package is received is not the concern of the CLI tool, only that it is received correctly. Bower (which likely has the most users here outside npm), Crom (a PoC by @mbostock), and my own PoC (which was heavily influenced by other people's work) have implemented plugin systems to allow the use of arbitrary download mechanisms. I fully encourage you to chat; I think this can benefit all parties, though so far I find myself speaking alone.

Another major point, which jspm does well, bower enabled, and npmd does primarily through its cache, is how a package gets resolved. Who resolves the packages is up to the client to decide (though more likely than not npm will be used). But giving the client the ability to go through groups of registries to find a package is important. jspm is a great example of this.

This requires decentralization and fallbacks of registries. This already exists today in the little JSON files of npm, bower, apm, and jspm, each acting somewhat independently of one another. The difference here is that the CLI tool should be the same, but the registries can be as different as they please. This enables the sort of foundation umbrella Isaac was talking about, where competition can still exist even though we are all focused on accomplishing the same goal: a badass CLI package manager.

guybedford commented 8 years ago

@formula1 Great to hear that, and I agree with all the points; I'm just still not sure I understand what the exact focus is here. Is the goal to ensure package managers provide open transport implementations so that an ecosystem of registry systems can develop? Or are you trying to ensure a common language for package managers to handle this transport layer specifically? Or are you looking to implement your own decentralized package transport system, or at least ensure the possibility for one exists? I don't think the idea of an ecosystem of registries is a good thing in its own right... the best thing for users is one registry. But I do think the idea of a completely distributed, decentralized, and secure hash-verified registry is an interesting thing to pursue.

Martii commented 8 years ago

@guybedford

Or are you trying to ensure a common language for package managers to handle this transport layer specifically?

Not sure this would happen, but a consideration nonetheless. Part of the reason our organization chose npm over the others is that maintaining different .jsons adds more to the workflow, not to mention everyone has a different nomenclature. Our organization definitely wants to minimize the maintenance impact but still allow others to compete... which I believe is one of the goals presented in this issue.

formula1 commented 8 years ago

The two main goals here are to

I present my own vision so

Outside of attempting that, my own vision means only as much as others allow it to. And I respect the progress and dedication others have shown much more than my own, though I'm very excited about this opportunity to be a part of it and its possible implementation.

guybedford commented 8 years ago

Thanks for the explanation, I was just not entirely sure how it related to the repo here, so apologies if I was a little too direct. It would be nice to take the time to just mention a couple of points along the lines of these ideas, if it is not going off track from the discussion too much.

The problem of transport, version lookup, and secure hash validation is actually a completely orthogonal one to the consumption of packages. This is why npm as a company can move to be more of a package provider than entirely relying on being the creator of a CLI tool. npm works just as well for jspm packages as it does for npm-CLI packages, even though they make completely different assumptions. A single distribution system of a high quality is a really good thing for users, and much better than having many registries with varying reliability.

In terms of availability, we should make the distinction between another registry (say the difference between npm and GitHub) and mirrors. The way to tackle package reliability is by using npm mirrors. This is a solved problem through npmrc configuration to use an existing mirror. Creating a new registry that happens to have copies of packages is not something that should be justified from a reliability perspective.

jspm provides an open API for registries, as it seems Bower now does as well. But we have to be very careful not to run away with creating lots of different registries. Imagine if you installed a new package and it and all its dependencies ended up using 5 different registry systems. That just multiplies the chance that the install will not work. It is very important for npm to maintain the role of the single dominant system; that is a really good thing, and alternative transports should be avoided as much as possible and any alternative registries given extreme skepticism. If a new registry comes out and everyone gets excited, and then more registries come out, that would be a very bad path to go down. (Although new package managers are a good path, despite churn.)

We're trained to think of a single monopoly as a bad thing, and it often is, but in the case of npm it really is not. If npm was owned by evil scheming capitalists (and the truth couldn't be further from this), even then the only fears one might possibly have are for performance, availability, privacy and security. Performance and availability have been shown to be a massive focus for npm. Privacy of my package usage data is perhaps not an important concern at least currently. Security - verifying that the hash of the package I requested is the hash of the package I got is the hash of the package that was published could be handled by verification tools around npm. Apart from that, we have absolutely nothing to worry about, so we can happily continue to use npm even if it gets taken over by those scheming capitalists - there is no need to feel that we should be decentralizing power away from npm.

So that brings me to the final point - why do we want any of these things at all!?

And on that note, I will just mention again that the space of decentralized package transport without a central authority using IPFS or similar technology, combined with say a blockchain for authoritative DNS-like package ownership is a fascinating problem, and if anyone is interested in working in this area, I'd always be interested to have a chat. jspm would certainly be open to adding a registry along these lines in due course, but this is a far-future to be researched, prototyped and experimented with, but certainly not rushed out to users.

formula1 commented 8 years ago
joshgarde commented 7 years ago

Has there been any progress on this since last year? I've been following the Node community's progress towards a more decentralized package management system for a few days now, but I haven't found anything that truly offers independent operation from NPM. I've researched IPFS and I believe it's the best way to jumpstart any distributed package management system.

I was thinking of simply forking Yarn or NPM and building this system on top of their foundations (why reinvent the wheel?). I have a few ideas for implementation, but it seems everyone here has their own proposals too. If anyone has an existing implementation of their proposals, I'd love to contribute some code towards it. I think this is a good project to pursue further.

joshgarde commented 7 years ago

I've just joined the IPMJS project and I'll be helping build out an implementation of a decentralized package manager over there for anyone wanting to follow the progress of this.

dominictarr commented 7 years ago

@joshgarde try this: %f9xdqPtVRm8j5nNjnN5wVoJl5gHSxnhBEGoS3T8Vr1g=.sha256 Basically, since npm@5 needs just a URL and an integrity hash, if you have a local server and know the sha1, sha256, or sha512, you can generate a package-lock.json that npm@5 will happily install. I tested this, but am waiting for the various bugs in npm@5 to be fixed. But this means you don't actually have to fork npm (yay, because maintaining your own fork would be hell)!
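
Concretely, a minimal package-lock.json along those lines might look like this (the local URL and integrity value are placeholders; left-pad is just an example package):

{
  "name": "app",
  "version": "1.0.0",
  "lockfileVersion": 1,
  "requires": true,
  "dependencies": {
    "left-pad": {
      "version": "1.3.0",
      "resolved": "http://localhost:8080/left-pad-1.3.0.tgz",
      "integrity": "sha512-<base64 hash of the tarball>"
    }
  }
}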

I think the same approach with yarn might be more complicated; yarn is slightly less explicit: https://yarnpkg.com/blog/2017/05/31/determinism/

npm can use only the package-lock as the source of truth in order to build the final dependency graph whereas Yarn needs the accompanying package.json to seed it.

^ the key line.

joshgarde commented 7 years ago

My goal isn't just to have the ability to mirror NPM packages onto an IPFS network, but to take NPM's core functionality and decentralize it: downloading packages, uploading packages, and searching packages. On top of that core functionality: establishing trust between package distributors and users, and offering protection against malicious packages.

Trott commented 6 years ago

It seems like perhaps this should be closed. Feel free to re-open (or leave a comment requesting that it be re-opened) if you disagree. I'm just tidying up and not acting on a super-strong opinion or anything like that.

(Aside: This repo is dormant and might be a candidate for archiving.)