ipfs / notes

IPFS Collaborative Notebook for Research

Notes from discussion with Jeromy - path towards the IPFS npm's companion #65

Open daviddias opened 8 years ago

daviddias commented 8 years ago

started on ipfs/pm/issues/38

@whyrusleeping ipfs-blob-store https://github.com/ipfs/ipfs-blob-store/pull/5 ;)

daviddias commented 8 years ago

@whyrusleeping what would be necessary to get 0.4.0 to be able to read other nodes' MFS through a custom pubkey?

whyrusleeping commented 8 years ago

@diasdavid could you write out your use case for me?

daviddias commented 8 years ago

What I want is to be able to do npm install 'moduleX' without having to cache all of npm locally first. Instead, registry-static with ipfs-blob-store underneath would use the pubkey of the node that hosts the whole of npm to find that node's MFS folder through the network and fetch the module from there.

It would require us to:

  1. Have a large-capacity node that caches the whole of npm into its MFS, using registry-static on top of ipfs-blob-store
  2. Attach registry-static in read-only mode to another node, where ipfs-blob-store uses a different pubkey to access the fat node's MFS folder
  3. Then, to install a module (sketched below):
     1. npm i module-name
     2. ipfs queries the network to get the latest hash of the large-capacity node's MFS folder
     3. registry-static performs the reads on that folder (which has all of npm cached), which is available through the network. It is essentially a way to mount a mutable remote folder.
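In ipfs commands, that install flow might look roughly like this (the peer ID, module name, and file path are placeholders, not real values):

```sh
# Hypothetical sketch of step 3; QmFatNodePeerID, module-name and the
# file path are placeholders.
npm i module-name                    # 3.1: the user installs as usual
ipfs name resolve QmFatNodePeerID    # 3.2: latest hash of the fat node's MFS folder
ipfs cat /ipns/QmFatNodePeerID/module-name/index.json   # 3.3: read from that folder
```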
jbenet commented 8 years ago

cc @mappum

whyrusleeping commented 8 years ago

@diasdavid so the fat node could just do an ipns publish of its mfs directory representing the npm cache. Then the node who is running the npm install could do ipfs files cp /ipns/QmFatNodeHash /path/to/npm/cache and operate on that.
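Spelled out as commands, assuming the MFS path and names below are placeholders (ipfs files stat --hash prints just the directory's hash):

```sh
# On the fat node: publish the MFS directory holding the npm cache.
ROOT=$(ipfs files stat --hash /npm-registry)   # /npm-registry is a placeholder
ipfs name publish /ipfs/$ROOT                  # announce it under the node's key

# On a client: graft the published directory into the local MFS tree.
ipfs files cp /ipns/QmFatNodeHash /path/to/npm/cache
```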

whyrusleeping commented 8 years ago

but every time that the fat node updates, the client nodes would have to re-run the copy command if they want to stay up to date. Some package managers leave that step up to the user, e.g. cache the repo state for up to a day, or until the user runs pacman -Sy (or similar)
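A manual refresh in that pacman -Sy spirit could be as simple as the following sketch (paths are placeholders):

```sh
# Drop the stale copy and re-graft the latest published root.
ipfs files rm -r /path/to/npm/cache
ipfs files cp /ipns/QmFatNodeHash /path/to/npm/cache
```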

mappum commented 8 years ago

but every time that the fat node updates, the client nodes would have to re-run the copy command if they want to stay up to date.

It's more in line with how npm works to only update when the user explicitly wants to; npm versions are already immutable (npm won't let you publish changes to a version that is already published).

BTW, the fat node is good as a fallback to ensure everything is always available on the IPFS network, but it makes a lot of sense to also have nodes that install packages provide them (then you can just fetch packages over the LAN, have high availability, etc.).

Also, I think it will be important for package authors to add the IPFS hashes of their dependencies in their package.json, then ipfs-npm can simply fetch those hashes and put them in node_modules. Those hashes are trustworthy enough since you are already trusting the author's code anyway.
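For illustration only (an "ipfsHash" field in package.json is hypothetical, not an existing npm feature, and the names below are placeholders), a client that found such a hash could bypass the registry entirely:

```sh
# Hypothetical: QmDepHash was read from an "ipfsHash" field in package.json.
ipfs get QmDepHash -o node_modules/module-name
```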

mappum commented 8 years ago

Also, it would be really cool to make a decentralized registry, where many people run that fat node (it's feasible since it probably dedupes to being a lot smaller than the total 200+ GB, and also not everyone will need to have all the packages). Then users installing a package can just check the package hash from each (or many) of those nodes' registries and ensure they all match. This prevents attacks where registry operators maliciously modify the code.
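One sketch of that cross-check (the peer IDs and module path are placeholders): resolve the same package path on several mirrors and compare the results.

```sh
# Each mirror publishes its registry root over IPNS.
for peer in QmMirrorA QmMirrorB QmMirrorC; do
  ipfs resolve "/ipns/$peer/module-name"   # prints an /ipfs/Qm... path
done
# All lines should print the same hash; a mismatch means one mirror
# is serving modified code.
```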

daviddias commented 8 years ago

@whyrusleeping:

Then the node who is running the npm install could do ipfs files cp /ipns/QmFatNodeHash /path/to/npm/cache and operate on that.

But wouldn't that mean that each user downloads the whole of npm through IPFS?

@mappum

Also, it would be really cool to make a decentralized registry, where many people run that fat node (it's feasible since it probably dedupes to being a lot smaller than the total 200+ GB

Right now it is half a TB (the readme on registry-static is not up to date), and I'm not sure it would dedup that much, since our chunking algo doesn't take into account how the code is divided

@mappum

Also, I think it will be important for package authors to add the IPFS hashes of their dependencies in their package.json, then ipfs-npm can simply fetch those hashes and put them in node_modules. Those hashes are trustworthy enough since you are already trusting the author's code anyway.

This would be awesome! But it would also break the flexibility of semver; the hash of each module could instead be an IPNS hash pointing to the latest version and all the versions before it.
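That per-module IPNS idea might look like the sketch below, assuming per-module keys (ipfs key gen and the --key flag on publish exist in newer ipfs versions; all names are placeholders):

```sh
# Hypothetical: the author keeps one IPNS key per module and republishes
# whenever a new version lands; the target dir lists every version.
ipfs key gen module-name
ipfs name publish --key=module-name /ipfs/QmVersionsDirHash
```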

Nevertheless, let's get the use case of installing from IPFS done, without having to change how the ecosystem works today.

So, back to "ipfs files cp /ipns/QmFatNodeHash /path/to/npm/cache and operate on that": would this cp do any lazy loading?

If yes, we can make the baseDir be that copy; the ipfs files cp could happen on first run and be re-run later by the user for updates. However, this adds an extra step.

If not, we have to have a way to 'mount' /ipns/QmHASH-of-fat-node/npm-registry/all-the-modules* locally (with lazy load) so that registry-static can use it and ask IPFS to download a module when it needs it.
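One existing primitive that behaves like such a mount is ipfs mount (FUSE), which exposes /ipfs and /ipns as filesystems and only fetches the blocks a read actually touches; the path below is a placeholder:

```sh
ipfs mount                                  # requires FUSE to be set up
ls /ipns/QmHASH-of-fat-node/npm-registry    # listed and fetched lazily
```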

whyrusleeping commented 8 years ago

But wouldn't that mean that each user downloads the whole of npm through IPFS?

nope, ipfs files cp .... is a very cheap command. it just modifies the links in the tree.
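In other words, the cp is cheap because it only links the remote root into the local MFS tree; data moves only when something is read (paths are placeholders):

```sh
ipfs files cp /ipns/QmFatNodeHash /npm-cache        # fast: copies links only
ipfs files read /npm-cache/module-name/index.json   # fetches just this file's blocks
```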

daviddias commented 8 years ago

Awesome, let's try it then :)

We need:

whyrusleeping commented 8 years ago

@diasdavid hmm... I have a machine with a lot of disk space, but i've been holding off on using it for ipfs stuff... I suppose I could set you up an account on it. It only has a 60Mbit symmetric link though. not gigabit.

jbenet commented 8 years ago

Right now it is half a TB (the readme on registry-static is not up to date), and I'm not sure it would dedup that much, since our chunking algo doesn't take into account how the code is divided

It will dedup a ton when you use ipfs tar to import the tarballs; it will dedup all the identical files.
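Since npm tarballs are gzipped, the gzip layer has to come off before ipfs tar can see the individual files inside (file names are placeholders):

```sh
gunzip module-1.0.0.tgz          # produces module-1.0.0.tar
ipfs tar add module-1.0.0.tar    # imports file-by-file, so identical
                                 # files across versions dedup
```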

I have fiber in NYC, but won't be there until 10/24ish

daviddias commented 8 years ago

It's aliveeee https://github.com/diasdavid/registry-mirror :)

jbenet commented 8 years ago

Awesome!!! :)

maboiteaspam commented 8 years ago

:clap: :clap: