ipld / js-block

IPLD Block Interface

Dependency injection patterns for reducing bundle size #10

Open mikeal opened 4 years ago

mikeal commented 4 years ago

This issue is broader than just the Block API (it touches js-multiformats, js-cid, js-ipfs, and js-libp2p), but this is as good a place as any to kick off the conversation.

Currently, we ship with support for a large set of codecs, hashing functions, and multiformat table entries.

This is not sustainable. Even the multiformat table, which is only metadata, has become large enough to be a bundle size concern. The number of potential codecs and hashing functions will only grow over time.

In all the libraries we’ve written for IPLD in the last year, we depend on the Block API through require('@ipld/block'). It’s how we create blocks, which encode data (codecs) and CIDs (hashing and multiformats). Currently, these libraries just require('@ipld/block'), which includes all the default codecs and has an interface for adding more codecs. There is no way to pare the default set down and no way to remove or add hashing functions.

While considering this problem, I’ve been writing some of my newer libraries a little differently. The core implementation lives in a file called bare.js, which exports a single function that accepts the Block class and the default codec: https://github.com/ipld/js-fbl/blob/master/src/bare.js#L8. Then index.js just returns this module, called with the default require('@ipld/block') class. The idea is that we can expose a Block class in the future that comes without any codecs, hashing functions, or multiformat table entries, and the user can add entries for each one they would like to support. That class can then be passed into the relevant IPLD libraries that need to create or decode blocks.

const Block = require('@ipld/block/bare')
Block.add(require('@ipld/dag-cbor'))
Block.add(require('@ipld/blake2'), { default: true })
const fbl = require('@ipld/fbl/bare')(Block, 'dag-cbor')
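
To make the shape of this pattern concrete, here is a minimal, self-contained sketch. It is not the real @ipld/block API: `createBareBlock`, `bareLibrary`, and the JSON stand-in codec are hypothetical names invented for illustration. The only point is that the library receives its Block implementation and default codec as arguments instead of requiring them itself.

```javascript
// Hypothetical sketch of the "bare" pattern. None of these names are the real
// @ipld/block API; the library gets its dependencies as arguments.

// A minimal stand-in for a Block class that starts with no codecs at all.
const createBareBlock = () => {
  const codecs = new Map()
  return {
    // Register a codec; nothing is bundled unless the caller adds it.
    add (codec) {
      codecs.set(codec.name, codec)
    },
    encode (value, codecName) {
      const codec = codecs.get(codecName)
      if (!codec) throw new Error(`Codec not registered: ${codecName}`)
      return { codec: codecName, bytes: codec.encode(value) }
    },
    decode (block) {
      const codec = codecs.get(block.codec)
      if (!codec) throw new Error(`Codec not registered: ${block.codec}`)
      return codec.decode(block.bytes)
    }
  }
}

// Stand-in codec (JSON over UTF-8), used only to keep the example runnable.
const jsonCodec = {
  name: 'json',
  encode: value => Buffer.from(JSON.stringify(value)),
  decode: bytes => JSON.parse(bytes.toString())
}

// What a library's bare.js could export: a factory that takes the Block
// implementation and the name of the default codec.
const bareLibrary = (Block, defaultCodec) => ({
  store: value => Block.encode(value, defaultCodec),
  load: block => Block.decode(block)
})

// What the application (or the library's own index.js) would do.
const Block = createBareBlock()
Block.add(jsonCodec)
const lib = bareLibrary(Block, 'json')
const block = lib.store({ hello: 'world' })
const roundTripped = lib.load(block)
```

In this sketch, asking for a codec that was never added (for example `Block.encode({}, 'dag-cbor')`) throws instead of silently pulling in more code, which is the trade-off being proposed.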

As support for each format is added to the Block API, its multiformat entries are added to the internal multiformat table. This means we’ll only have complete information for the multiformat entries we actually support. It would also be a breaking change in CID, because the .codec property would no longer work for codecs you didn’t explicitly add support for. The migration involves moving away from matching against the string and using the integer for the multiformat entry instead.
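
A small sketch of why the integer comparison survives a trimmed-down table while the string comparison does not. The table shape, `createTable`, and the stand-in CID objects are hypothetical; the integer values are the multicodec codes for dag-cbor (0x71) and dag-pb (0x70).

```javascript
// Illustrative only: a tiny table that knows just the entries the user added.
// The table shape and function names are hypothetical.
const DAG_CBOR = 0x71 // multicodec code for dag-cbor
const DAG_PB = 0x70 // multicodec code for dag-pb

const createTable = () => {
  const namesByCode = new Map()
  return {
    add (code, name) { namesByCode.set(code, name) },
    // Returns undefined for codes that were never registered.
    nameOf (code) { return namesByCode.get(code) }
  }
}

const table = createTable()
table.add(DAG_CBOR, 'dag-cbor') // dag-pb is deliberately left out

// Stand-in for a CID; only the integer codec identifier matters here.
const pbCid = { code: DAG_PB }

// Fragile: depends on the table knowing the name of every codec.
const isPbByName = table.nameOf(pbCid.code) === 'dag-pb'

// Robust: the integer comparison needs no table entry at all.
const isPbByCode = pbCid.code === DAG_PB
```

Here `isPbByName` ends up false even though the CID really is dag-pb, because the name was never registered, while `isPbByCode` is true.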

I’d like to get feedback on this approach and everyone’s thoughts on the API before I dig in and implement it everywhere.

I also want to make sure this is going to work for the other projects once they adopt the new Block API.

/cc @alanshaw @achingbrain @vmx @rvagg @jacobheun @carsonfarmer

carsonfarmer commented 4 years ago

This is certainly interesting from my perspective, as smaller bundle sizes would help a great deal with building custom lightweight IPFS-based solutions in the browser. Presumably, for those concerned about not wanting to pre-determine which codecs they should support, there might be an "everything and the kitchen sink" variant or additional module that they could import?

On a somewhat related note, I wonder if a similar approach could be taken to (help) address this issue somewhat (I don't mean to derail the conversation here, I can bring that up over there if it seems reasonable)?

rvagg commented 4 years ago

Very +1

I can understand this being an inconvenience for the IPFS use-case where you're expecting to encounter a broad range of codecs, but many of the use-cases I toy with have a narrower set of requirements, often just needing dag-cbor and only expecting to ever deal with dag-cbor. I think it's reasonable to expect applications building on IPLD (and not IPFS) to have such narrow requirements and the current kitchen-sink approach is just too heavy-weight.

You might also be interested to hear that go-ipld-prime already has this kind of up-front feature injection: https://github.com/rvagg/go-car-example/blob/4291971cc5ab84b878739711ff2ef60a031cd603/example.go#L312-L313

This may change in the future, but currently it will fail to encode or decode codecs that you haven't told it about. You normally register them in an init that's run before anything else, so you're good to go. go-ipld-prime includes the kitchen sink in the repo; it's just not wired up unless you do it explicitly as a user.

achingbrain commented 4 years ago

FWIW we already do something like this.

js-ipld needs us to tell it which codecs to support via the formats constructor option. Codecs can also be loaded on the fly by it calling the loadFormat function passed as an option.
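
For readers unfamiliar with the idea, here is a rough sketch of preloaded formats plus an on-demand loader. It is not js-ipld's implementation: `createResolver` and `getFormat` are hypothetical, and only the option names `formats` and `loadFormat` are taken from the description above.

```javascript
// Hypothetical resolver, not js-ipld's code. `formats` and `loadFormat` mirror
// the option names mentioned above; everything else is invented for the sketch.
const createResolver = ({ formats = [], loadFormat } = {}) => {
  const known = new Map(formats.map(format => [format.name, format]))
  return {
    async getFormat (name) {
      if (known.has(name)) return known.get(name)
      if (!loadFormat) throw new Error(`Unsupported format: ${name}`)
      const format = await loadFormat(name)
      known.set(name, format) // cache so each format is loaded only once
      return format
    }
  }
}

// Stand-in format objects; a real format would carry encode/decode logic.
const raw = { name: 'raw' }
let loads = 0

const resolver = createResolver({
  formats: [raw],
  // In a browser bundle this is where a lazy import could happen.
  loadFormat: async name => {
    loads++
    return { name }
  }
})

const demo = async () => {
  const preloaded = await resolver.getFormat('raw') // already registered
  const lazy = await resolver.getFormat('dag-cbor') // loaded on first use
  await resolver.getFormat('dag-cbor') // served from the cache
  return { preloaded: preloaded.name, lazy: lazy.name, loads }
}
```

Calling `demo()` resolves with the two format names and a load count of 1, since the second dag-cbor lookup hits the cache.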

So in js-ipfs we have a node.js version with support for loads of codecs, then a browser version that only supports raw, dag-pb and dag-cbor to keep the bundle size down. A full-fat browser version is also available.

A way to reduce the number of hash functions would be good though. The crypto deps are the ones that really bloat the bundle sizes.
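
The same injection idea could apply to hash functions. The sketch below is an assumption about what that might look like, not an existing API: `createHashers` and the hasher object shape are hypothetical, and the `{ default: true }` option simply echoes the proposal at the top of the thread. Only Node's built-in `crypto.createHash` is a real call.

```javascript
const { createHash } = require('crypto')

// Hypothetical registry: hash functions are supplied by the caller, so a
// bundle only contains the implementations that were explicitly added.
const createHashers = () => {
  const hashers = new Map()
  let defaultName = null
  return {
    add (hasher, { default: isDefault = false } = {}) {
      hashers.set(hasher.name, hasher)
      // The first hasher becomes the default unless another one claims it.
      if (isDefault || defaultName === null) defaultName = hasher.name
    },
    digest (bytes, name = defaultName) {
      const hasher = hashers.get(name)
      if (!hasher) throw new Error(`Hash function not registered: ${name}`)
      return hasher.digest(bytes)
    }
  }
}

// sha2-256 backed by Node's built-in crypto module.
const sha256 = {
  name: 'sha2-256',
  digest: bytes => createHash('sha256').update(bytes).digest()
}

const hashers = createHashers()
hashers.add(sha256, { default: true })
const digest = hashers.digest(Buffer.from('hello'))
```

A heavier function such as blake2 would only enter the bundle if the application added it; requesting it without registering it throws.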

mikeal commented 4 years ago

there might be an "everything and the kitchen sink" variant or additional module that they could import?

That would still be '@ipld/block'. The version without any codecs would be '@ipld/block/bare'.

vmx commented 4 years ago

I'd like to mention that I'm working on a similar approach for Rust IPLD. There we can use generics. There is no Block API yet, but you can see what Multihash looks like with this approach here: https://github.com/multiformats/rust-multihash/pull/60. You would pass the multicodec (together with implementations) into the Block API, which passes it down to the Multihash level.

jacobheun commented 4 years ago

+1 on this, the API looks reasonable to me. I'm looking to implement this approach in libp2p-crypto as well (aiming for April).

On a somewhat related note, I wonder if a similar approach could be taken to (help) address this issue somewhat (I don't mean to derail the conversation here, I can bring that up over there if it seems reasonable)?

@carsonfarmer I think so. Several of the multiformats repos could also benefit from this to help reduce size and add extensibility, such as https://github.com/multiformats/js-multihash/issues/73

carsonfarmer commented 4 years ago

The docs over in https://github.com/multiformats/js-multiformats now suggest that the Block API supports a minimal setup with extensible codecs, essentially as outlined here. Is this the case? Or will it be soon? Very excited about this 👍.

rvagg commented 4 years ago

I believe @mikeal is working on this library right now, @carsonfarmer. We have various libraries (in various states) working with js-multiformats now, including the new @ipld/dag-cbor, @ipld/dag-json, and @ipld/dag-pb. My new Bitcoin work is against it too (there'll be an @ipld/bitcoin), and I'm about to land a change for datastore-car that switches it entirely to js-multiformats as well. We're all in on this now and going full speed ahead on making it work, so it might be a good time to have a poke around and see how it feels.

mikeal commented 4 years ago

I’ll be getting to the Block API very soon! Took a small detour this week to get on ESM https://github.com/multiformats/js-multiformats/pull/16

carsonfarmer commented 4 years ago

Amazing gang, this is stellar stuff! Happy to hear that (also happy to see the ESM stuff)! Going to dig into that as well, as we've been toying with the idea of moving to plain ESM output from our TS libraries.

mikeal commented 4 years ago

we've been toying with the idea of moving to plain ESM

We only started considering it once ESM was unflagged in Node.js LTS https://nodejs.org/en/blog/release/v12.17.0/

Given that js-multiformats is a new library we think there’s plenty of lead time for adopters to get on newer LTS versions. It also helps that some of the more problematic LTS updaters (cloud vendors, particularly AWS Lambda) have gotten better about this and are usually only a month or two behind.