ipld / js-block

IPLD Block Interface
6 stars 3 forks source link

Breaking Change: reduce bundle size by shipping no codecs or hashes by default #4

Open mikeal opened 4 years ago

mikeal commented 4 years ago

This deserves some discussion before implementation.

The largest contributor to our bundle size is codecs and hash algorithms. When unused these are completely unnecessary bundle bloat, and rarely are all of the codecs and hashes we bundle actually used. The get-codec library ships with only 4 codecs (raw, dag-json, dag-cbor, and dag-pb) and is already an unnacceptable bundle size for a dependency. Cutting down to just raw or maybe raw and dag-json should be an acceptable size.

But it’s worth discussing what the optimal API is for configuring which codecs and hash functions should be bundled. The Block API is in a weird place in the application stack, instances are created at a fairly low level and then passed on to higher level code, so it’s a bit hard to figure out where users would pass in these implementations.

We already have an API for configuring/adding codecs to the block interface but I’m not sure how I feel about it. It just exposes the get-codec interface.

Block.getCodec.setCodec(‘codec-name’, implementation)

There’s no equivalent API in multihashing-async so we’ll probably have to drop it and write our own.

Thoughts?

PS. When we removed async from encode() and decode() we also removed the ability to dynamically load a codec over the network. That may have been a mistake, but it’s not something I want to re-litigate until we have a better understanding of the performance tradeoffs. However, block.cid() is still async, and will remain so because browser hashing algorithms are async, and that means we could implement dynamic loading for hashing algorithms.

mikeal commented 4 years ago

Digging into this some more.

Another large source of bundle bloat is just the multicodec and multihash tables themselves. If we create an API for configuration that requires a string identifier and an integer, we can create the parts of the table we support and leave out the rest and not actually need to include the full tables.

rvagg commented 4 years ago

While it's convenient to have the codecs and algorithms bundled here, it's not necessary at this level and I'd anticipate higher layers of an application stack bundling codecs and algorithms in ways that are specific to that application. Just needs to be well documented I suppose.

large source of bundle bloat

What sort of sizes are we talking about here? You're not being too obsessive about size are you (in a premature optimization kind of way)?

mikeal commented 4 years ago

It’s 58K after minification and gzip, which is pretty big for being such a low level dependency.

https://bundlephobia.com/result?p=@ipld/block@2.0.6

As you can see from the visualization, the big contributors are the codec tables, borc for cbor encoding, and bignumber which is used by one of the hash algorithms.

vmx commented 4 years ago

I agree that keeping the bundle size in mind is a good idea. Though I'm with @rvagg here and think this might be a premature optimization.

E.g. if you're using IPLD, you are likely to use CBOR. So in any application, you will have borc as a dependency somewhere. So if you create your bundle it doesn't really matter the js-block also contains it.

I would rather look into an API to be able to remove unnecessary codec, rather then loading them dynamically somehow. I prefer having it easy to use by default and having it possible to reduce the size for people that really need it.

mikeal commented 4 years ago

E.g. if you're using IPLD, you are likely to use CBOR. So in any application, you will have borc as a dependency somewhere. So if you create your bundle it doesn't really matter the js-block also contains it.

So, I want to disagree on two counts ;)

I’m actually working on a browser thing that uses unixsfv2 but only dag-json in order to chop down the bundle size. So there are future use cases that won’t include dag-cbor that we should keep in mind as an option.

Second, borc is rather large and is mostly code generated by a compiler that is hard to minify. We should consider writing a replacement if we can increase the performance and reduce the bundle size.

I would rather look into an API to be able to remove unnecessary codec, rather then loading them dynamically somehow. I prefer having it easy to use by default and having it possible to reduce the size for people that really need it.

This would be nice but unfortunately it doesn’t work well with the way that bundlers work. Once you import something it’s going to be in the bundle. If we hand someone a dependency that imports other modules they are going to get them in their bundle even if they disable them somehow. This is one of the many reasons I’ve grown quite weary of the entire practice of bundling, but it’s the dominant pattern right now so we need to figure out how to work well with it.

mikeal commented 4 years ago

Another option we could consider is a secondary endpoint that excludes any codecs. Something like require(‘@ipld/block/bare’).

We could then leave the main block API as-is.

vmx commented 4 years ago

I’m actually working on a browser thing that uses unixsfv2 but only dag-json in order to chop down the bundle size. So there are future use cases that won’t include dag-cbor that we should keep in mind as an option.

I would still consider that a special case where you want to be able to trim it down :)

...

I have to admit I know too little about bundling and how to make things practically usable.

rvagg commented 4 years ago

Another option we could consider is a secondary endpoint that excludes any codecs. Something like require(‘@ipld/block/bare’).

I'm seeing this pop up a bit these days, @alanshaw gifted it to bl which happens to have streaming functionality but isn't its core value-add, so to use it without readable-stream (which is not getting any smaller) you can require('bl/BufferList') and your bundle won't suffer readable-stream.

I don't mind this approach and we should document it (same goes for bl which I'm noticing doesn't document it).

But back to my earlier point, I view this as an intermediate piece of a larger puzzle and I'd rather the glue go in higher levels of abstraction. The further you go up the abstraction chain toward a concrete application, the more firm your opinions can be because your needs become clearer. I'm pretty confident this piece will be very low in the abstraction hierarchy and therefore its opinions about what pieces it's going to be coupled with should be very lose.