wip: migrate to latest multiformats

mikeal commented 3 years ago

Still need to port some codecs and fill in some coverage, but all the tests are passing against dag-cbor.

This gets the Block interface up to date with @Gozala’s latest refactor. I’d like to take a moment to note how good the work has been on that refactor. You can see it here in the diff, everywhere that we touch multiformats it’s a little bit cleaner and also a little more explicit and even adds some features. All of this while getting rid of a lot of the harder to work with parts of this API that came from the prior multiformats dependency injection, and we still get to keep all the same features and codec registry just hoisted to the Block interface. This was a some really great work on @Gozala’s part and it really shows.

rvagg commented 3 years ago

It's so hard to review this when most of it is just indentation changes! It seems that the major changes are really just in hoisting the Block.codecs registry up here, with add() and friends, but otherwise it's the same. Correct?

Gozala commented 3 years ago

It's so hard to review this when most of it is just indentation changes!

@rvagg There is this really handy thing in case you were not aware

mikeal commented 3 years ago

Thanks @mikeal for updating this and other libraries to changes in multiformats. I'm hesitant to express strong opinions here. I personally think that compositional approach illustrated in dag.js from multiformats/js-multiformats#38 is a better alternative to this, but I am obviously biased.

I don’t want to diverge too far from the current approach in terms of presenting a class with along with some methods to help you create instances, along with a registry of codecs. It’s not that I’m adamant that this is the right approach for every case, but I do think that this approach works quite well in some cases and I want to make sure the libraries underneath Block (codecs and multiformats) support this approach.

However, I’d prefer to have this library presenting and producing block instances that conform to a lower level block interface defined by multiformats. And that interface should be designed to support this approach as well as the approach you’re exploring in dag.js. That obviously isn’t what is here yet, but I wanted to see what this looks like on top of the latest multiformats types before I tried to bring design considerations back into a muliformats/block interface.

Gozala commented 3 years ago

As broader question, I’m curios what are the use cases for lazy block encode and lazy block decode. Are there cases where you'd want to create a Block but never materialize it (not talking about CID, but turning bytes to JS value or other way round) ? I am inclined to think that if you do never end up materializing a block you're probably choosing a wrong interface for your library or a program. If you do end up materializing it later on in the program than all you achieve by deferring is making performance characteristics less deterministic, because bunch of computations can (and usually will) accumulate and then will be forced in one large batch that tends to create spikes on load.

That is just to say that it's best to make conscious choice when deferring computations. It also makes reasoning about code (as in when reading it) a lot easier (less context is required to infer it's behavior)

mikeal commented 3 years ago

Are there cases where you'd want to create a Block but never materialize it (not talking about CID, but turning bytes to JS value or other way round)

Creating blocks you never encode is surprisingly common in multi-block data structures during mutation. You often have interfaces that create a bunch of blocks on mutation, but many of those are then orphaned by another mutation operation before they are ever written to the block store. Ideally you could factor this out using bulk operations but we don’t have reasonable bulk operations for some data structures (like our HAMT).

Creating blocks you never decode is very common because that’s what happens during a lot of replication operations. Whenever you’re moving data from one store to another you can avoid a decode.

Gozala commented 3 years ago

Creating blocks you never decode is very common because that’s what happens during a lot of replication operations. Whenever you’re moving data from one store to another you can avoid a decode.

I wonder if this indicative of the other gap https://github.com/multiformats/js-multiformats/issues/37

It is also something I tried to represent via BlockView type.

Gozala commented 3 years ago

Creating blocks you never encode is surprisingly common in multi-block data structures during mutation. You often have interfaces that create a bunch of blocks on mutation, but many of those are then orphaned by another mutation operation before they are ever written to the block store. Ideally you could factor this out using bulk operations but we don’t have reasonable bulk operations for some data structures (like our HAMT).

This sounds similar to what I tried to represent via BlockDraft interface. I think you could have layer for building up dags and materializing or whiting them in one operation, but that does not necessarily mean that what you build up with should be a block. In fact the reason my BlockDraft just holds value codec code and hasher code is so that you could build things up on one thread and materialize them on the other one (where you do have codecs and hashers available). If you choose to represent that via Blocks as well doing this on different threads (or processes) becomes a lot more trickier.

Gozala commented 3 years ago

Please note that current approach complicates things for both cases:

Actor that does replication becomes concerned with codecs and hashers, which is incidental, it could just replicate CID+Block|Code+Block pairs, it's just that does not have a canonical representation, which then Block attempts to fill.
Actor that is building up dag structure, does not really need to be concerned with codecs or hashers. It just needs to create compact recipe so that another actor could materialize the dag.

Note: That I fully understand that when building up dags you may have some pre-existing (already materialized) blocks interleaved with new (not yet materialized) blocks. But argument I'm making is that actor doing the dag building needs to be able to be able to distinguish so it can choose the best strategy to transfer the recipe in the most effective way to another actor (which may be on the separate thread, process or even machine). In fact you'd likely even want some way to represent remote blocks so that they don't need to roundtrip and having to do all this through lazy Block representation isn't necessarily helpful.

mikeal commented 3 years ago

@Gozala latest push splits the BlockEncoder and BlockDecoder classes out.

mikeal commented 3 years ago

https://github.com/multiformats/js-multiformats/pull/40

ipld / js-block

wip: migrate to latest multiformats #16