ipld / specs

Content-addressed, authenticated, immutable data structures

Reconciling Terminology (Collections, Advanced Layouts, Composites) #144

Closed: mikeal closed this issue 5 years ago

mikeal commented 5 years ago

We have a lot of terms for different pieces of the same problem. I’ll try to describe each of these as best I can, but please feel free to ask in the comments for clarification or changes.

Multi-Block Collections

We have a great document describing what multi-block collections are. In short, they are data structures that you want to treat/use like a Map or List but are large enough that they need to be spread over many blocks.
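To make the idea concrete, here is a minimal Python sketch of a Map spread over several blocks. It is not an IPLD implementation: the block store is a hypothetical in-memory dict, a hex SHA-256 of canonical JSON stands in for a real CID and codec, and `build_sharded_map`/`ShardedMapView` are illustrative names. Keys are bucketed by hash into leaf blocks, a root block links to the leaves, and the view exposes the whole thing through Python's standard `Mapping` interface so callers can treat it like a single Map.

```python
import hashlib
import json
from collections.abc import Mapping


class BlockStore:
    """Hypothetical in-memory block store; the "CID" is a hex SHA-256 of
    the canonical JSON encoding (a stand-in for real CIDs and codecs)."""

    def __init__(self):
        self._blocks = {}

    def put(self, node) -> str:
        data = json.dumps(node, sort_keys=True, separators=(",", ":")).encode()
        cid = hashlib.sha256(data).hexdigest()
        self._blocks[cid] = data
        return cid

    def get(self, cid: str):
        return json.loads(self._blocks[cid])

    def __len__(self):
        return len(self._blocks)


def _bucket(key: str, width: int) -> int:
    # Pick a shard by hashing the key (the same basic idea a HAMT applies per level).
    return hashlib.sha256(key.encode()).digest()[0] % width


def build_sharded_map(store: BlockStore, entries: dict, width: int = 4) -> str:
    """Split one logical map into `width` leaf blocks plus a root block of links."""
    shards = [{} for _ in range(width)]
    for key, value in entries.items():
        shards[_bucket(key, width)][key] = value
    root = {"width": width, "shards": [store.put(s) for s in shards]}
    return store.put(root)


class ShardedMapView(Mapping):
    """Read-only, Map-like view over the blocks written by build_sharded_map."""

    def __init__(self, store: BlockStore, root_cid: str):
        self._store = store
        self._root = store.get(root_cid)

    def _shard(self, i: int) -> dict:
        return self._store.get(self._root["shards"][i])

    def __getitem__(self, key):
        return self._shard(_bucket(key, self._root["width"]))[key]

    def __iter__(self):
        for i in range(self._root["width"]):
            yield from self._shard(i)

    def __len__(self):
        return sum(len(self._shard(i)) for i in range(self._root["width"]))
```

After the root is read once, a lookup loads a single leaf block. A real structure such as a HAMT would nest shards over several levels rather than use one fixed fan-out.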

Advanced Layouts

Advanced layouts are a feature of IPLD Schemas. They describe a “node that is like a kind but actually is not a data model kind.”

Advanced layouts are a tool that can help people describe and potentially implement multi-block collections and other advanced data structures that want to operate similarly to a given kind.

Advanced layouts do not prescribe how or where these are implemented, but they may include a hint about how to obtain an implementation.

Composites

Composites are an experimental approach to programming advanced data structures in IPLD. Composites are a potential interface for implementing multi-block collections, as well as other functionality you may want “attached” to your data.

Composites will use IPLD schemas and may even use advanced layouts depending on the direction they take.

Final Thoughts

Hopefully, this adds some clarity as to why there seem to be several names for things that are similar.

There are also a few things to keep in mind if we are going to try to compress these into fewer terms:

Personally, I’d like to see Advanced Layouts continue to evolve, and I’m particularly interested in how the Go codebase in Filecoin might leverage them in their HAMT implementation. This will look quite different from the model we have for writing Composites, and I’m eager to compare and learn from it.

rvagg commented 5 years ago

A couple of problems I see that are causing conflict on the naming front are:

  1. "Advanced Layouts" are not a concrete thing; they are a placeholder for an idea that could fit into Schemas but isn't necessarily tied to them. We also never arrived at a specific schema syntax, although we circled around something that seemed to work. But, as @warpfork has been saying (I think), the intention was that these things describe structures that are laid out in advanced ways, extending the data model to more novel structures, including ones spanning multiple blocks. The terminology made sense to me for these multi-block data structures, including binary types: basically anything that you couldn't interpret using the plain data model.
  2. I don't think the "Composites" concept has been fleshed out enough to make the case for needing a distinct word for this. Perhaps you could work on this more because I'm still not seeing why we need two words for these things.

Data structures that are more complex than just reading from the data model have two primary components: how they are laid out on the blocks, and the logic used to traverse them. I think you're trying to name these two pieces differently, but I can't quite put my finger on why they need a different name. Perhaps this discussion can be resolved by working on that distinction more and creating a clearer justification.
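A hedged sketch of those two components, using a hypothetical chunked List: `write_chunked_list` is the layout (what goes into which block), while `list_get` and `list_iter` are the traversal logic that reads it back. The block store, the SHA-256-over-JSON stand-in for CIDs, and all function names are assumptions made for illustration.

```python
import hashlib
import json

# Hypothetical minimal block store: hex SHA-256 of canonical JSON stands in for a CID.
BLOCKS: dict[str, bytes] = {}


def put(node) -> str:
    data = json.dumps(node, sort_keys=True, separators=(",", ":")).encode()
    cid = hashlib.sha256(data).hexdigest()
    BLOCKS[cid] = data
    return cid


def load(cid: str):
    return json.loads(BLOCKS[cid])


# --- Component 1: the layout (how the data is laid out on blocks) ---
def write_chunked_list(items: list, chunk_size: int) -> str:
    chunks = [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]
    return put({"chunkSize": chunk_size, "length": len(items),
                "chunks": [put(c) for c in chunks]})


# --- Component 2: the logic used to traverse that layout ---
def list_get(root_cid: str, index: int):
    root = load(root_cid)
    if not 0 <= index < root["length"]:
        raise IndexError(index)
    # Find which chunk block holds the index, then the offset inside it.
    chunk_no, offset = divmod(index, root["chunkSize"])
    return load(root["chunks"][chunk_no])[offset]


def list_iter(root_cid: str):
    for chunk_cid in load(root_cid)["chunks"]:
        yield from load(chunk_cid)
```

The two halves share only the shape of the root node (`chunkSize`, `length`, `chunks`), which is one way to see why the layout and the operations over it could be described, and named, separately.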

I also think our programming language lenses are causing problems here too. @warpfork has a concrete Node concept in go-ipld-prime, along with NodeBuilder and friends, and this seems to map cleanly onto the way he conceives of these data structures. In JS we have a completely different set of API challenges to meet. But you're taking it even further by injecting WASM into the equation and wanting a clear path to embedding logic, and from what I understand that's what is driving the desire for a distinction. I still don't quite see the need for the distinction, though, even with WASM. A given WASM component would just be performing a single <OPERATION> of a specific <Advanced Layout || Composite we identify as NAME>. Help us think about this the way you are.
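One way to read the `<OPERATION>`/`NAME` framing is as a dispatch table keyed by layout name and operation, where each entry stands in for a single compiled component such as a WASM module. The sketch below assumes that model; the `ChunkedBytes` layout, its `length`/`read` operations, and the `operation`/`run` helpers are all hypothetical, and the chunks are inlined rather than linked for brevity.

```python
from typing import Any, Callable

# Hypothetical registry: each entry stands in for one component (e.g. a WASM
# module) that implements a single OPERATION for a single layout NAME.
Operation = Callable[..., Any]
REGISTRY: dict[tuple[str, str], Operation] = {}


def operation(name: str, op: str):
    """Decorator that registers fn as the `op` operation of layout `name`."""
    def register(fn: Operation) -> Operation:
        REGISTRY[(name, op)] = fn
        return fn
    return register


def run(name: str, op: str, *args):
    try:
        fn = REGISTRY[(name, op)]
    except KeyError:
        raise LookupError(f"no {op!r} operation registered for {name!r}") from None
    return fn(*args)


# Two illustrative operations for a hypothetical "ChunkedBytes" layout whose
# root node is {"chunks": [bytes, ...]} (chunks inlined instead of linked).
@operation("ChunkedBytes", "length")
def chunked_length(root: dict) -> int:
    return sum(len(c) for c in root["chunks"])


@operation("ChunkedBytes", "read")
def chunked_read(root: dict, start: int, end: int) -> bytes:
    return b"".join(root["chunks"])[start:end]
```

Whether `ChunkedBytes` is called an Advanced Layout or a Composite makes no difference to the dispatch itself.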

vmx commented 5 years ago

  • In addition to Map and List, we also need multi-block binary data structures. Should this be added to the multi-block collections document or described well somewhere else?

That's not the only one. We also need a multi-block thing for huge single objects. You might want to get around block size limitations, whether they come from the storage mechanism or the transport (e.g. Bitswap). Please take this just as a side note; I don't want to derail the naming discussion.
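A minimal sketch of such a structure for a single large `bytes` value, assuming a hypothetical store that rejects blocks over `MAX_BLOCK_SIZE` (set absurdly low here so the example stays small; real limits depend on the storage mechanism or transport). The value is cut into limit-sized leaf blocks, and a root records the links and the total size. Names and the SHA-256 stand-in for CIDs are illustrative only.

```python
import hashlib

# Hypothetical block size limit; real limits depend on the store or transport.
MAX_BLOCK_SIZE = 8

BLOCKS: dict[str, bytes] = {}


def put_raw(data: bytes) -> str:
    # Refuse oversized blocks, mimicking a store/transport limit.
    if len(data) > MAX_BLOCK_SIZE:
        raise ValueError(f"block of {len(data)} bytes exceeds {MAX_BLOCK_SIZE}")
    cid = hashlib.sha256(data).hexdigest()
    BLOCKS[cid] = data
    return cid


def write_large_bytes(data: bytes) -> dict:
    """Split one logical bytes value into limit-sized leaf blocks.

    The returned root is kept in memory for brevity; a real layout would
    encode it as a block too (and nest roots once it grows too large).
    """
    links = [put_raw(data[i:i + MAX_BLOCK_SIZE])
             for i in range(0, len(data), MAX_BLOCK_SIZE)]
    return {"size": len(data), "links": links}


def read_large_bytes(root: dict) -> bytes:
    data = b"".join(BLOCKS[cid] for cid in root["links"])
    if len(data) != root["size"]:
        raise ValueError("reassembled size does not match the root")
    return data
```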

mikeal commented 5 years ago

We also need a multi-block thing for huge single objects.

So, in terms of Data Model types, I can see the case for string in addition to bytes, Map, and List. Am I missing any others? The other types should always fit in a single block.
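For reference, the IPLD Data Model kinds are null, boolean, integer, float, string, bytes, list, map, and link. The sketch below simply encodes the split proposed above (only string, bytes, list, and map are unbounded in size); the `Kind` enum and `may_span_blocks` helper are hypothetical names, not part of any IPLD library.

```python
from enum import Enum


class Kind(Enum):
    # The IPLD Data Model kinds.
    NULL = "null"
    BOOLEAN = "boolean"
    INTEGER = "integer"
    FLOAT = "float"
    STRING = "string"
    BYTES = "bytes"
    LIST = "list"
    MAP = "map"
    LINK = "link"


# Per the proposal above: only these kinds may need a multi-block
# representation; scalars and links are assumed to always fit in one block.
MAY_SPAN_BLOCKS = frozenset({Kind.STRING, Kind.BYTES, Kind.LIST, Kind.MAP})


def may_span_blocks(kind: Kind) -> bool:
    return kind in MAY_SPAN_BLOCKS
```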

mikeal commented 5 years ago

I don't think the "Composites" concept has been fleshed out enough to make the case for needing a distinct word for this. Perhaps you could work on this more because I'm still not seeing why we need two words for these things.

Right now we’re using the same multi-block collection use cases to drive the early experimentation, so the distinctions are quite hard to see. I’ll talk about a couple examples of things Composites can do that are, I believe, out of the scope of Advanced Layouts.

These don’t map cleanly to the capabilities we associate with Data Model kinds. And, from my understanding, the Node concept in go-ipld-prime is not set up to handle arbitrary methods either.

Now, this distinction may not be enough to keep these from being unified under a single term. However, I do think we should keep the requirements and features we add to schemas to something much more manageable. I also don’t want to rule out finding features we’d like to have in schemas for implementing multi-block data structures outside of Composites, which is something we are actively working towards today.

I would expect to see a HAMT implementation in Go using the NodeBuilder interface, with some help from schemas and codegen, in the not-too-distant future. That will not be a Composite, and we shouldn’t try to call every API for creating multi-block data structures a “Composite.” Implementing an advanced data structure as a set of stateless functions is ergonomically difficult and, as we are already finding, comes with some limitations that are hard to work around. Whatever we end up with for building Composites in Rust/WASM will have tradeoffs when compared to a similar implementation in a system that can make use of shared state.
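To illustrate that tradeoff (this is not the actual Composites or go-ipld-prime API), here is a hypothetical sharded map read in two styles: `stateless_get` receives everything as arguments, so every call re-loads the root and a shard, while `CachedMap` keeps decoded blocks in shared state between calls. The counting store, the SHA-256-over-JSON stand-in for CIDs, and all names are assumptions.

```python
import hashlib
import json
from typing import Callable


class CountingStore:
    """Hypothetical block store that counts loads, so each style's cost is visible."""

    def __init__(self):
        self.blocks: dict[str, bytes] = {}
        self.loads = 0

    def put(self, node) -> str:
        data = json.dumps(node, sort_keys=True, separators=(",", ":")).encode()
        cid = hashlib.sha256(data).hexdigest()
        self.blocks[cid] = data
        return cid

    def load(self, cid: str):
        self.loads += 1
        return json.loads(self.blocks[cid])


def _bucket(key: str, width: int) -> int:
    return hashlib.sha256(key.encode()).digest()[0] % width


def build(store: CountingStore, entries: dict, width: int = 4) -> str:
    shards = [{} for _ in range(width)]
    for k, v in entries.items():
        shards[_bucket(k, width)][k] = v
    return store.put({"shards": [store.put(s) for s in shards]})


# Style 1: a stateless function. Everything arrives as arguments,
# so each call must re-load the root and one shard.
def stateless_get(load: Callable, root_cid: str, key: str):
    shards = load(root_cid)["shards"]
    return load(shards[_bucket(key, len(shards))])[key]


# Style 2: an object holding shared state (a cache of decoded blocks) across calls.
class CachedMap:
    def __init__(self, store: CountingStore, root_cid: str):
        self._store = store
        self._cache: dict[str, dict] = {}
        self._root = self._load(root_cid)

    def _load(self, cid: str):
        if cid not in self._cache:
            self._cache[cid] = self._store.load(cid)
        return self._cache[cid]

    def get(self, key: str):
        shards = self._root["shards"]
        return self._load(shards[_bucket(key, len(shards))])[key]
```

Five lookups of the same key cost ten block loads in the stateless style and two with the cache. A stateless design could thread a cache through its arguments, but that is exactly the kind of ergonomic burden described above.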

The benefit of a Composite is that you can attach the implementation to the data and distribute/version it along with the rest of your data, and anyone can safely use that implementation in a zero-trust environment. This enables linking between advanced data structures that is upgradable and future-proof. But there are still going to be cases where you want to create new data with a faster interface that makes use of shared state, and I’d like to continue to support that use case, take it seriously, and give schemas a path to growing their support for it.

vmx commented 5 years ago

I’ll talk about a couple examples of things Composites can do that are, I believe, out of the scope of Advanced Layouts.

Couldn't those more complex things then be built on top of a library that implements advanced layouts?

Whatever we end up with for building Composites in Rust/WASM will have tradeoffs when compared to a similar implementation in a system that can make use of shared state.

There will be shared state between WASM and the host; that is how you exchange data. We'll see how much shared state we want, but I'd guess it will be a lot, for performance reasons.

mikeal commented 5 years ago

Couldn't those more complex things then be built on top of a library that implements advanced layouts?

Yes. I’m pretty sure that Composites will eventually use the features of Advanced Layouts in IPLD Schemas. It’s a little too early to tell, but that’s what I would expect.

As far as Schemas are concerned, Composites may just be another Advanced Layout. But since Composites are a fairly large and open-ended superset of what Advanced Layouts in Schemas will support, I don’t see how we could consolidate the naming.

rvagg commented 5 years ago

I think I'm getting closer to understanding the way you're conceiving of these things, but it still seems like you're tying advanced layouts much more tightly to schemas than the rest of us are, and that's probably the main mismatch. The legitimately contested area seems to be only about "operations", which may or may not fit into something that describes a "layout". But is that enough to justify "composites", or is "operations" enough to encompass those things which are discrete logical processes?

mikeal commented 5 years ago

We’ve certainly made some progress here, and I think that’s about all we’re going to get from this issue alone.

Over the next few months Composites are on hold, and we’ll probably end up with a JS API for IPLD Schema advanced layouts. That will give us a much more concrete example (working code) of how these things differ and interact, which is why we can close this issue and come back later once we have more to talk about.