Better get API - Githubissues

Gozala commented 5 years ago

I think following https://github.com/ipld/js-ipld-block/issues/47#issuecomment-477259759 has gotten lost across multiple discussion threads, so I'd like to discuss it in context this time.

I view Block as a format agnostic API to interact with blocks - DataView for IPLD blocks if you will. However I think block.reader().get(path) is awkward as it either returns link into external block (one with diff CID) or a fully decoded value that corresponds to the path. That is just another instance where #3 is being compromised.

Imagine following data being encoded in flatbuffer format:

{ a1: { a2: { a3: { a4: ...rest }}}}

Accessing path /a1 would cause decoding everything under it. While alternative design would allow you to defer actual decoding until necessary.

What I propose get to return is another Block instance or call it a BlockCursor if you want it to be different thing, point I'm trying to make is just like DataView has byteOffset it's reasonable to have a path for a BlockView.

That is to suggest I think get() should return remote or internal "link" like: {target:Block<a>|CID<a>, path:string}

In that vision existing Blocks have their path set to /.

Gozala commented 5 years ago

One thing that may have not being clear in my previous comment is that I propose to add path to the Block as that would allow following interactions (assuming Block & Reader are merged):

const a1 = await node.get('a1')
const a4  = await a1.get('a2/a3/a4')

await a.decode()

Gozala commented 5 years ago

I would also like if Block could be create from CID, however that would mean that Block needs to be able to perform IO which may be dealbreaker, but if it isn't it would make the whole API so much nicer because you'd be able to create block like block = Block.forCID(cid) and that would also mean that get() could always just return eventual Block instance that is either anchored to CID or an existing Block.

Gozala commented 5 years ago

Additional benefit would be that it would allow one to encode fragment of the block without having to go through decode / encode steps. Which again matters for lazy cases like Flatbuffer where you can get subset like node.get('/foo/bar') without decoding anything and then call encode() on it which will effectively just take slice of underlying buffer corresponding to the value and create new buffer of the same format.

mikeal commented 5 years ago

I view Block as a format agnostic API to interact with block

I agree, however, I see Block as a format agnostic API that does have a specific format assigned to the block.

It’s important that Block be, semantically, a single unit. That doesn’t mean we can’t do inline blocks, but things get very complicated if we try to push multi-block concepts into the Block implementation. At some point, you need to be able to reason about a single unit at a time. Also, when you start to consider replication [see below] you must be able to reason about Blocks in their full state somewhat agnostic of inline blocks.

The vast majority of read operations will need to happen in multi-block interfaces, and those interfaces are where we can account for inline blocks, encryption, collections and other problems. For a peek at that, I would point you to https://github.com/ipld/js-ipld-stack/blob/master/src/path-level-zero.js which performs path read operations across many blocks.

Accessing path /a1 would cause decoding everything under it. While alternative design would allow you to defer actual decoding until necessary.

Yes, unless you make a3 a multiblock (CID w/ identity multihash) in which case that value will be a CID. This is why it’s important to use a multi-block interface to do all the reads, because if you want to do reads into that CID you’ll need to use a multi-block path reader.

I don’t see a very large use case for the BlockCursor/DataView because it’s still limited to a single block. Instead of getting fancy about the return value, why not push people towards doing more discreet read operations in a multi-block interface, so if you need 3 different properties deep into that structure you just do those 3 reads independently with long paths and rely on internal caching in the multi-block interface.

So, you’d do something like:

let get = getFromMyBlockStore
let root = someCID

let value1 = await pathLevelZero(‘/a1/a2/a3/a4/prop1’, root, get)
let value2 = await pathLevelZero(‘/a1/a2/a3/prop2’, root, get)
let value3 = await pathLevelZero(‘/a1/a2/a3/a4/a5/prop3’, root, get)

This might seem expensive but assuming the path resolver has an in-memory LRU cache and/or the block store’s get function has caching, it’s pretty fast because the blocks cache their reader and any prior decoding.

Replication

The blocks function is a good example of the kind of interface we’ll need for replication. https://github.com/ipld/js-ipld-stack/blob/master/src/path-level-zero.js#L25

This API will return all the blocks along a path. Eventually this will mean reading into multi-block data-structures like collections (hamt, ordered map, etc) and that also means reading through links embedded in inline blocks. However, it only actually yields “full” blocks because those are what we need to replicate. This is sort of hard to explain, but the interface effectively has to think of a path as being agnostic of block boundaries but also return the actual blocks that assemble the path.

It’s a hard balance because we want these interfaces to be very agnostic of codecs and block boundaries but there are also cases like this where it’s essential to surface those details.

Gozala commented 5 years ago

I need to think a bit more about this, however I’m starting to think that API invites confusion here: on one hand you want to draw distinction between blocks and just encoded field data, but on the other hand you do have universal get to access both

Gozala commented 5 years ago

@mikeal is multi-block is an actual think or something you propose could be build on top of the Block primitive ?

Gozala commented 5 years ago

So here are few thoughts that are not fully formalized but might offer some perspective to my objections

Properties and links share same namespace but then interface for working with corresponding values is fundamentally different, even traversal is different.
I think your primary arguments against proposed change is rooted in the terminology. If we call it a Block representing some property as another Block doesn't make sense because it is not a block. However if we changed terminology to say BlockView (as in view into the block) most arguments become invalid. You are right that one could pass around paths and Block instance to achieve the same result, although here few counter arguments:
- At the end of the day it's still glorified [path, block] why not make it also convenient to use ?
- It defeats the ocap because whoever I pass node + path can access rest of the data. As I mentioned Flatbuffer case would just pass slice of the buffer and all the other data is no longer available to the receiver.
- Receiver inherits complexity, essentially you're forcing consumer to create some wrapper so it can abstract over getting value or a pointer to one + LRU cache that won't be shared. If you just back that into return value cache can be shared.
In my view this not related to nested blocks, just a simpler API to work with, although it is starting to blur lines if you get returns same thing across local / remote blocks properties.

Gozala commented 5 years ago

What about something along these lines instead:

interface LinkView<a> {
  type: "LinkView";
  target: CID<a>;
  path: string;

  // Would be nice to have a fetch included, but could be omitted.
  fetch(bitswap):BlockView<a>;
}

interface BlockView<a> {
   type: "BlockView";
   block:Block<mixed>;
   path: string;

   get(path):Eventual<BlockView<mixed>|LinkView<mixed>>;
   links():Eventual<Iterable<[Path, CID<mixed>]>>;
   tree():Eventual<Iterable<Path>>;

   decode():Eventual<a>;
}

And in that case you could even replace .reader() on Block with .view():BlockView<a>.

Gozala commented 5 years ago

In fact Block could just implement BlockView interface and become:

export interface Block<a> extends BlockView<a> {
  +codec: Format;

  encode(): Eventual<BinaryEncoded<a>>;
  cid(): Eventual<CID<a>>;

  view(): BlockView<a>;
}

ipld / js-ipld-stack

Better get API #10

Replication