Open Gozala opened 5 years ago
One thing that may have not being clear in my previous comment is that I propose to add path
to the Block
as that would allow following interactions (assuming Block & Reader are merged):
const a1 = await node.get('a1')
const a4 = await a1.get('a2/a3/a4')
await a.decode()
I would also like if Block
could be create from CID
, however that would mean that Block
needs to be able to perform IO which may be dealbreaker, but if it isn't it would make the whole API so much nicer because you'd be able to create block like block = Block.forCID(cid)
and that would also mean that get()
could always just return eventual Block
instance that is either anchored to CID or an existing Block
.
Additional benefit would be that it would allow one to encode fragment of the block without having to go through decode / encode steps. Which again matters for lazy cases like Flatbuffer where you can get subset like node.get('/foo/bar')
without decoding anything and then call encode()
on it which will effectively just take slice of underlying buffer corresponding to the value and create new buffer of the same format.
I view Block as a format agnostic API to interact with block
I agree, however, I see Block as a format agnostic API that does have a specific format assigned to the block.
It’s important that Block be, semantically, a single unit. That doesn’t mean we can’t do inline blocks, but things get very complicated if we try to push multi-block concepts into the Block implementation. At some point, you need to be able to reason about a single unit at a time. Also, when you start to consider replication [see below] you must be able to reason about Blocks in their full state somewhat agnostic of inline blocks.
The vast majority of read operations will need to happen in multi-block interfaces, and those interfaces are where we can account for inline blocks, encryption, collections and other problems. For a peek at that, I would point you to https://github.com/ipld/js-ipld-stack/blob/master/src/path-level-zero.js which performs path read operations across many blocks.
Accessing path /a1 would cause decoding everything under it. While alternative design would allow you to defer actual decoding until necessary.
Yes, unless you make a3
a multiblock (CID w/ identity multihash) in which case that value will be a CID. This is why it’s important to use a multi-block interface to do all the reads, because if you want to do reads into that CID you’ll need to use a multi-block path reader.
I don’t see a very large use case for the BlockCursor/DataView because it’s still limited to a single block. Instead of getting fancy about the return value, why not push people towards doing more discreet read operations in a multi-block interface, so if you need 3 different properties deep into that structure you just do those 3 reads independently with long paths and rely on internal caching in the multi-block interface.
So, you’d do something like:
let get = getFromMyBlockStore
let root = someCID
let value1 = await pathLevelZero(‘/a1/a2/a3/a4/prop1’, root, get)
let value2 = await pathLevelZero(‘/a1/a2/a3/prop2’, root, get)
let value3 = await pathLevelZero(‘/a1/a2/a3/a4/a5/prop3’, root, get)
This might seem expensive but assuming the path resolver has an in-memory LRU cache and/or the block store’s get function has caching, it’s pretty fast because the blocks cache their reader and any prior decoding.
The blocks
function is a good example of the kind of interface we’ll need for replication. https://github.com/ipld/js-ipld-stack/blob/master/src/path-level-zero.js#L25
This API will return all the blocks along a path. Eventually this will mean reading into multi-block data-structures like collections (hamt, ordered map, etc) and that also means reading through links embedded in inline blocks. However, it only actually yields “full” blocks because those are what we need to replicate. This is sort of hard to explain, but the interface effectively has to think of a path as being agnostic of block boundaries but also return the actual blocks that assemble the path.
It’s a hard balance because we want these interfaces to be very agnostic of codecs and block boundaries but there are also cases like this where it’s essential to surface those details.
I need to think a bit more about this, however I’m starting to think that API invites confusion here: on one hand you want to draw distinction between blocks and just encoded field data, but on the other hand you do have universal get to access both
@mikeal is multi-block is an actual think or something you propose could be build on top of the Block primitive ?
So here are few thoughts that are not fully formalized but might offer some perspective to my objections
Block
representing some property as another Block
doesn't make sense because it is not a block. However if we changed terminology to say BlockView
(as in view into the block) most arguments become invalid. You are right that one could pass around paths
and Block
instance to achieve the same result, although here few counter arguments:
[path, block]
why not make it also convenient to use ?What about something along these lines instead:
interface LinkView<a> {
type: "LinkView";
target: CID<a>;
path: string;
// Would be nice to have a fetch included, but could be omitted.
fetch(bitswap):BlockView<a>;
}
interface BlockView<a> {
type: "BlockView";
block:Block<mixed>;
path: string;
get(path):Eventual<BlockView<mixed>|LinkView<mixed>>;
links():Eventual<Iterable<[Path, CID<mixed>]>>;
tree():Eventual<Iterable<Path>>;
decode():Eventual<a>;
}
And in that case you could even replace .reader()
on Block
with .view():BlockView<a>
.
In fact Block
could just implement BlockView
interface and become:
export interface Block<a> extends BlockView<a> {
+codec: Format;
encode(): Eventual<BinaryEncoded<a>>;
cid(): Eventual<CID<a>>;
view(): BlockView<a>;
}
I think following https://github.com/ipld/js-ipld-block/issues/47#issuecomment-477259759 has gotten lost across multiple discussion threads, so I'd like to discuss it in context this time.
I view
Block
as a format agnostic API to interact with blocks -DataView
for IPLD blocks if you will. However I thinkblock.reader().get(path)
is awkward as it either returns link into external block (one with diff CID) or a fully decoded value that corresponds to the path. That is just another instance where #3 is being compromised.Imagine following data being encoded in flatbuffer format:
Accessing path
/a1
would cause decoding everything under it. While alternative design would allow you to defer actual decoding until necessary.What I propose
get
to return is anotherBlock
instance or call it aBlockCursor
if you want it to be different thing, point I'm trying to make is just likeDataView
hasbyteOffset
it's reasonable to have apath
for aBlockView
.That is to suggest I think
get()
should return remote or internal "link" like:{target:Block<a>|CID<a>, path:string}
In that vision existing
Block
s have theirpath
set to/
.