Gozala opened 2 years ago
The idea below describes a general way of embedding DAGs in data structures:
```ts
interface Embed<
  Data extends unknown = unknown,
  Format extends number = number,
  Alg extends number = number
> extends Link<Data, Format, Alg> {
  // So that `isLink` will be `true`
  ['/']: this['bytes']
  // Same as in Delegation
  export(): IterableIterator<Block>
}
```
A few things to call out here:

- `Link` interface, making it an "embedded" link.
- `/` property, because that is how CIDs are identified.
- `export` method, just like `Delegation` has, so that the embedder could extract the relevant blocks and pack them in the same CAR.
https://github.com/web3-storage/ucanto/blob/e97bd8e15d5e42a3e9be2ce949acdd18de543dc1/packages/interface/src/lib.ts#L148

This solution is very constrained: it requires that you encode the thing ahead of time. While that is inconvenient, it is by far the most pragmatic approach, as it removes all of the non-determinism by removing questions about which codec to use, which multihash to use, or what happens if encoding fails.
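To make the determinism point concrete, here is a minimal sketch of an ahead-of-time `embed` helper. This is not ucanto's actual implementation: the `Block` shape, the JSON codec, and the sha-256 hex digest are stand-ins for the real multiformats codec and multihash.

```typescript
import { createHash } from "node:crypto"

// Stand-in for the real multiformats Block: a content address plus bytes.
interface Block {
  cid: string
  bytes: Uint8Array
}

interface Embedded<Data> {
  data: Data
  bytes: Uint8Array
  "/": Uint8Array // so that link detection can key off this property
  export(): IterableIterator<Block>
}

// Encoding happens ahead of time with a fixed codec (JSON) and hasher
// (sha-256), so no choices are left open at bundling time and any
// encoding failure surfaces right here, at the call site.
const embed = <Data>(data: Data): Embedded<Data> => {
  const bytes = new TextEncoder().encode(JSON.stringify(data))
  const cid = createHash("sha256").update(bytes).digest("hex")
  return {
    data,
    bytes,
    "/": bytes,
    *export() {
      yield { cid, bytes }
    },
  }
}
```

Because the identity is fixed at `embed` time, calling it twice with the same data yields the same address.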
We could also just make `Delegation` an extension of `Embed`.
A few things that worry me about the above idea:

- The `Link` API surface is fairly large; with the addition of `export`, the chances of a name collision are uncomfortable.
- `export` is too generic, and we need a bit clearer method name here.
- `Block` and `Link` are incompatible, which is a shame, because otherwise we could implement the `Link`, `Block` and `Dag` interfaces to represent all three.
- `export` is not ideal. Now we have an interface that could be implemented to allow iteration over the underlying IPLD blocks.
Even so, I would not want to traverse arbitrary objects to identify `IPLDView`s. Instead we could extend the schema system to support links natively, just like we do for arrays and optionals. Something along the lines of:
```ts
export interface SchemaWithLinks<
  O extends unknown = unknown,
  I extends unknown = unknown
> extends Schema<O, I> {
  /**
   * Turn the given schema into a link that can be decoded into it.
   */
  link(options: LinkOptions): Schema<O | Link<O>, I>
}

type LinkOptions = {
  codec: BlockCodec
  version?: 0 | 1
  hasher?: MultihashHasher
}
```
That way any struct / tuple / map / array member could be a link, and the schema would be aware of how to encode / decode it and iterate over its blocks.

I have looked into this and I see there are a couple of options available; no option feels perfect, so I think it will boil down to picking the right tradeoffs.
This option adds an `.attachment()` modifier on the schema to identify things that need to be DAGs implementing the `IPLDView` interface.
```ts
const Offer = Schema.struct({
  size: Schema.integer(),
  commit: Schema.string(),
}).array()

const Aggregate = Schema.struct({
  offer: Offer.attachment(),
})

const offer = Offer.from([
  { size: 1, commit: 'a' },
  { size: 2, commit: 'b' },
])

const aggregate = Aggregate.from({
  offer: await Offer.attach(offer)
})
```
Attachments

`.attach` creates an attachment that implements the `IPLDView` interface, with an added field to reference the actual data:

```ts
aggregate.offer.data[0].size
```

Here `data` is a getter to defer decoding; however, I'm not fond of getters that could potentially throw. An explicit alternative would be:

```ts
aggregate.offer.load()[0].size
```
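A rough sketch of what such an attachment object could look like. All names here (`Attachment`, `iterateIPLDBlocks`, the fake CID string) are illustrative assumptions, not the actual API:

```typescript
interface Block {
  cid: string
  bytes: Uint8Array
}

// An attachment is both an IPLD view (it can enumerate its blocks) and a
// handle onto the decoded data, exposed via a field and a method variant.
interface Attachment<T> {
  root: Block
  data: T
  load(): T
  iterateIPLDBlocks(): IterableIterator<Block>
}

const attach = <T>(data: T): Attachment<T> => {
  const bytes = new TextEncoder().encode(JSON.stringify(data))
  // Fake content address for illustration; a real one would be a CID.
  const root: Block = { cid: `bafy-fake-${bytes.length}`, bytes }
  return {
    root,
    data,
    // Explicit-method variant: decodes from bytes on each call, so a
    // decode failure surfaces at the call site rather than in a getter.
    load: () => JSON.parse(new TextDecoder().decode(bytes)) as T,
    *iterateIPLDBlocks() {
      yield root
    },
  }
}
```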
`.resolve()` modifier

Alternatively, we could add a `.link()` modifier to mark links. In addition, we could add a `.resolve()` modifier on links to signal that a link should be resolved automatically:
```ts
const Offer = Schema.struct({
  size: Schema.integer(),
  commit: Schema.string(),
}).array()

const Aggregate = Schema.struct({
  offer: Offer.link().resolve(),
})

const offer = Offer.from([
  { size: 1, commit: 'a' },
  { size: 2, commit: 'b' },
])

const aggregate = await Aggregate.from({
  offer
})
```
This option, unlike the previous one, can take care of encoding linked references on demand and can also decode them without additional boxing:

```ts
aggregate.offer[0].commit
```
Additionally, we could make non-`.resolve()`-ed links behave like the previous attachments. The encoder will take care of including blocks from the view if you pass the latter. The decoder will produce links with a `.resolve()` method so you can lazy-load the linked DAGs. However, if the `.resolve()` modifier was used, linked DAGs would get preloaded and unboxed for you, meaning the property will correspond to the schema, as opposed to a link that you can resolve to get a value corresponding to the schema.

What we lose, however, is the ability to require that the encoder pass a DAG as opposed to a link unless `.resolve` is used; but if you do use `.resolve`, then you lose lazy loading on the decoder side.
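For the non-`.resolve()`-ed case, the decoder-side handle might look something like this sketch. All names are hypothetical, and an in-memory map stands in for the real blockstore:

```typescript
// A lazily resolvable link: the decoder hands back the CID immediately,
// and `.resolve()` loads and decodes the linked DAG only on demand.
interface LazyLink<T> {
  cid: string
  resolve(): Promise<T>
}

const lazyLink = <T>(cid: string, store: Map<string, T>): LazyLink<T> => ({
  cid,
  resolve: async () => {
    const value = store.get(cid)
    if (value === undefined) {
      throw new RangeError(`block ${cid} is not in the bundle`)
    }
    return value
  },
})
```

Resolution can fail (the block may not have been bundled), which is exactly why a throwing getter is less appealing than an explicit async method.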
@vasco-santos your feedback would be helpful
> What we lose, however, is the ability to require that the encoder pass a DAG as opposed to a link unless `.resolve` is used; but if you do use `.resolve`, then you lose lazy loading on the decoder side.
After sleeping on it, I'm realizing that we could add another modifier, e.g. `link().embed()`, to require that the client provide a DAG but not do automated loading on the other side.
I didn't know about this issue, or the story behind us wanting to support attachments long before. Reading through all this, I agree that no option feels perfect and we should evaluate the tradeoffs.
I like that the first option (`Attachment` interface) is more explicit. It is also more what I was expecting when we talked before. However, the second option (`resolve()` + `embed()`) seems more flexible and easier to interact with while building on top of this. I would go with that one.
cc @alanshaw can you get the clock use case here?
I currently have a merkle clock interface where you invoke `clock/advance` to add a new event to a remote clock. The event is a block with a specific structure. Aside: it would be nice to be able to define the expected structure in the capability definition.
I currently have:
```ts
capability({
  can: 'clock/advance',
  with: URI.match({ protocol: 'did:' }),
  nb: Schema.struct({
    event: Link.match({ version: 1 })
  })
})
```
Each event points back to one or more previous events:
```
[e] -> [e-1] -> [e-2] -> ...
```
The remote clock may be at `e-2`, so I'd like to invoke `clock/advance` and attach block `e`, but also attach block `e-1` in the case where I know the remote clock is currently at `e-2`.
It would be nice if I could include these additional, possibly relevant/useful blocks so they don't have to be communicated out-of-band.
I believe that at the moment, with the schema I have, I cannot attach `e-2` unless I put it in facts or alter the caveats to include an array of extra blocks.
I think the `Embed` interface as described above might allow me to attach `e-2`, but it might be nice to allow, more generally, `.attach()` to be called with any block that is linked to by any other block in the delegation.
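To illustrate the ask, here is a hedged sketch of an invocation that carries the head event plus extra linked blocks in-band. Every type and field name here (`EventBlock`, `attached`, `advance`) is hypothetical, not part of the actual capability API:

```typescript
interface EventBlock {
  cid: string
  bytes: Uint8Array
  parents: string[] // CIDs of previous events, e.g. ["e-1"]
}

interface AdvanceInvocation {
  can: "clock/advance"
  nb: { event: string } // CID of the new head event `e`
  // Extra blocks shipped in the same payload so the remote clock does
  // not need to fetch them out-of-band, e.g. [e, e-1] when the remote
  // head sits at e-2.
  attached: EventBlock[]
}

const advance = (
  head: EventBlock,
  extra: EventBlock[] = []
): AdvanceInvocation => ({
  can: "clock/advance",
  nb: { event: head.cid },
  attached: [head, ...extra],
})
```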
At the moment the library knows how to bundle non-inline UCAN chains into a payload. However, if you want to pass other large data, e.g. a large binary blob, there is no good way of doing it.
Proposal
The library already deals with bundling blocks from linked proofs. Specifically, it looks at each proof and, if it is a `Delegation`, bundles its blocks with the request payload.

https://github.com/web3-storage/ucanto/blob/0ec460e43ddda0bb3a3fea8a7881da1463154f36/packages/core/src/delegation.js#L158-L182
We should generalize this further and treat things inside `capabilities` the same way: if the thing is a "DAG", encode it as a CID and include its blocks in the request payload. This would make it really simple to make decisions about what to link and what to include. There has been some relevant musing on the subject here: https://github.com/multiformats/js-multiformats/issues/175
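A sketch of that rule, as an assumption about the shape rather than the actual ucanto code: walk the `nb` fields, and when a value looks like a DAG view, swap it for a link and collect its blocks.

```typescript
interface Block {
  cid: string
  bytes: Uint8Array
}

// Anything that can enumerate its own blocks is treated as a "DAG".
interface DagView {
  cid: string
  export(): IterableIterator<Block>
}

const isDagView = (value: unknown): value is DagView =>
  typeof value === "object" &&
  value !== null &&
  typeof (value as DagView).export === "function"

// Replace every DAG inside `nb` with a CID link and bundle its blocks
// with the payload, mirroring what is already done for proofs.
const bundle = (nb: Record<string, unknown>) => {
  const blocks: Block[] = []
  const payload: Record<string, unknown> = {}
  for (const [name, value] of Object.entries(nb)) {
    if (isDagView(value)) {
      payload[name] = { "/": value.cid }
      for (const block of value.export()) blocks.push(block)
    } else {
      payload[name] = value
    }
  }
  return { payload, blocks }
}
```

Plain values pass through untouched, so the decision of what to link versus what to inline falls entirely out of whether the caller passed a DAG view or an already-encoded link.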