ucan-wg / invocation

UCAN Invocation & Pipelining

v0.1.0 #1

Closed expede closed 1 year ago

expede commented 1 year ago

📜 Preview

Pulling the low-level capability invocation bits out of https://github.com/ipvm-wg/spec/pull/8 to UCAN because this layer doesn't have any direct IPVM dependencies

expede commented 1 year ago

ah right I get it now! I’m glad we’re talking about this because this was something that we’ve considered in our invocation logic and made some decisions that look different

Amazing!

task - in ucanto we don’t have tasks we have invocation which is represented as delegation with a single capability

I spent some time this weekend teasing apart some of the concepts in this branch, so that we can possibly break these apart into specs where DAG House can only implement the bits that make sense for you. This is still subject to change, but the WIP looks something like this:

I think we could essentially break this up like so:

[Screenshot 2022-12-05 at 09 39 29]

decided against them as it raised questions around invocation order.

Totally! This is what dataflow aims to solve in the current spec: they form a partial order. If there's no dependency between two tasks, then they can get executed in parallel or sequence; it doesn't matter. If you need to order effectful operations, then you put a promise between them.
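To make the partial-order point concrete, here is a minimal Python sketch. It is not taken from the spec: the task names and the `execution_waves` helper are hypothetical, and promises are modelled simply as "this task awaits the output of those tasks". Tasks in the same wave have no dependency between them, so they could run in parallel or in any sequence.

```python
from graphlib import TopologicalSorter

# Hypothetical batch: task name -> names of the tasks whose output it awaits.
tasks = {
    "resize": set(),
    "thumbnail": set(),
    "publish": {"resize", "thumbnail"},  # effectful step, ordered after both
}

def execution_waves(deps):
    """Group tasks into waves; tasks within one wave are mutually independent."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only to make output stable
        waves.append(ready)
        sorter.done(*ready)
    return waves

waves = execution_waves(tasks)
print(waves)  # [['resize', 'thumbnail'], ['publish']]
```

Only the promise edges constrain the order; nothing about the position of a task in the batch does.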

It also meant single request can have multiple invocations to different audiences (executors).

Yup totally 💯 (assuming that they have the authority to run some action, of course)

We don’t need special syntax for promises because in our case those are invocations and we can refer them by cid.

I thought this too, at first. There are two problems:

  1. We need to be able to distinguish between using a CID as an argument versus a promise (i.e. the output of a task, not the task itself).
  2. Duplicate requests in a batch need to be unique. We could add a nonce to every invocation, but that's a lot of nonces. It's also a lot more cumbersome versus naming things, the same way that we name paths in a file system.
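A rough Python sketch of both points, under stated assumptions: the `Link` and `Promise` types, the `resolve` helper, the task names, the placeholder CID string, and the `msg/send` ability are all hypothetical and not part of any spec. It shows a plain CID argument kept distinct from a promise, and two identical requests made unique by name rather than by nonce.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Link:
    cid: str  # a plain content address, passed through as data

@dataclass(frozen=True)
class Promise:
    task: str            # name of another task in the same batch
    branch: str = "ok"   # which branch of that task's result to select

def resolve(arg, results):
    """Substitute a promise with the selected branch of a finished task."""
    if isinstance(arg, Promise):
        return results[arg.task][arg.branch]
    return arg  # a Link (or any other value) is left untouched

batch = {
    "add-a": {"do": "store/add", "args": [Link("placeholder-cid")]},
    "add-b": {"do": "store/add", "args": [Link("placeholder-cid")]},  # same request, distinct name
    "notify": {"do": "msg/send", "args": [Promise("add-a")]},
}

results = {"add-a": {"ok": "receipt-a"}}
resolved = [resolve(arg, results) for arg in batch["notify"]["args"]]
print(resolved)  # ['receipt-a']
```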

i still wish we had a separate syntax to refer to result

This is what a promise is meant to be: "the result of running this action". With promises, you generally want to select the success branch, but not always, so promises let you also do branch selection. That said, we are building a distributed memoization table (DMT) in IPVM to store the receipts of Tasks that anyone can look up.

zeeshanlakhani commented 1 year ago

I spent some time this weekend teasing apart some of the concepts in this branch, so that we can possibly break these apart into specs where DAG House can only implement the bits that make sense for you. This is still subject to change, but the WIP looks something like this:

I think we could essentially break this up like so:

[Screenshot 2022-12-05 at 09 39 29]

Super minor, but this reads like promises don't end in invocations that return a receipt, or maybe that was my read until I re-read the spec parts :).

expede commented 1 year ago

but this reads like promises don't end in invocations that return a receipt,

@zeeshanlakhani oooh that's a good point: invocations with promises do have receipts, but closures with promises don't: the closure's args need to be made concrete before they can get a receipt for it, and it will get memoized as such.

...so more fields are needed there, because we have the call site plus the substitution 🤔🤔🤔

expede commented 1 year ago

@Gozala I'm going to mangle the order a bit, but:

ah right I get it now! I’m glad we’re talking about this

🙌

Before I dive in let's make sure we agree on terms because we keep tripping over using conflicting terms in our work.

Indeed 👍

task - in ucanto we don’t have tasks we have invocation which is represented as delegation with a single capability. All the inputs are part of nb and if they are large they are just CIDs; the actual data a CID points to is packed in the same CAR (there are some cases when it’s not bundled intentionally, but I’m not going to go into that)

We recognized that instead of multiple capabilities in a single invocation we could simply send multiple invocations in a single query, and if they share the same proofs it will just deduplicate them

Just as an array. Sure, makes sense.

It also meant single request can have multiple invocations to different audiences (executors). This was desired in our case because our various actors provided different capability sets so we can route invocations by audience

Oh interesting! Users can target multiple specific services in an invocation? Is the plan to eventually have them all to use the did:dns + forwarding method that you've proposed in UCAN core?

query - we wanted to bundle a bunch of invocations and execute those with one request, GraphQL style. Query syntax and selectors got complicated and we needed to cut corners. So we settled on primitive queries for now, which are essentially a tuple of invocation links that gets back a tuple of corresponding results.

To clarify: do you mean queries on IPLD data, or querying the invocation structure itself?

We don’t need special syntax for promises because in our case those are invocations and we can refer them by cid.

I guess because each UCAN has a nonce, you can make these unique

i still wish we had a separate syntax to refer to result however so it’s clear when you’re referring to invocation itself vs result of it (something I was advocating for in Lisbon)

Yeah, I think that we need to disambiguate these. You can probably figure it out from context right now, but it's not an IPLD path or CID, it's a pointer to the thing the IPLD path resolves to. The bookkeeping on this may get confusing over time, the same way that DAG-JSON wraps CIDs in {"/": ...} to disambiguate.

The reason I wrote all this is because I think choice of envelope really shaped out some tradeoffs.

Absolutely 💯

Specifically addressing things by IPLD path was something we intentionally tried to avoid so you could refer to whole thing without revealing anything about outer layers

Indeed, and this makes sense! It also means a lot of signatures, no? Do you have to sign each invocation separately, include separate nonces, enforce that they only carry a single capability in the UCAN, etc?

this also fits well with libp2p because we could map did:key to peerid, dial over libp2p and route individual invocation

Routing feels like it belongs at a different layer to me. What is the advantage of making it part of the invocation RPC directly?

The grand vision was that all IPFS gateways could provide HTTP → libp2p routing. E.g. my desktop IPFS node could issue a delegation to my mobile phone for storing data. My phone could send invocations to a public gateway, which would resolve my desktop node's address from aud and forward the request, storing data from my phone on my desktop

Yeah that makes sense! I'm not sure how that connects at the invocation layer though. Could you expand?

We have considered invocations with multiple capabilities in them but decided against them as it raised questions around invocation order.

I think I responded to this one earlier: this is what promises & pipelines do in the current spec. As you say, different solution-space. Projects like IPVM and Bacalhau MUST have ordering, hence promises.

From your description, the DAG House design looks something like [&UCAN], is that right?

expede commented 1 year ago

Ah, also @Gozala maybe a clarification: I'm labouring under the assumption that UCANs will look like this shortly:

// Proposed UCAN v0.10
{
  "https://example.com/posts": {
    "crud/read": [
      {"foo": 1},
      {"bar": 2}
    ]
  }
}

Which does make it harder to select a single capability versus the existing syntax:

// Current UCAN v0.9
[
  {
    "with": "https://example.com/posts",
    "can": "crud/read",
    "nb": {"foo": 1}
  },
  {
    "with": "https://example.com/posts",
    "can": "crud/read",
    "nb": {"foo": 2}
  }
]

...and the proposed v0.10 syntax is possibly interpreted as a logical AND not an OR as presented above. (While we haven't worked out the exact details in v0.10 yet, I would expect this to be interpreted as an AND). As you're saying that you're forcing a single capability at the top level definitely disambiguates, too (though there are other tradeoffs like the number of signatures involved).
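For reference, the relationship between the two shapes shown above can be sketched as a small Python conversion. This is only an illustration under assumptions: `to_v010` is a hypothetical helper, and the v0.10 layout is the proposed one from this comment, not a finalized format.

```python
def to_v010(capabilities):
    """Group a v0.9-style list of {with, can, nb} into the proposed nested map."""
    grouped = {}
    for cap in capabilities:
        by_ability = grouped.setdefault(cap["with"], {})
        by_ability.setdefault(cap["can"], []).append(cap.get("nb", {}))
    return grouped

v09 = [
    {"with": "https://example.com/posts", "can": "crud/read", "nb": {"foo": 1}},
    {"with": "https://example.com/posts", "can": "crud/read", "nb": {"foo": 2}},
]

v010 = to_v010(v09)
print(v010)
# {'https://example.com/posts': {'crud/read': [{'foo': 1}, {'foo': 2}]}}
```

Once grouped this way, the two entries are no longer separately addressable capabilities but elements of one list, which is where the question of how that list should be interpreted comes from.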

expede commented 1 year ago

Okay, merged #4 in here. It was just way way more complete. Not saying that it's The Right Thing™, but it's way more right than before

Gozala commented 1 year ago

Oh interesting! Users can target multiple specific services in an invocation? Is the plan to eventually have them all to use the did:dns + forwarding method that you've proposed in UCAN core?

No, the idea is that all IPFS gateways could support this and route invocations to the nodes (over libp2p) identified as audience. That way it would not matter as much which host you’re talking to, which in turn could help with censorship, as you’d be able to talk to a gateway that isn’t blocked.

More broadly “aud” could communicate where to run as opposed to the address you’re sending invocation to. This is especially interesting in multicast contexts.

Gozala commented 1 year ago

To clarify: do you mean queries on IPLD data, or querying the invocation structure itself?

Neither. Query in GraphQL sense. It’s a set of invocations (tasks in your terms) across various services (aud) and capabilities + optional selectors to only request subset of return data.

To clarify, we don’t currently support selectors on return data, and the query structure is currently just a tuple of invocations. We did, however, have a prototype with a more GraphQL-like structure where you could name results and apply selectors over them.

All in all it’s very similar to what the “run” syntax here does, and I intend to revive that code to try and see how it fits with this spec

Gozala commented 1 year ago

Indeed, and this makes sense! It also means a lot of signatures, no?

Yes, one per invocation. I realize the irony of me suggesting to trim the number of signatures here; it’s just that I have been thinking of these as delegations. When I was mapping this spec to our system I was imagining wrapping those with yet another envelope.

Do you have to sign each invocation separately, include separate nonces, enforce that they only carry a single capability in the UCAN, etc?

I mean they are just delegations so yes. It’s just at the library layer we have “invoke” function that produces delegations with single capability hence enforces single capability. On the backend we deny service if invocation has more than one capability.
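A minimal Python sketch of that backend check, with assumptions stated up front: the `att` field name for the capability list and the `single_capability` helper are illustrative only and are not ucanto's actual API.

```python
def single_capability(ucan):
    """Return the only capability of an invocation, or refuse service."""
    capabilities = ucan.get("att", [])  # assumed field name for the capability list
    if len(capabilities) != 1:
        raise ValueError(
            f"invocation must carry exactly one capability, got {len(capabilities)}"
        )
    return capabilities[0]

invocation = {
    "att": [{"with": "https://example.com/posts", "can": "crud/read", "nb": {"foo": 1}}]
}
capability = single_capability(invocation)
print(capability["can"])  # crud/read
```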

Long term thinking had been that we’d enable multiple capabilities in invocation in the future which would lift this limitation. I’ve been thinking of invocations with multiple capabilities as higher order capabilities which in lisp notation would be something like

; invoke store/add capability
(store/add {root})

; We also have high order capabilities in form of invocation with multiple capabilities
(account/info (store/add {root}))

Which is another way you could model rights amplification

Gozala commented 1 year ago

Routing feels like it belongs at a different layer to me. What is the advantage of making it part of the invocation RPC directly?

I’ve hinted a bit at this with a comment about gateways that can route invocations. Besides that, I believe it’s one of the compelling features of GraphQL: you expose the whole system just under different query endpoints and can then query the whole system without worrying how the system is set up internally.

If we think about the whole network of executors (across org boundaries) as a single system, it seems natural to want to submit queries without having to worry about network topology. That is not to suggest we should put network addresses into queries, but allowing different audiences implies that another layer could exist to do the routing.

Alternative would be for audience to map various executors to capabilities / resources (with), but I’d argue it is worse because different endpoints may remap them differently introducing incidental complexity. Instead I’d rather embrace “aud” field which already kind of says who you want to execute it.

We’re also considering this for things like relaying delegations e.g. if alice delegates some capability to bob but sends it to our endpoint we can keep the delegation around so bob can fetch it when he comes online. We could model this differently, but we’d just be creating another envelope and potentially make interop with others more difficult.

Gozala commented 1 year ago

Yeah that makes sense! I'm not sure how that connects at the invocation layer though. Could you expand?

I hope my earlier comments provided a bit more context here. My main point is that invocations (queries in our terms) should be able to target different executors; that way the recipient can run tasks addressed to it and route / forward or deny tasks that were addressed to other executors.

This spec prescribes a single executor for the invocation / query, which I think makes it more REST-like than GraphQL-like.
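As a rough illustration of that run / forward / deny behaviour, here is a small Python sketch. The `dispatch` helper, the DID strings, and the shape of each invocation are hypothetical; the result is a tuple aligned with the incoming tuple of invocations.

```python
def dispatch(invocations, self_did, known_peers):
    """Decide, per invocation, whether to run it, forward it, or deny it."""
    decisions = []
    for invocation in invocations:
        audience = invocation["aud"]
        if audience == self_did:
            decisions.append("run")      # addressed to this executor
        elif audience in known_peers:
            decisions.append("forward")  # addressed to an executor we can reach
        else:
            decisions.append("deny")     # unknown audience
    return tuple(decisions)

query = (
    {"aud": "did:example:storage", "can": "store/add"},
    {"aud": "did:example:accounts", "can": "account/info"},
    {"aud": "did:example:unknown", "can": "store/list"},
)

decisions = dispatch(query, "did:example:storage", {"did:example:accounts"})
print(decisions)  # ('run', 'forward', 'deny')
```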

Gozala commented 1 year ago

From your description, the DAG House design looks something like [&UCAN], is that right?

Pretty much, except it’s a tuple as opposed to a list

Gozala commented 1 year ago

Routing feels like it belongs at a different layer to me. What is the advantage of making it part of the invocation RPC directly?

I think I found a better way to make my point: you may want to expose multiple services on the same network address. The current spec would imply that you either:

  1. Put a routing service in front of those services to hide that fact.
  2. Make invokers send separate invocations instead.

Both have some downsides. (1) would imply mapping capabilities, which can get tricky if there are naming conflicts. You could perhaps avoid those through namespacing, but that creates non-linear chains, because you have to derive the capability without a namespace from the one with a namespace. All in all, this is not a big problem if the router and services are operated by a single entity, but it is a lot more complicated if you could be routing to arbitrary actors in the network.

Perhaps (2) is a reasonable limitation, yet I just find it unnecessary.

Gozala commented 1 year ago

Ah, also @Gozala maybe a clarification: I'm labouring under the assumption that UCANs will look like this shortly:

I've started experimenting with that as well, which is certainly challenging some of our design choices. The whole high order capability thing from my prior comment goes out of the window. I'll report more as I make more progress on it.

Gozala commented 1 year ago

proposed v0.10 syntax is possibly interpreted as a logical AND not an OR as presented above. (While we haven't worked out the exact details in v0.10 yet, I would expect this to be interpreted as an AND).

I'm not sure I follow what you mean by AND vs OR here to be honest.

My interpretation was always AND with some transactional guarantees implied as in do store/add then store/list ... and fail the whole thing if any of the steps fail. Which is why we have decided to limit to single capability, that way transactional guarantees became trivial.

With the new format there is no implied order, so I think it's reasonable to treat them as concurrent tasks unless they pipe into each other, and even when they pipe it's still reasonable to expect that the task being piped to can fail.

As you're saying that you're forcing a single capability at the top level definitely disambiguates, too (though there are other tradeoffs like the number of signatures involved).

With the new format I have been reviving the queries-with-selectors approach; with the above rationale, single capability per invocation is less of a concern. But it does bring the single vs multiple aud concern into focus. I'm not entirely sure where I'll land on this.

zeeshanlakhani commented 1 year ago

I’ve hinted a bit at this with a comment about gateways that can route invocations. Besides that, I believe it’s one of the compelling features of GraphQL: you expose the whole system just under different query endpoints and can then query the whole system without worrying how the system is set up internally.

If we think about the whole network of executors (across org boundaries) as a single system, it seems natural to want to submit queries without having to worry about network topology. That is not to suggest we should put network addresses into queries, but allowing different audiences implies that another layer could exist to do the routing.

As someone who is very pro-GraphQL typically, it's still a PITA for those optimizing the actual queries on the backend(s), especially in a distributed situation with different network topologies. GraphQL is notorious for n+1 bottlenecks and the like. I'd say routing and query are definitely different layers (and specs?), where something like appending trace info for invocation(s) is more useful first?

expede commented 1 year ago

@Gozala

proposed v0.10 syntax is possibly interpreted as a logical AND not an OR as presented above. (While we haven't worked out the exact details in v0.10 yet, I would expect this to be interpreted as an AND).

I'm not sure I follow what you mean by AND vs OR here to be honest.

If you have this:

{
  "https://example.com": {
    "crud/update": [
      {"day-of-week": "Friday", "time-of-day": "afternoon"},
      {"content-type": "application/json"}
    ]
  }
}

I expect this UCAN to restrict actions to updates only on Friday afternoons AND with application/json, not application/json OR anything on Fridays. The elements in the now-unified nb field act as an AND, not as an OR.

My interpretation was always AND with some transactional guarantees implied as in do store/add then store/list ... and fail the whole thing if any of the steps fail. Which is why we have decided to limit to single capability, that way transactional guarantees became trivial.

Ah, I wasn't talking about the transactionality there.

With the new format there is no implied order, so I think it's reasonable to treat them as concurrent tasks unless they pipe into each other, and even when they pipe it's still reasonable to expect that the task being piped to can fail.

Yes, only ordered via dependencies expressed in promises

As you're saying that you're forcing a single capability at the top level definitely disambiguates, too (though there are other tradeoffs like the number of signatures involved).

With the new format I have been reviving the queries-with-selectors approach; with the above rationale, single capability per invocation is less of a concern. But it does bring the single vs multiple aud concern into focus. I'm not entirely sure where I'll land on this.

Thanks for being open to exploring the design space!

Gozala commented 1 year ago

I expect this UCAN to restrict actions to updates only on Friday afternoons AND with application/json, not application/json OR anything on Fridays. The elements in the now-unified nb field act as an AND, not as an OR.

I see what you mean here. That was not my interpretation, though: I was thinking of them as AND in the sense of "you can do this and that" (set union), as opposed to "you can do it if you meet both this and that constraint" (set intersection).

But clearly UCAN spec should specify this if it goes this route.
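To show how much the two readings differ in practice, here is a minimal Python sketch using the caveats from the example earlier in this thread. The `matches`, `allowed_union`, and `allowed_intersection` helpers and the sample request are hypothetical; neither reading is specified by UCAN at the time of this discussion.

```python
def matches(caveat, request):
    """A request satisfies a caveat when every field of the caveat matches."""
    return all(request.get(key) == value for key, value in caveat.items())

def allowed_union(caveats, request):
    """Each element grants a separate set of actions: any one may match."""
    return any(matches(caveat, request) for caveat in caveats)

def allowed_intersection(caveats, request):
    """Each element is a further restriction: all of them must match."""
    return all(matches(caveat, request) for caveat in caveats)

caveats = [
    {"day-of-week": "Friday", "time-of-day": "afternoon"},
    {"content-type": "application/json"},
]

# A Friday-afternoon update that is not JSON.
request = {
    "day-of-week": "Friday",
    "time-of-day": "afternoon",
    "content-type": "text/plain",
}

print(allowed_union(caveats, request))         # True
print(allowed_intersection(caveats, request))  # False
```

The same token authorizes this request under one reading and rejects it under the other, which is why the interpretation needs to be pinned down in the spec.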