storacha-network / specs

🏅 Technical specifications for the w3up protocol stack
17 stars 0 forks source link

feat: w3filecoin spec with storefront aggregator broker and chain roles #71

Closed vasco-santos closed 11 months ago

vasco-santos commented 12 months ago

Based on my previous proposal to @Gozala that @Gozala covered in https://github.com/web3-storage/RFC/pull/2

Changes the aggregation spec introducing new capabilities and a new role. Main reasoning for this change is the new distinction between Storefront and Aggregator (previously had same identity), where a Storefront (web3.storage, nftstorage) will receive/compute Filecoin pieces (from received CAR files) and offer them to an Aggregator (w3filecoin). The aggregator's role is to put multiple pieces together to offer a large piece (aggregate) to a Broker (spade-proxy).

Per the above, the new invocations added are described in https://www.notion.so/2023-07-26-aggregation-protocol-contract-sign-and-report-API-12b4a7f0e92f4b789c331322e2629817

Renamed spec from aggregation to w3 filecoin given the context increased from aggregation to accommodate new requirements

Note that list, will be added later on

vasco-santos commented 11 months ago

@Gozala I added the intermediary queues resources as we discussed yesterday. Looking at it now, I think we are adding a lot of extra complexity that we could likely not have.

filecoin/add (with: did:key:zSpace ) can return the receipt with out { status: queued } with effect pointing to filecoin/add (with: did:web:web3.storage ). As we were doing in previous version, we create delegation and stick in first receipt the CID. Then when we dequeue, we create invocation and write receipt for it with result of queue consumer.

I think having the queue abstraction brings more complexity and in practise we do not get much from it. We only have 2 results to return, first is queued and second is accepted or error. If we would have the 3 invocations for such flow, we end up with

  1. Storefront receives filecoin/add (with: did:key:zSpace ), puts requested operation queued to be executed, creates delegation filecoin/add (with: did:key:zFilecoinQueue ) and writes receipt with delegation as effect together with out { status: queued }.
  2. Storefront dequeues the piece for verification, verifies it and if success writes receipt with effect to be filecoin/add (with: did:web:web3.storage)
  3. Storefront writes piece into DB and writes receipt with response.

The second receipt does not even has an output, and seems to introduce more complexity then real benefit. I know the idea would be to perform verification only when did is queue, but creating this exception for this queue does not seem so critical at spec level. This logic applies to all other queues. In short, I think we should drop this abstraction. It could work if we wanted to illustrate things getting out and into the queue again, but that will increase a lot number of receipts.

Let me know if you agree we should remove that intermediary step

vasco-santos commented 11 months ago

@Gozala thanks, for the feedback ❤️ I already adopted majority of it, leaving the rest for needed extra discussion

1. storefront => aggregator => dealer vs storefront => aggregator + storefront => dealer (via delegation from aggregator)

Looks like you see an implementation detail in different angle than me.

Storefront + group vs unify group on second invocation

This one is appealing option to not need extra invocation field. We can do it, if you feel strongly about it. Let me leave some reasoning that made me go that way that would be a drawback of moving into that direction:

Gozala commented 11 months ago

Looks like you see an implementation detail in different angle than me.

I think you may have misunderstood some of what I was getting at. Let my try it differently, I think aggregator is kind of proxy like an attorney. It acts on behalf of the storefront / broker tenant. It becomes even more clear when you consider signing pipeline, when aggregator submits a deal to the broker eventually broker will call the storefront and not broker to sign it, which is why I think it is crucial that relationship is between storefront <-> broker and there is no

flowchart LR
storefront --> aggregator
aggregator --> broker
broker --> storefront

What I mean in practice is that storefront registers with a broker (shares it's DID basically) as opposed to both storefront and aggregator registering with a broker.

You are correct that it would imply bit more configuring of aggregator, namely storefront would need to delegate { can: "deal/*", with: "did:web:web3.storage" } to the aggregator, which can simply be a env variable. I know it feels like an extra hoop, but I think we'll have to get better at doing this somehow as we keep seeing this pattern over and over and can't keep working around it forever.

Also note that we can even avoid having to provision aggregator with delegation and instead include it the proofs of every piece/add delegation that way aggregator will not require provisioning.

I do not think folding aggregator is better option, it just tangles more things while I'd like to further detangle them.

Gozala commented 11 months ago
  • Dealer requires tenantId for the deal, which in my head is the storefront DID (or a mapping of it that spade-proxy can do). By merging them together we would need to have knowledge of splitting that string on the other end of the pipeline, which does not seem so desirable to me.

This is indeed a drawback. Maybe we can meet in the middle ? I basically don't want a storefront to set that field because it can impersonate (intentionally or due to bug) another tenant, unless we check that it is the same as with in which case why would you even set the field.

That said we could instead have remove (ignore) tenant in invocation from storefront and then copy with when propagating that request. Only drawback with this was here that we basically overload capability by a with field, which tends to be symptom of the bad design. Perhaps idea of using same can field for both enqueue and dequeue is flawed and we should simply create a separate capability.

I don't favor strongly either of the three options here, so I think you should decide. If you do keep tenant field, I would encourage to make it clear that for invocation with storefront is obsolete.

  • I was seeing the group as actually the only thing we need to derive the grouping ID for queue messages. If it is a DID that would be fine, if we intend to use it as something like free only we definitely need to merge things.

My point is that could lead to say web3.storage pieces going through nft.storage deal pipeline because uniqueness across uncoordinated storefronts can not be achieved. Which is why I suggest to achieve it effectively by ${tenant}/${group} that way colliding group names across tenants would not lead to invalid behavior.

metrics are easier to compute to calculate numbers of each group/storefront if things don't mutate

I'm not sure I understand this point. What I was suggesting is to make tenant implicitly equal to with field on invocations from storefront.

After writing all this I'm starting to be slightly in favor of just having a separate capabilities as only drawback with them is having to come up with naming. In fact even the handlers in your implementation would be more cleaner as they won't have two separate code paths.

Gozala commented 11 months ago

I think the best solution to align without getting into complexity of delegation is to just rename the aggregator and the invocations (again…).

I do not think renaming would address my concern, which is broker/dealer been aware of another party. Triangular relationship is usually introduces coordination complexity

flowchart LR
storefront --> aggregator
aggregator --> broker
broker --> storefront

In my experience it's always best to have linear relationship which is either through routing

flowchart LR
storefront <--> aggregator
aggregator <--> broker

or encapsulation

flowchart LR
storefront <-- aggregator --> broker
vasco-santos commented 11 months ago

@Gozala regarding point 2, I think we will need to revert to having 2 capabilities again. I can do it tomorrow, I can suggest we call them x/add and x/queue where /queue has the with of the issuer that we can set as storefront in nb of /queue that is self issued by service

vasco-santos commented 11 months ago

@Gozala regarding point 1, there are still some details that I do not understand where you stand. I understand the benefits of the solution you suggest here. And there are definitely some. However, I feel there are still some gaps.

Considering storefront delegates to aggregator capability to add aggregate to dealer,

Also not delegating here, does not stricly mean that storefront can't delegate to sign, which in my view makes sense here. This is actually another reason that Aggregator is probably not the best naming.

Gozala commented 11 months ago

@Gozala regarding point 1, there are still some details that I do not understand where you stand. I understand the benefits of the solution you suggest here. And there are definitely some. However, I feel there are still some gaps.

I'll use name "agency" which is I got from Riba in the past to refer to currently "aggregator" and perhaps "filecoin-engine" in the future, because I think it conveys the mental model better.

Considering storefront delegates to aggregator capability to add aggregate to dealer,

  • who is the actor responsible for keeping track of deal state in the end?

Agency takes pieces and is responsible for getting it into filecoin deal(s). This implies that it receives pieces from Storefront and either reports a problem (if it is faulty) or reports success providing inclusion proof and deal info.

All of the state management including retries is taken care of by Agency.

Agency may expose intermediate states through chained effect receipts, however how much of it is exposed vs kept off receipt chain is up to what would make most sense from operational standpoint. Exposing more would improve system introspection at the cost of an overhead, so we'll have to find a right balance between two and possibly adjust it over time.

For Agency to perform this task it will need signed permission from the Storefront client (Similar to how attorney can represent a client after they signed a contract). In our case I think that boils down to

  1. Capability to submit aggregates on Storefront's behalf.
  2. Capability to sign deals with Storefront's wallet.
  3. Capability to grant read access for added pieces.

With above capabilities Agency should be able to take care of getting a piece(s) into the filecoin. Once that's done it's job is complete and takes no responsibility for tracking deal renewals or reporting them. At this point storefront can hire "Chain Tracker" to get updates on deal updates (push model) or alternatively use "Chain Tracker" to query aggregate deal state (pull model).

  • If we delegate, we either need to have storefront to also delegate capability to check chain-tracker or things start to get inconsistent where there are other "three actors jumps". This feels that things will just get more and more complicated for each Storefront.

I'm not sure sure if above makes it bit more clear on this side. In my head how tracker works is not entirely clear yet. It is also not clear to me if querying tracker strictly requires permissions given that it is essentially an index over the public chain. Still if Agency needs to interact with tracker that does not seem problematic to me, it can interact with various actors to get it's job done.

Real question here seems to be should Agency query the tracker on behalf of the Storefront or should it query it on it's own behalf. I think either option is fine here because unlike the case with Dealer it is not important to capture that operation occurs on behalf of the Storefront.

Yet if we decide that such query should happen on behalf of the Storefront that should not be a big deal either, Storefront would just need to add another capability to the above set.

  • if we do not delegate, and see the aggregator (or better filecoin-engine) to be a service that storefront uses to put pieces into filecoin under a SLA, it would mean filecoin-engine would to for storefront what storefront does to users of web3.storage. Facilitate making things available in SPs. In other words, storefront does not care on all the details of aggregation, SPs matching and so on, just about making things at some point available in SPs.

I do not understand why you think delegation here changes this. I also see the same picture it just fact of registering Storefront with Agency is signing a delegation (signing a contract with an agency).

By delegating, storefront is now aware of all the details behind putting something in SPs (which will change in the future).

You are correct to point out that Storefront becomes aware that Agency will interact with a Dealer on their behalf. It is a tradeoff in comparison to black box model where Storefront knows nothing about what Agency does to get things into Filecoin. Yet, given that Sterofront still needs to sign deals and make pieces readable it is impossible to achieve black box model, which is why I prefer transparent model over one that is neither.

I also want to point out that we're defining protocol with capabilities so even if in the future we change Agency pipeline so it no longer needs to work with Dealers we could still use same protocol meaning hand set of signed capabilities to the Dealer to do their job.

  • who keeps all the persistent state of aggregates and inclusion?

I think we have covered this to some degree. I imagine that invocation (by Storefront) that adds a piece either has an effect that succeeds once piece makes into a filecoin deal, providing details about inclusion proof etc.. or fails if problem is discovered with a piece.

Agency may retry (several times) same piece if aggregate failed due to errors not caused by it but thous details do not need to be captured in the invocation trace.

  • If is a delegation of capability it also feels weird for me that then is not the storefront who keeps all the state (and therefore responsible for keeping the aggregate and inclusion mappings)

    • who is going to trigger alerts?
  • will everything handle in the hands of the storefront? I see filecoin-engine as a service that will guarantee things make it to Filecoin with some SLAs for their customers (storefronts).

I do too

Also not delegating here, does not stricly mean that storefront can't delegate to sign, which in my view makes sense here. This is actually another reason that Aggregator is probably not the best naming.

đź‘Ť