storacha-network / specs

🏅 Technical specifications for the w3up protocol stack

feat: w3compute protocol #110

Open vasco-santos opened 5 months ago

vasco-santos commented 5 months ago

Adds a compute/* protocol that allows an implementer to perform simple computations over data on behalf of an issuer. This aims to let clients hire compute services to delegate work, as well as to let the w3-up platform hire third-party compute services to verify client-side computations, if desirable.

The main goal of this initial proposal is to compute PieceCidV2. We want to move from the current state, where PieceCid computation is centralized on a Bucket Event, to something that is not tightly coupled to either the data write trigger or the data's location. However, this can also be applied to other future computations, such as computing indexes for given content.
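To make the shape of such an invocation concrete, here is a minimal sketch assuming a UCAN-style capability model. The capability name `compute/piececid`, the field names, and the DIDs/CIDs are illustrative assumptions, not part of the final spec:

```typescript
// Sketch of a hypothetical compute/* invocation. The `can`/`with`/`nb`
// layout follows common UCAN capability conventions; all concrete
// values below are placeholders.
interface ComputePieceCidCapability {
  can: 'compute/piececid'
  with: string           // DID of the compute service provider
  nb: {
    content: string      // CID of the CAR whose PieceCidV2 should be computed
  }
}

// An issuer delegates the work by invoking the capability on a provider.
const invocation: ComputePieceCidCapability = {
  can: 'compute/piececid',
  with: 'did:web:compute.example.com',   // hypothetical provider DID
  nb: { content: 'bagbaieraexamplecarcid' } // hypothetical CAR CID
}
```

The point of routing this through a capability rather than a bespoke API is that the same invocation shape works whether the issuer is a client, the Storefront, or a future third-party validator.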

Of course, an implementer can have a custom resolution implementation for:

Note that the discovery process for actors looking for services that provide given computations is out of scope for this spec for now. But it is something I would like to see later on, as it could open the door for our protocol to run in multiple places, such as Filecoin Station. Computations are the easiest path toward decentralizing the service to run anywhere.

olizilla commented 5 months ago

Can you add more context on the cost of computing the PieceCID? The bias I have, which this spec needs to add some words to overcome, is that I think it is very cheap to calculate the PieceCID if you have the CAR bytes.

I believe (unverified, bias alert) that it's more expensive to move the CAR than it is to calculate the PieceCID. I also have reservations about the overhead of creating and signing additional UCANs versus just calculating the PieceCID locally.

This spec would be more compelling if it made the case about "it's very important to have a trusted PieceCID for each CAR early in the pipeline, as it's more expensive if we create a ~32GiB aggregate and then find that one of the PieceCIDs was wrong." Perhaps a client has to calculate it themselves and then invoke compute/piececid to get a second opinion and provide both signatures as evidence?
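The "second opinion" flow suggested above could be sketched as follows: the client computes the PieceCID locally, a compute service computes it independently, and both signed results are kept as evidence. Everything here (type names, the trust rule, the omission of actual signature verification) is a hypothetical illustration, not anything defined in the spec:

```typescript
// A signed claim that some party computed `piece` from `content`.
// Field names are illustrative assumptions.
interface SignedPieceClaim {
  piece: string     // the PieceCID the signer computed
  content: string   // CID of the CAR it was computed from
  signer: string    // DID of whoever computed it
  signature: string // signature over (piece, content); verification omitted here
}

// Accept a PieceCID only when two independent parties computed the same
// value for the same content. Cryptographic signature checks are left
// out of this sketch.
function isTrustedPiece(
  clientClaim: SignedPieceClaim,
  serviceClaim: SignedPieceClaim
): boolean {
  return (
    clientClaim.content === serviceClaim.content &&
    clientClaim.piece === serviceClaim.piece &&
    clientClaim.signer !== serviceClaim.signer
  )
}
```

The rule deliberately rejects two claims from the same signer, since a single party agreeing with itself provides no independent evidence.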

Can we calculate that the sum of the costs of repeated pieceCID calculations is less than re-building an aggregate if we find a bad one?

vasco-santos commented 5 months ago

> Can you add more context on the cost of computing the PieceCID? The bias I have, which this spec needs to add some words to overcome, is that I think it is very cheap to calculate the PieceCID if you have the CAR bytes.

> I believe (unverified, bias alert) that it's more expensive to move the CAR than it is to calculate the PieceCID. I also have reservations about the overhead of creating and signing additional UCANs versus just calculating the PieceCID locally.

As you say, it is relatively cheap to calculate the PieceCID if you have the CAR bytes, and likely more expensive to move the CAR bytes. Calculating the PieceCID locally already happens and will continue to happen; it is the trigger that kicks off the pipeline. As the w3-filecoin spec mentions, an implementation MAY compute the PieceCID for validation, or may not; that is an implementation detail, not a requirement.

> This spec would be more compelling if it made the case about "it's very important to have a trusted PieceCID for each CAR early in the pipeline, as it's more expensive if we create a ~32GiB aggregate and then find that one of the PieceCIDs was wrong."

This is a protocol spec, specifically about how to ask a third-party service for computations. As a first provider capability, it can execute PieceCid computations on behalf of others. It was not written to convince any implementer to use it instead of whatever else they might do. Therefore, I would say this belongs in implementation documentation rather than in the spec. What do you think?

For completeness (this is also covered in the implementation proposal document previously shared): the intention today is to have clients submit the Piece computation and to have the Storefront validate it. This already happens indirectly today, as the user's submission of the Piece is a no-op detached from the Bucket-event flow that computes the PieceCID. Moreover, we decided that for now the Storefront (w3up) MUST validate pieces. This may change in the future, but current product requirements, together with using Spade and the typical SP flow, make a validation process essential. The main reasons are:

> Perhaps a client has to calculate it themselves and then invoke compute/piececid to get a second opinion and provide both signatures as evidence?

It is not a client problem if their claim is bad; they could just hire a malicious compute provider anyway. It must be the service that decides whom it trusts, if the service can be penalized for broadcasting malicious content. As previously stated, the client will already compute the PieceCid themselves on upload, when they have the bytes. They will not send the bytes with the filecoin/offer, just pointers (CIDs) to the data. The Storefront MAY decide to grab those bytes and validate them itself, hire a third-party validator, or even run a validator of its own.
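The split described above, where the offer carries only pointers and the Storefront picks a validation strategy, could look roughly like this. The `filecoin/offer` field names follow my reading of the w3-filecoin spec, and the strategy names are purely illustrative:

```typescript
// Sketch: a filecoin/offer carries only pointers (content CID and the
// client-computed PieceCID), never the CAR bytes themselves. Field names
// are assumptions based on the w3-filecoin spec.
interface FilecoinOffer {
  can: 'filecoin/offer'
  with: string        // client space DID
  nb: {
    content: string   // CID pointing at the CAR bytes
    piece: string     // PieceCID the client computed locally on upload
  }
}

// The Storefront then chooses how to validate the claimed piece.
// These three options mirror the choices listed in the comment above.
type ValidationStrategy =
  | { kind: 'fetch-and-verify' }           // grab the bytes and recompute
  | { kind: 'delegate'; provider: string } // hire a third-party compute service
  | { kind: 'self-hosted-validator' }      // run a validator in-house

function describe(strategy: ValidationStrategy): string {
  switch (strategy.kind) {
    case 'fetch-and-verify':
      return 'Storefront recomputes the PieceCID itself'
    case 'delegate':
      return `Storefront invokes compute/piececid on ${strategy.provider}`
    case 'self-hosted-validator':
      return 'Storefront runs its own validator'
  }
}
```

Modeling the strategies as a discriminated union makes the point of the proposal explicit: `delegate` is just one interchangeable option among several, selected by the service's own trust policy.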

> Can we calculate that the sum of the costs of repeated pieceCID calculations is less than re-building an aggregate if we find a bad one?

I think the reasons were clear above, but in short:

Gozala commented 5 months ago

Here are a few links to some prior ideas I had on a similar subject:

https://gozala.io/workspace/#/page/w3-machine
https://hackmd.io/@gozala/invocation-router
https://github.com/web3-storage/RFC/pull/3/files

I think this doc is out of date, but IPVM also had some relevant work: https://github.com/ipvm-wg/workflow/blob/v0.1/README.md