storacha-network / specs

🏅 Technical specifications for the w3up protocol stack

feat: w3compute protocol #110

Open vasco-santos opened 5 months ago

vasco-santos commented 5 months ago

Adds a compute/* protocol that allows an implementer to perform simple computations over data on behalf of an issuer. This aims to let clients hire compute services to delegate work, as well as to let the w3-up platform hire third-party compute services to verify client-side computations, if desirable.

The main goal of this initial proposal is to compute PieceCidV2. We want to move from the current state, where PieceCid computation is centralized on a Bucket Event, to something that is not tightly coupled to either the data write trigger or the data's location. However, this can also be applied to other future computations, such as computing indexes for given content.
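To make the shape of such an invocation concrete, here is a minimal sketch assuming a UCAN-style capability model. The capability name `compute/piececid`, the field names, and the DIDs/CIDs are illustrative assumptions, not part of the final spec:

```typescript
// Sketch of a hypothetical compute/* invocation. The `can`/`with`/`nb`
// layout follows common UCAN capability conventions; all concrete
// values below are placeholders.
interface ComputePieceCidCapability {
  can: 'compute/piececid'
  with: string           // DID of the compute service provider
  nb: {
    content: string      // CID of the CAR whose PieceCidV2 should be computed
  }
}

// An issuer delegates the work by invoking the capability on a provider.
const invocation: ComputePieceCidCapability = {
  can: 'compute/piececid',
  with: 'did:web:compute.example.com',   // hypothetical provider DID
  nb: { content: 'bagbaieraexamplecarcid' } // hypothetical CAR CID
}
```

The point of routing this through a capability rather than a bespoke API is that the same invocation shape works whether the issuer is a client, the Storefront, or a future third-party validator.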

Of course, an implementer can have a custom resolution implementation for:

Note that the discovery process for actors looking for services that provide given computations is out of scope for this spec for now. But it is something I would like to see later on, as it could open the door for our protocol to run in multiple places, such as Filecoin Station. Computations are the easiest path toward decentralizing the service to run anywhere.

olizilla commented 5 months ago

Can you add more context on the cost of computing the PieceCID? The bias I have, which this spec needs to add some words to overcome, is that I think it is very cheap to calculate the PieceCID if you have the CAR bytes.

I believe (unverified, bias alert) that it's more expensive to move the CAR than it is to calculate the PieceCID. I also have reservations about the overhead of creating and signing additional UCANs versus just calculating the PieceCID locally.

This spec would be more compelling if it made the case about "it's very important to have a trusted PieceCID for each CAR early in the pipeline, as it's more expensive if we create a ~32GiB aggregate and then find that one of the PieceCIDs was wrong." Perhaps a client has to calculate it themselves and then invoke compute/piececid to get a second opinion and provide both signatures as evidence?
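The "second opinion" flow suggested above could be sketched as follows: the client computes the PieceCID locally, a compute service computes it independently, and both signed results are kept as evidence. Everything here (type names, the trust rule, the omission of actual signature verification) is a hypothetical illustration, not anything defined in the spec:

```typescript
// A signed claim that some party computed `piece` from `content`.
// Field names are illustrative assumptions.
interface SignedPieceClaim {
  piece: string     // the PieceCID the signer computed
  content: string   // CID of the CAR it was computed from
  signer: string    // DID of whoever computed it
  signature: string // signature over (piece, content); verification omitted here
}

// Accept a PieceCID only when two independent parties computed the same
// value for the same content. Cryptographic signature checks are left
// out of this sketch.
function isTrustedPiece(
  clientClaim: SignedPieceClaim,
  serviceClaim: SignedPieceClaim
): boolean {
  return (
    clientClaim.content === serviceClaim.content &&
    clientClaim.piece === serviceClaim.piece &&
    clientClaim.signer !== serviceClaim.signer
  )
}
```

The rule deliberately rejects two claims from the same signer, since a single party agreeing with itself provides no independent evidence.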

Can we calculate that the sum of the costs of repeated pieceCID calculations is less than re-building an aggregate if we find a bad one?

vasco-santos commented 5 months ago

> Can you add more context on the cost of computing the PieceCID? The bias I have, which this spec needs to add some words to overcome, is that I think it is very cheap to calculate the PieceCID if you have the CAR bytes.

> I believe (unverified, bias alert) that it's more expensive to move the CAR than it is to calculate the PieceCID. I also have reservations about the overhead of creating and signing additional UCANs versus just calculating the PieceCID locally.

As you say, it is relatively cheap to calculate the PieceCID if you have the CAR bytes, and likely more expensive to move the CAR bytes. Calculating the PieceCID locally already happens and will continue to happen; it is the trigger that kicks off the pipeline. As the w3-filecoin spec mentions, an implementation MAY compute the PieceCID for validation, or may not; that is an implementation detail, not a requirement.

> This spec would be more compelling if it made the case about "it's very important to have a trusted PieceCID for each CAR early in the pipeline, as it's more expensive if we create a ~32GiB aggregate and then find that one of the PieceCIDs was wrong."

This is a protocol spec, specifically about how to ask a third-party service for computations. As a first provider capability, it can execute PieceCid computations on behalf of others. It was not written to convince any implementer to use it instead of whatever else they might do. Therefore, I would say this belongs in implementation documentation rather than in the spec. What do you think?

For completeness (this is also covered in the implementation proposal document previously shared): the intention today is to have clients submit the Piece computation and to have the Storefront validate it. This already happens indirectly today, as the user's submission of the Piece is a no-op detached from the Bucket-event flow that computes the PieceCID. Moreover, we decided that for now the Storefront (w3up) MUST validate pieces. This may change in the future, but current product requirements, together with using Spade and the typical SP flow, make a validation process essential. The main reasons are:

> Perhaps a client has to calculate it themselves and then invoke compute/piececid to get a second opinion and provide both signatures as evidence?

It is not a client problem if their claim is bad; they could just hire a malicious compute provider anyway. It must be the service that decides whom it trusts, if the service can be penalized for broadcasting malicious content. As previously stated, the client will already compute the PieceCid themselves on upload, when they have the bytes. They will not send the bytes with the filecoin/offer, just pointers (CIDs) to the data. The Storefront MAY decide to grab those bytes and validate them itself, hire a third-party validator, or even run a validator of its own.
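The split described above, where the offer carries only pointers and the Storefront picks a validation strategy, could look roughly like this. The `filecoin/offer` field names follow my reading of the w3-filecoin spec, and the strategy names are purely illustrative:

```typescript
// Sketch: a filecoin/offer carries only pointers (content CID and the
// client-computed PieceCID), never the CAR bytes themselves. Field names
// are assumptions based on the w3-filecoin spec.
interface FilecoinOffer {
  can: 'filecoin/offer'
  with: string        // client space DID
  nb: {
    content: string   // CID pointing at the CAR bytes
    piece: string     // PieceCID the client computed locally on upload
  }
}

// The Storefront then chooses how to validate the claimed piece.
// These three options mirror the choices listed in the comment above.
type ValidationStrategy =
  | { kind: 'fetch-and-verify' }           // grab the bytes and recompute
  | { kind: 'delegate'; provider: string } // hire a third-party compute service
  | { kind: 'self-hosted-validator' }      // run a validator in-house

function describe(strategy: ValidationStrategy): string {
  switch (strategy.kind) {
    case 'fetch-and-verify':
      return 'Storefront recomputes the PieceCID itself'
    case 'delegate':
      return `Storefront invokes compute/piececid on ${strategy.provider}`
    case 'self-hosted-validator':
      return 'Storefront runs its own validator'
  }
}
```

Modeling the strategies as a discriminated union makes the point of the proposal explicit: `delegate` is just one interchangeable option among several, selected by the service's own trust policy.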

> Can we calculate that the sum of the costs of repeated pieceCID calculations is less than re-building an aggregate if we find a bad one?

I think the reasons were clear above, but in short:

Gozala commented 5 months ago

Here are a few links to some prior ideas I had on a similar subject:

https://gozala.io/workspace/#/page/w3-machine
https://hackmd.io/@gozala/invocation-router
https://github.com/web3-storage/RFC/pull/3/files

I think this doc is out of date, but IPVM also had some relevant work: https://github.com/ipvm-wg/workflow/blob/v0.1/README.md