Roundabout location claims URL matching

Problem statement

The goal of Roundabout is to redirect a given request to a presigned URL from which the requested content can be downloaded. Until now, we have operated a single bucket within the w3up context (carpark-prod-0), and have also supported a custom bucket via query parameters for other use cases (specifically dagcargo-related ones).

As we move to decentralize all the things with content claims, we want to rely on content claims to know where content is, instead of either assuming a specific bucket or leaving it to the application layer to see where content lives.

It is critical to mention that Roundabout needs to know the destination bucket named in the claims in order to create the presigned URL. To create one, Roundabout needs keys that allow it to read from the requested content location and to sign URLs for it. Therefore, Roundabout will only ever be used internally by web3.storage tooling and not directly by users, for use cases like SPs reading data at rest, or retrieval clients like Lassie.
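To make the role of the keys concrete, here is a minimal sketch of query-string presigning with AWS Signature Version 4, which R2's S3-compatible API accepts. It is an illustration under stated assumptions, not Roundabout's implementation: a real deployment would more likely rely on an S3-compatible SDK. The function name `presign_get_url`, the path-style URL layout, and the `auto` region default are assumptions made for this example.

```python
import hashlib
import hmac
from datetime import datetime, timezone
from urllib.parse import quote


def _hmac(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def presign_get_url(host, bucket, key, access_key_id, secret_access_key,
                    region="auto", expires_in=3600, now=None):
    """Return a SigV4 presigned GET URL for a path-style object location.

    `now` must be a timezone-aware datetime; it defaults to the current time.
    """
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    date_stamp = now.strftime("%Y%m%d")
    scope = f"{date_stamp}/{region}/s3/aws4_request"

    # Path-style: the bucket is the first path segment (assumption).
    canonical_uri = "/" + quote(f"{bucket}/{key}", safe="/-_.~")
    params = {
        "X-Amz-Algorithm": "AWS4-HMAC-SHA256",
        "X-Amz-Credential": f"{access_key_id}/{scope}",
        "X-Amz-Date": amz_date,
        "X-Amz-Expires": str(expires_in),
        "X-Amz-SignedHeaders": "host",
    }
    # Query parameters are sorted by name and strictly percent-encoded.
    canonical_query = "&".join(
        f"{quote(k, safe='-_.~')}={quote(v, safe='-_.~')}"
        for k, v in sorted(params.items())
    )
    # Only the host header is signed; the payload is left unsigned.
    canonical_request = "\n".join([
        "GET", canonical_uri, canonical_query,
        f"host:{host}\n", "host", "UNSIGNED-PAYLOAD",
    ])
    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256", amz_date, scope,
        hashlib.sha256(canonical_request.encode("utf-8")).hexdigest(),
    ])
    # Derive the signing key from the secret: date, region, service, terminator.
    signing_key = _hmac(("AWS4" + secret_access_key).encode("utf-8"), date_stamp)
    for part in (region, "s3", "aws4_request"):
        signing_key = _hmac(signing_key, part)
    signature = hmac.new(signing_key, string_to_sign.encode("utf-8"),
                         hashlib.sha256).hexdigest()
    return (f"https://{host}{canonical_uri}?{canonical_query}"
            f"&X-Amz-Signature={signature}")
```

The secret never appears in the resulting URL, only a signature derived from it, which is why the service creating the URL must hold keys for every bucket it is expected to redirect to.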

Location claims usage

Location claims assert that a given CID can be retrieved from a given location URL. Currently, web3.storage operates a few buckets across several providers and locations. For these buckets, Roundabout should be able to provide presigned URLs for R2 destination buckets if the requested content is available there.

Two different aspects need to be taken into account:

Roundabout will need to know the R2 domains used in location claims so that they can be mapped to the names of the buckets where the content lives.
A transition period MAY need to be taken into account. We currently write content into the S3-backed carpark-prod-0 bucket, which is then replicated to an R2-backed carpark-prod-0 bucket. Until we have a setup that writes to R2 directly, the replicator MAY need to write a claim when it replicates content, or Roundabout will need to know

Note that while this document focuses on the context of Roundabout, there are multiple other places where this discussion is currently critical.

Location URIs

Defining how clients will write claims for these target locations is critical: it gives us a mapping from those locations to the buckets we want Roundabout to support and hold keys for.
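As a rough illustration of such a mapping, the sketch below resolves a location claim URL to a bucket name and object key through a lookup table keyed by hostname. The hostname `carpark-prod-0.r2.example`, the table `HOST_TO_BUCKET`, and the function `resolve_bucket` are hypothetical names chosen for this example; the real domains are yet to be defined.

```python
from typing import Dict, Optional, Tuple
from urllib.parse import urlparse

# Hypothetical hostname; stands in for whatever domain ends up in the claims.
HOST_TO_BUCKET: Dict[str, str] = {
    "carpark-prod-0.r2.example": "carpark-prod-0",
}


def resolve_bucket(
    location_url: str,
    host_to_bucket: Dict[str, str] = HOST_TO_BUCKET,
) -> Optional[Tuple[str, str]]:
    """Return (bucket, key) for a supported location URL, else None."""
    parsed = urlparse(location_url)
    bucket = host_to_bucket.get(parsed.hostname or "")
    key = parsed.path.lstrip("/")
    # Unknown hosts are rejected: there are no keys to sign a URL for them.
    if parsed.scheme != "https" or bucket is None or not key:
        return None
    return bucket, key
```

Returning None for unknown hosts keeps the set of buckets Roundabout will sign for explicit, rather than inferring a bucket from whatever URL a claim happens to contain.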

Typically, objects in S3 buckets can be located via:

S3 URI (e.g. s3://<BUCKET_NAME>/<CID>/<CID>.car)
Object URL (e.g. https://<BUCKET_NAME>.s3.<AWS_REGION>.amazonaws.com/<CID>/<CID>.car)
- can be used to fetch the bytes by any HTTP client if the bucket is public
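Both S3 forms carry the bucket name and object key, so they can be reduced to the same structure. The following is a minimal sketch of that parsing; `S3Location` and `parse_s3_location` are names invented for this example, and only the virtual-hosted object URL pattern shown above is recognised.

```python
import re
from typing import NamedTuple, Optional
from urllib.parse import urlparse


class S3Location(NamedTuple):
    bucket: str
    key: str
    region: Optional[str]  # not present in s3:// URIs


# Matches <BUCKET_NAME>.s3.<AWS_REGION>.amazonaws.com
_S3_HOST = re.compile(
    r"^(?P<bucket>.+)\.s3\.(?P<region>[a-z0-9-]+)\.amazonaws\.com$"
)


def parse_s3_location(uri: str) -> Optional[S3Location]:
    """Parse an S3 URI or S3 object URL; return None for anything else."""
    parsed = urlparse(uri)
    key = parsed.path.lstrip("/")
    if not key:
        return None
    if parsed.scheme == "s3" and parsed.netloc:
        return S3Location(parsed.netloc, key, None)
    if parsed.scheme == "https":
        match = _S3_HOST.match(parsed.hostname or "")
        if match:
            return S3Location(match["bucket"], key, match["region"])
    return None
```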

However, R2 object locations do not follow the S3 patterns. They can be:

Public Object URL (e.g. https://pub-<INTERNAL_R2_BUCKET_IDENTIFIER>.r2.dev/<CID>/<CID>.car)
- can be used to fetch the bytes by any HTTP client if the bucket is public, but is heavily rate limited (R2 docs state that such URLs should only be used for dev)
Custom domain object URL (e.g. https://<CUSTOM_DOMAIN>.web3.storage/<CID>/<CID>.car)
- can be used to fetch the bytes by any HTTP client if a custom domain is configured in R2
Presigned URL (e.g. https://<ACCOUNT_ID>.r2.cloudflarestorage.com/<BUCKET_NAME>/<CID>/<CID>.car?...)
- can be used to fetch the bytes by any HTTP client, as long as the URL was signed with valid keys and has not expired
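The three R2 patterns above can be told apart by hostname alone, although only one of them exposes the bucket name. The sketch below shows one way to classify them; `R2UrlKind`, `R2Location`, and `classify_r2_url` are names made up for this example, and treating every other HTTPS host as a custom domain is a simplifying assumption.

```python
from enum import Enum
from typing import NamedTuple, Optional
from urllib.parse import urlparse


class R2UrlKind(Enum):
    PUBLIC_DEV = "public r2.dev URL"
    ACCOUNT_ENDPOINT = "account endpoint URL (the host presigned URLs use)"
    CUSTOM_DOMAIN = "custom domain URL"


class R2Location(NamedTuple):
    kind: R2UrlKind
    bucket: Optional[str]  # only known for account endpoint URLs
    key: str


def classify_r2_url(url: str) -> Optional[R2Location]:
    """Classify an HTTPS URL by the R2 patterns; query strings are ignored."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    path = parsed.path.lstrip("/")
    if parsed.scheme != "https" or not path:
        return None
    if host.startswith("pub-") and host.endswith(".r2.dev"):
        # The host carries an internal identifier, not the bucket name.
        return R2Location(R2UrlKind.PUBLIC_DEV, None, path)
    if host.endswith(".r2.cloudflarestorage.com"):
        # Path-style: the first path segment is the bucket name.
        bucket, _, key = path.partition("/")
        if not key:
            return None
        return R2Location(R2UrlKind.ACCOUNT_ENDPOINT, bucket, key)
    # Assumption: any other host is a custom domain. Without configuration
    # it cannot be distinguished from an unrelated HTTPS server.
    return R2Location(R2UrlKind.CUSTOM_DOMAIN, None, path)
```

Since the bucket name is recoverable only from the account endpoint form, the public and custom domain forms would need some configured mapping from hostname to bucket before a presigned URL could be created for them.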

The main pattern we can identify is that locations are URLs that any HTTP client can access. Except for S3 URIs, all of the URLs above are fetchable, provided the correct setup or keys are in place.

Note that a claim may not be readable by all actors, as some may sit behind a given set of permissions/capabilities.

Client claims

Nothing prevents us from claiming multiple location URIs for a given content, however we may

TBD
