vasco-santos opened 4 months ago
I assume the implementation opts to invoke `blob/add` a second time once bytes have been written to the presigned URL. Can you confirm my assumption?
In terms of implementing `/service/blob/accept`, and specifically how the client signals that they've completed the upload, I suggest we do whichever is the easier option of the following two:

1. A `blob/add` invocation after they've done the upload, so we can perform a lookup.
2. A `blob/add/poll`, or something very general like `invocation/wake { task: CID }`.
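For illustration, the two completion signals above might look like the shapes below. All field names here are assumptions for the sketch, not from the spec:

```typescript
// Hypothetical invocation shapes for the two upload-completion signals
// discussed above (field and capability names are illustrative).
type BlobAddInvocation = {
  can: 'blob/add'
  with: string                          // space DID
  nb: { blob: { content: Uint8Array; size: number } }
}

type WakeInvocation = {
  can: 'invocation/wake'
  with: string                          // service DID
  nb: { task: string }                  // CID of the task to wake
}

// Either signal lets the service locate the pending accept task: a wake
// carries the task CID directly, while a repeated blob/add requires a
// lookup keyed by space + blob.
function isWake (inv: BlobAddInvocation | WakeInvocation): inv is WakeInvocation {
  return inv.can === 'invocation/wake'
}
```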
In the meantime I suggest we go with `blob/*` for new capabilities until the invocation spec lands, given the current implementation limitation with task/delegations. I'm just not sure if we also want a temporary prefix for the service ones?
The spec is written in UCAN 1.0 format and assumes https://github.com/web3-storage/RFC/pull/12, however in terms of the immediate implementation I suggest that we instead use:

- `blob/add` instead of `/space/content/add/blob`
- `web3.storage/blob/*` in place of `/service/blob/*`
I suggest the above because I think it would make more sense to transition to https://github.com/web3-storage/RFC/pull/12 once we have UCAN@1.0 implemented, because I suspect links to tasks vs invocations are going to be a pain to deal with otherwise. This will give us a cleaner break.
I think we have a similar pattern as w3-filecoin. Here is where my mind is:

a. `blob/add` queues the blob for allocation if the content is not already allocated in the given space (the resource of the invocation).
b. A `blobAllocateQueue` consumer reads blob allocation requests and self-invokes (as w3s) `blob/allocate`, which, if the space has enough bytes available, inserts a record with content+space into an `allocationStore`.
I wrote some comments on how I imagine this could work, but I think we need to try to determine if it is in fact viable.
3. We store this input in our store/table (the counterpart of `allocationStore`). Looking at `blob/allocate`, we are missing `issuer` and `size` in the args.
We have the task CID though, which would allow us to find out those two; however, I would prefer to make them part of the params instead. Do you think this is reasonable? I honestly do not know why we store the `issuer` and the `size`. Size may be nice to have, so we can query all items in a space and sum their sizes without a HEAD request, and we actually return it in the proposed receipt. So I think I would definitely add it.
We aren't missing `size`, because `Blob` is `{ size, content }`, see:

https://github.com/web3-storage/specs/blob/9cc1b039431ed2853b30e5874c5fe3a1acf7876e/w3-blob.md?plain=1#L263-L267
https://github.com/web3-storage/specs/blob/9cc1b039431ed2853b30e5874c5fe3a1acf7876e/w3-blob.md?plain=1#L188-L191
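In TypeScript terms, the spec excerpt linked above amounts to roughly the following; the `space` field name is my assumption for the sketch:

```typescript
// Per the spec excerpt linked above, a Blob is { content, size }, so
// allocate args that embed a Blob already carry the size.
interface Blob {
  content: Uint8Array   // multihash bytes identifying the content
  size: number          // total size of the blob in bytes
}

interface BlobAllocateArgs {
  space: string         // space DID (field name assumed for illustration)
  blob: Blob
}
```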
I wonder why we care about `issuer` though; I'll try to investigate. Without further context I would argue we should not care who the issuer is, as long as the space has authorized them to do the invocation, but maybe I'm missing some context here?
I would also suggest that we could simply require that the linked task `cause` gets embedded in the invocation; that way we'll have the issuer info (at least before UCAN@1.0). But even so, we should probably figure out why we have it in the first place.
I did a little bit of archeological digging to find out where the `issuer` is coming from, and it looks like it came from this PR: https://github.com/web3-storage/w3infra/pull/48
Which in turn seems to have come from hugo's table: https://github.com/web3-storage/w3up/blob/cc0f4f44c06a5884c6af50cf739dad6164d8e468/packages/access-api/migrations/0000_create_spaces_table.sql
So I have a feeling that nothing really depends on this and we could probably drop the field, but we would need to change the types and check whether anything depends on that field.
car/multihash checks on the migration path:

a. We currently have `carpark-prod-0` in R2/S3. When we perform a `store/add` we check whether the bytes are already there, so that we only ask users to write what is missing.
b. We now move to writing things keyed by multihash, and I am wondering if you have any thoughts on deduping here? Could we simply test the multihash with the CAR codec and give a location claim?
c. Actually, the same also happens in our `storeTable` today. We key it by the link…
I'm not sure I fully understand the question here, but I'll try my best to address what I think you're getting at:
a. We can derive the CAR CID from the blob multihash, simply by calling `Link.create` with `CAR.code` and the multihash. So we could perform the same check.
b. Not sure I fully understand, but yes, we could check if we have a CAR for that multihash and, if we do, issue a location claim that strips off the CID header, basically taking `cid.multihash.bytes`.
c. I'm not entirely sure what the assumptions in the store table are, but we should use the `RAW` codec instead of CAR if we go that route.

Thanks for the follow-ups @Gozala, and for looking into the issuer thing. I think we are quite aligned on 1, 3 and 4 ✅ I will follow up on points for the remaining discussion items.
On point 5 @Gozala, your assumptions were pretty much right for the first part. So yes, my proposal was to do exactly that, "simply by calling Link.create with CAR.code and multihash", but the general question/discussion topic is IF and HOW we want to deal with the old world.
More specifically, for both stores we have at the moment, `carpark` (bucket for CAR bytes keyed as `link/link.car`) and `store-table`: if we do not check these by injecting the CAR codec to compute such a link, we will receive a lot of repeated content. But needing to deal with two versions of the world is also hard, and it makes everything more expensive, since we would need to verify for each item whether these tables/buckets hold it under new or old keys... So we need to define what migration/rollout strategy we would like here. It is probably worth involving Hannah, so I will write a more specific issue for this.
@Gozala regarding point 2, as well as your thoughts provided on https://github.com/web3-storage/w3up/pull/1340#discussion_r1536145607:
See the blob protocol https://github.com/web3-storage/specs/pull/115: `blob/*` instead of `space/content/*/blob`, and `web3.storage/blob/*` instead of `service/blob/*`, for new capabilities until the invocation spec, given we are still not at UCAN 1.0 and links to tasks vs invocations are going to be a pain to deal with otherwise ✅