Closed olizilla closed 1 year ago
If we focus the schema change to just include the properties we need to implement the capabilities, and updating names to be consistent with our new terminology then we end up with:
store table
ucan terminology flavour - deliberately stick to the language of the invocation we are capturing
issuer | resource | link | size | origin? | ucanCID | insertedAt |
---|---|---|---|---|---|---|
did:key:space |
did:key:space |
bagy...1 |
101 |
bagy...0 |
baf...ucan |
2022-11-23T... |
string | string | string | number | string | string | string (ISO 8601) |
agent DID | space DID | stored CAR CID | CAR size bytes | prev shard CAR CID | invocation CID | wen db record |
domain mapped flavour - name columns after how we expect them to be commonly used.
agentDID | spaceDID | carCID | size | origin? | ucanCID | insertedAt |
---|---|---|---|---|---|---|
did:key:space |
did:key:space |
bagy...1 |
101 |
bagy...0 |
baf...ucan |
2022-11-23T... |
string | string | string | number | string | string | string (ISO 8601) |
agent DID | space DID | stored CAR CID | CAR size bytes | prev shard CAR CID | invocation CID | wen db record |
ucanCID
is the CID for the complete, validated UCAN invocation that was sent to us and caused this row to be inserted. The current proposal is to store the bytes of that UCAN in s3 keyed by it's (raw
) CID.
Did some looking around on how we might try to aggregate the entries of upload-api 'store' entries into a per-customer sum of total amount stored that we can use to calculate billing. Especially given that upload-api currently uses DynamoDB and that DynamoDB querying doesn't inherently support aggregation queries like sum , it's not as easy as if we used a different datastore. – https://filecoinproject.slack.com/archives/C02BZPRS9HP/p1669074957770049
References
wrt the thing I mentioned about billing. From what I grok about the way the store protocol works, I think the aggregate we'll probably need is
select sum(bytes) from uploads group by space
)FWIW here is the 'usage reports' api stripe has, which we'll want to to update if we use stripe as the usage-based billing provider.
The proposed dynamodb schema:
/** @type import('@serverless-stack/resources').TableProps */
export const storeTableProps = {
fields: {
space: 'string', // `did:key:space`
car: 'string', // `bagy...1`
size: 'number', // `101`
origin: 'string', // `bagy...0` (prev CAR CID. optional)
agent: 'string', // `did:key:agent` (issuer of ucan)
ucan: 'string', // `baf...ucan` (CID of invcation UCAN)
insertedAt: 'string', // `2022-12-24T...`
},
// space + car must be unique to satisfy index constraint
primaryIndex: { partitionKey: 'space', sortKey: 'car' },
}
/** @type import('@serverless-stack/resources').TableProps */
export const uploadTableProps = {
fields: {
space: 'string', // `did:key:space`
sk: 'string', // `root#shard` + space must be unique for dynamo index constraint
root: 'string', // `baf...x`
shard: 'string', // `bagy...1
agent: 'string', // `did:key:agent` (issuer of ucan)
ucan: 'string', // `baf...ucan` (CID of invcation UCAN)
insertedAt: 'string', // `2022-12-24T...`
},
// space + sk must be unique to satisfy index constraint
primaryIndex: { partitionKey: 'space', sortKey: 'sk' },
}
uploaderDID
becomes space
upload-client
, our capability properties and the access-api
data modelagent
(who did it), ucan
(CID to link to bytes of the actual invocation), and insertedAt
come last.insertedAt
as all we are really capturing here is the time at which we inserted the doc into the db.Few notes:
agentDID
is better than issuer
because there’s no way to enforce that invariant and users can issue invocations with spaceDID
instead.
ucanCID
/ ucan
shod be called invocation instead, because that would describe semantics, while UCAN describes formatting.car
as field in capabilities is because it again describes format and not semantics. We may also add support for different formats in the future so calling things by format isn’t ideal IMO. I would use same name as in capability, if that name is bad we probably should rename that in capability as well.I think we need at least one more field for provider DID...
Yep! I'm gonna leave that out from this iteration, and add it in once we tackle providers
. We can add later without needing a migration.
I’m not sure agentDID is better than issuer because there’s no way to enforce that invariant
I agree. I'm happy to call it issuer
I’m also not sure why do we even want to store this (the issuer) outside of UCAN itself
I was assuming some need to show "which identity actually invoked the thing" but we could drop it for now.
I think ucanCID / ucan should be called invocation instead
I can deal with calling it invocation
. ucan
is so short and neat tho.
Reason I chose not to use car as field in capabilities is because it again describes format and not semantics. We may also add support for different formats in the future so calling things by format isn’t ideal IMO. I would use same name as in capability, if that name is bad we probably should rename that in capability as well.
I think folks are struggling with link
as the term for a CIDlike thing. It's very very generic, where CID was clear to folks what they were dealing with. A case could be made for renaming it, but I don't know if there is appetite for more renames.
However, I'm generally in favour of having the db labels match the labels in the invocation wherever sensible. I'll rename it link
here and we can see if folks want to discuss if link
should change in the capability, and I'll adopt the outcome of that, if/when it happens.
Do we store invocations that were proved invalid ? Or denied because storage limits?
not currently but i think we should track that somewhere. Do you have a use-case in mind? My assumption we'd want this for at least tracking metrics, and debugging. Would an aggregated count here be sufficient or would we want dump them in a bucket, or track them in a full on db table?
We should also store response CIDs somewhere.
Can do! Is the use-case here analytics and debugging? or something more?
We may get valid requests to store, but user may never actually upload to S3 don’t know how and where to track this, but we should somewhere. Also pre-signed URLs may expire and probably we should note when / if it does
ACK. Yes, tracking "when does the ObjectCreated event occur in s3" is on my radar. Tracking the expire time for the URL is a good idea!
That gives us an updated dynamodb schema like so:
/** @type TableProps */
export const storeTableProps = {
fields: {
space: 'string', // `did:key:space`
link: 'string', // `bagy...1`
size: 'number', // `101`
origin: 'string', // `bagy...0` (prev CAR CID. optional)
issuer: 'string', // `did:key:agent` (issuer of ucan)
invocation: 'string', // `baf...ucan` (CID of invcation UCAN)
insertedAt: 'string', // `2022-12-24T...`
},
// space + car must be unique to satisfy index constraint
primaryIndex: { partitionKey: 'space', sortKey: 'link' },
}
/** @type TableProps */
export const uploadTableProps = {
fields: {
space: 'string', // `did:key:space`
sk: 'string', // `root#shard` + space must be unique for dynamo index constraint
root: 'string', // `baf...x`
shard: 'string', // `bagy...1
issuer: 'string', // `did:key:agent` (issuer of ucan)
invocation: 'string', // `baf...ucan` (CID of invcation UCAN)
insertedAt: 'string', // `2022-12-24T...`
},
// space + sk must be unique to satisfy index constraint
primaryIndex: { partitionKey: 'space', sortKey: 'sk' },
}
I think folks are struggling with link as the term for a CIDlike thing. It's very very generic, where CID was clear to folks what they were dealing with. A case could be made for renaming it, but I don't know if there is appetite for more renames.
However, I'm generally in favour of having the db labels match the labels in the invocation wherever sensible. I'll rename it link here and we can see if folks want to discuss if link should change in the capability, and I'll adopt the outcome of that, if/when it happens.
I agree link
is not a great name & in fact fails to describe semantics. Maybe content
is better ?
we also have an issue on file suggesting that small cars should be inlined instead, in which case link
(and cid
) would be even less meaningful.
Yep! I'm gonna leave that out from this iteration, and add it in once we tackle providers. We can add later without needing a migration.
Sounds good! Just to be clear though, it will affect primary index & we bill who hired a provider.
I can deal with calling it invocation. ucan is so short and neat tho.
I think alternatively it could be called source
which is bit shorter & still meaningful.
In regards to other info I brought up. I don’t have specific use cases, it’s just all these invocations have certain state that we don’t seem to capture. It would be difficult to tell what the state of the individual invocation is, or how did invocation progressed through the system.
we also wanted to include some info in the receipt so introspection seems important
Currently, uploaderDID tracks the resource that an upload is invoked
with
. When we are using spaces, thewith
is the space DID. so UploaderDID is the space DID which feels wrong.see: https://github.com/web3-storage/upload-api/blob/75f0b5327d3dc2cface553bf60f9e9ebc17598c9/api/test/service/store.test.js#L85-L86