Outstanding incompatibilities between UCAN x CACAO data models

ucan-wg / ucan-cacao

Outstanding incompatibilities between UCAN x CACAO data models #2

Open Gozala opened 1 year ago

Gozala commented 1 year ago

I had a good chat with @oed last week about the desire to reconcile data model differences so that we could have a single IPLD schema as opposed to a bridge between two. @oed looked at the UCAN-IPLD spec and shared some impressions that I'd like to surface here so we can collectively work out solutions:

So there are a few properties missing to make this work with a SIWE message:

version String

nonce String

iat String // RFC3339 date-time =issued-at

nbf optional String // RFC3339 date-time =not-before

exp optional String // RFC3339 date-time = expiration-time

statement optional String // =statement

requestId optional String // =request-id

resources optional [ String ] // =resources as URIs

Not sure how one could represent these within this schema. Maybe the fct property could be used?

exp Int is also not compatible with SIWE because it is undefined (or rather defined as an ISO string).

So the idea with 2.1 Principal is that you would register a new multicodec for every DID method? Imo it would make sense for DID Key to also have a multicodec for symmetry?

Really like the 2.5 Signature section, and it makes sense for how to create and verify the signature once you have the bytestring to sign over. I don't think we can represent the way to go from the entire payload to the thing to sign over, however. E.g. 0xd0e7 is a secp256k1 signature over a JWT specifically, not any other secp256k1 sigs. Another thing worth noting is that 0xd0e7 specifies only the signature algorithm used, while 0xd191 specifies the signature algo, hashing algo, and message prefix.

I see that we have v String. I wonder if this can be used to determine the schema used by the block and how to convert it to the bytestring we sign over? So ucv1.0.0 uses this schema and siwe uses another schema?

I feel like 2.1 and 2.5 likely make sense as their own specs eventually.

Thought on how we could create a sort of shared approach to IPLD schema: https://gist.github.com/oed/cbb76aa60f919bddfafe5ff879f3dfe8 Main idea would be to use representation inline and discriminantKey "v".

Gozala commented 1 year ago

So there are a few properties missing to make this work with a SIWE message:

version String

I would expect that v field in the data model should be it, but maybe I'm missing something https://github.com/ucan-wg/ucan-ipld/blob/18d508456e1b6e2e385beb107a3c62124a720df7/README.md?plain=1#L31

nonce String

There is a nnc field which is an optional nonce https://github.com/ucan-wg/ucan-ipld/blob/18d508456e1b6e2e385beb107a3c62124a720df7/README.md?plain=1#L44

iat String // RFC3339 date-time =issued-at

This does not exist in UCANs, but perhaps we could create an optional field for it if it's important in CACAO use cases. That said, it would be good to get some context on how it is used and what its purpose is.

nbf optional String // RFC3339 date-time =not-before

There is an nbf field in UCANs. It's an Int, but one could simply translate that into a date-time and back, no? https://github.com/ucan-wg/ucan-ipld/blob/18d508456e1b6e2e385beb107a3c62124a720df7/README.md?plain=1#L45

exp optional String // RFC3339 date-time = expiration-time

Same as nbf here, with the main difference that it could be null to explicitly make a delegation non-expiring. (I am also realizing that the schema does not seem to reflect the null case.) https://github.com/ucan-wg/ucan-ipld/blob/18d508456e1b6e2e385beb107a3c62124a720df7/README.md?plain=1#L41

statement optional String // =statement

requestId optional String // =request-id

I have no idea what those are, could you please provide some context for us ?

resources optional [ String ] // =resources as URIs

I believe that maps to capabilities in UCANs that live under att field https://github.com/ucan-wg/ucan-ipld/blob/18d508456e1b6e2e385beb107a3c62124a720df7/README.md?plain=1#L37

However, unlike CACAO, in UCANs it's a resource plus the action you can perform on it. More specifically, the resource goes in the with field while the action goes in the can field. UCANs also allow an open-ended nb field to capture domain-specific constraints when needed.

I'm not sure what makes sense for CACAO here. I think {with: uri, can: *} would be roughly equivalent to what resources are in CACAO. If this is not acceptable, perhaps we could discuss what would be a reasonable compromise that works for both CACAO and UCANs.
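A minimal sketch of that mapping, assuming every CACAO resource URI is taken to grant all actions. The function name and the example URIs are hypothetical, and the "*" wildcard is just the ability suggested here, not something either spec mandates.

```python
def resources_to_capabilities(resources):
    """Lift CACAO/SIWE resource URIs into UCAN-style capabilities.

    Assumption: a bare resource implies every action, written as can="*".
    """
    return [{"with": uri, "can": "*"} for uri in resources]


# Hypothetical resource list, as it might appear in a SIWE message.
caps = resources_to_capabilities(
    ["ipfs://bafy-example", "https://example.com/my-data"]
)
```

Existing UCAN tooling would then see ordinary with/can pairs and would not need a special case for CACAO-derived tokens.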

exp Int is also not compatible with SIWE because it is undefined (or rather defined as an ISO string).

I imagine one could map time to Int and back to make this work, but if there is a reason this won't work please tell us why ?

So the idea with 2.1 Principal is that you would register a new multicodec for every DID method? Imo it would make sense for DID Key to also have a multicodec for symmetry?

No, the idea is that every DID can be represented via the last DID variant of the union, and even then it would be more compact than a DID string: https://github.com/ucan-wg/ucan-ipld/blob/18d508456e1b6e2e385beb107a3c62124a720df7/README.md?plain=1#L71-L74

Commonly used DIDs could be further optimized by allocating a varint for them, like we do with a bunch of did:key variants.

This way an implementation can start simply with the DID variant and propose a new, more compact variant once it is used widely enough to make sense. Given that the UCAN spec is basically did:key, we've defined those as specific variants to reduce the number of bytes used to represent them.

Really like the 2.5 Signature section, and it makes sense for how to create and verify the signature once you have the bytestring to sign over. I don't think we can represent the way to go from the entire payload to the thing to sign over, however. E.g. 0xd0e7 is a secp256k1 signature over a JWT specifically, not any other secp256k1 sigs. Another thing worth noting is that 0xd0e7 specifies only the signature algorithm used, while 0xd191 specifies the signature algo, hashing algo, and message prefix.

That is a good point. In fact, I'm not even sure whether Fission's wallet auth demo signed the JWT payload or some other encoding of the payload.

I think we could either:

Add another varint into Signature to specify what the payload is.
Simply expand the union type to cover non JWT payload cases.
Have some other signaling mechanism

@expede do you have thoughts or opinions on this ?

I personally think that we're making a much bigger deal of what we're signing than it ought to be. If there is a canonical way by which we can derive the payload from the data model, we should not have to care about this too much. That said, ideally we won't have too many variants, and both the JWT and IPLD representations should be able to support both.

I see that we have v String. I wonder if this can be used to determine the schema used by the block and how to convert it to the bytestring we sign over? So ucv1.0.0 uses this schema and siwe uses another schema?

I would really love it if we could simply have a single schema so we could all benefit from the tools we build. Otherwise I fear we'll end up with second-class support across the UCAN x CACAO worlds.

I feel like 2.1 and 2.5 likely make sense as their own specs eventually.

I definitely want to break out 2.5 into its own varsig spec (because multisig means a different thing), but it has not been the highest priority for me. I'm sure it will happen sooner or later.

As for 2.1, it did come up as well. I'd +1 that effort; that said, I really don't have the energy to drive a DID-in-multiformats effort, as I fear it's a lot of work and a constantly moving target. Perhaps in a year or two it will be far more clear-cut. Not to discourage anyone from going after this sooner, however.

Gozala commented 1 year ago

Thought on how we could create a sort of shared approach to IPLD schema: https://gist.github.com/oed/cbb76aa60f919bddfafe5ff879f3dfe8 Main idea would be to use representation inline and discriminantKey "v".

To keep things simple, I'm going to copy and paste the schema from that gist here:


 type CACAO union {
  | UCAN "ucv1.0.0"
  | SIWx "caip122"
  | EthNFT "ethnft"
} representation inline {
  discriminantKey "v"
}

I feel like this does not really address the differences. Specifically, it is not clear how one would, e.g., validate a UCAN with a SIWx proof.

type UCAN struct {
  iss Principal
  aud Principal
  s Signature

  att [Capability]
  -- All proofs are links, however you could still inline proof
  -- by using CID with identity hashing algorithm
  prf [&UCAN]
  exp Int

  fct [Fact]
  nnc optional String
  nbf optional Int
} representation map {
  field fct default []
  field prf default []
}

type SIWx struct {
  iss Principal
  aud Principal
  s Signature

  version String
  nonce String
  iat String // RFC3339 date-time =issued-at
  nbf optional String // RFC3339 date-time =not-before
  exp optional String // RFC3339 date-time = expiration-time
  statement optional String // =statement
  requestId optional String // =request-id
  resources optional [ String ] // =resources as URIs
}

type EthNFT struct {
  iss Principal
  aud Principal

  proof MerkleStateProof
  blockNumber Integer
}

I'm totally lost as to what EthNFT implies here. What is the resource in CACAO terms, or the capability in UCAN terms? For what it's worth, ucan-mailto uses the fct field to embed a DKIM proof. It seems to me that similarly EthNFT could be embedded in the fct as opposed to having an entirely different structure.

Perhaps there is a good rationale for doing it this way, but it would help to share it so we could compare pros and cons. On a related note, I have also considered using the DKIM model as a proof, but decided against it in favor of the fct field because that keeps the core spec simple and allows us to use a bunch of other things that UCANs provide without having to map them into the data model in some way.

Gozala commented 1 year ago

Overall, I am under the impression that if we could work out the following differences, we could simply converge on one schema:

[ ] Timestamps in the UCAN spec are encoded as Ints, while they are encoded as Strings in CACAO. I would argue in favor of Ints because they are more compact. I can't see why CACAO could not use Int to encode date-time. @oed, if you disagree, please elaborate so we can get on the same page on this.
[ ] iat does not exist in UCANs. I'm not sure why it is necessary in CACAO, but perhaps we could meet in the middle and make it optional in both?
- UCANs don't have iat because there is no way to verify it, and the validity time bounds are determined by nbf and exp.
- A CACAO with a UCAN proof would somehow need to address the lack of iat; perhaps that is how it could treat an optional iat more generally?
- It is also worth calling out that iat could simply be added to fct in UCANs, but then again it's not clear to me how this information is used.
[ ] Introducing non-JWT payload encoding. I think this is the biggest elephant in the room.
- While I wish I could simply sign CBOR, I do recognize the value of JWT interop, and formatting CBOR into JWT for signing seemed like a reasonable compromise.
- @oed raised some issues with the fact that wallets (at least currently) provide a bad experience if the payload is a JWT, which is why I believe CACAO finds it unacceptable.
- I would personally be in favor of expanding our Signature enum with differently formatted payload signatures. This way it would be far less work to support UCAN x CACAO interop, as opposed to a whole other representation. I would also be willing to implement support for this in @ipld/dag-ucan.
- Alternatively one might attempt to work with wallets to add support for JWTs, but I have no idea how viable this option is.
- Another alternative might be to wrap varsig, so the outmost varint would tell you how to format payload before signing. However if we have only two formatting options I'm not sure nesting is worth it and probably just adding few more variants to union would be easier. On the flip side adding those to multicodec table is probably going to be a harder sell.
[ ] CACAO only lists resources and not the actions you can perform with them.
- I would personally suggest simply encoding them as { with: uri, can: '*' } and calling it a day. That way all of the UCAN tools would know how to validate them, etc.
- An alternative could be to make can optional in UCANs, but I'm not in favor of that because bugs could lead to escalated capabilities as opposed to invalid UCANs.
- Perhaps there is some clever way to reconcile the two, e.g. allowing string URIs in att, which could in turn imply { with: uri, can: '*' }. I'm not a big fan of heterogeneous structures, but I'm personally amenable if this is what it takes.
- Alternatively, we could borrow some ideas from ReCap, which if I recall correctly uses a { [with]: { [can]: nb } } structure. I'm also amenable to this, but I expect pushback from the community as it would be a huge breaking change.
[ ] The statement field is unclear to me. @oed, mind providing details on this? Without knowing anything but its name, I'm guessing it could be the equivalent of fct in UCANs.
[ ] requestId is also unclear to me. Perhaps it could be shoved under fct?
- It is also optional, so maybe we could add it to the IPLD data model and the UCAN toolchain could simply ignore it?
- I also have to say it bothers me to have requestId in the IPLD world. I would argue that is what the UCAN CID should be, but maybe I'm misunderstanding what it is.
[ ] EthNFT is unclear to me. Here as well I want to say: put it under fct, just like I'm trying to put the DKIM model in ucan-mailto.
- @oed, it would help to have some more context here so we can weigh the tradeoffs.

oed commented 1 year ago

OK, this turned into a rather long response. I think it addresses most of the comments above.

I would expect that v field in the data model should be it, but maybe I'm missing something

Not sure if that would work: version in SIWE represents which version of the ABNF is used to validate the string. I guess in theory we could have something like v: caip122 always resolve to SIWE version 1.

There is a nnc field which is an optional nonce

Ok, would work :+1:

There is an nbf field in UCANs. It's an Int, but one could simply translate that into a date-time and back, no?

Unfortunately converting back and forth between unix timestamps and ISO dates is not trivial. We initially wanted to do this, but when @ukstv investigated it turned out to be next to impossible: timezones, leap seconds, etc. make it very difficult. (Note that the conversion needs to be deterministic and precise for it to work.)

I have no idea what those are, could you please provide some context for us ?

These are strings, please see the SIWE spec

However, unlike CACAO, in UCANs it's a resource plus the action you can perform on it. More specifically, the resource goes in the with field while the action goes in the can field. UCANs also allow an open-ended nb field to capture domain-specific constraints when needed.

I'm not sure what makes sense for CACAO here. I think {with: uri, can: *} would be roughly equivalent to what resources are in CACAO. If this is not acceptable, perhaps we could discuss what would be a reasonable compromise that works for both CACAO and UCANs.

Imo, {with: uri, can: *} feels quite verbose here. Would prefer to keep it smaller. Also, it would be cool if we could consolidate ReCap with UCAN. ReCap is meant as a way to represent more expressive resource+action within a SIWE message.

I personally think that we're making a much bigger deal of what we're signing than it ought to be. If there is a canonical way by which we can derive the payload from the data model, we should not have to care about this too much. That said, ideally we won't have too many variants, and both the JWT and IPLD representations should be able to support both.

Imo it would be great if we could deduce the canonicalization strategy from v.

As for 2.1, it did come up as well. I'd +1 that effort; that said, I really don't have the energy to drive a DID-in-multiformats effort, as I fear it's a lot of work and a constantly moving target. Perhaps in a year or two it will be far more clear-cut. Not to discourage anyone from going after this sooner, however.

Opened #3 for this. Fyi, the DID Core spec is final now so should not be a moving target.

I feel like this does not really address the differences. Specifically, it is not clear how one would, e.g., validate a UCAN with a SIWx proof.

Well you need to verify it using the SIWE ABNF + eip191

It seems to me that similarly EthNFT could be embedded in the fct as opposed to having an entirely different structure.

This is a good point. I guess the main problem is that SIWE doesn't have the concept of a fct, however this is something that can easily be addressed using ReCap.

Alternatively one might attempt to work with wallets to add support for JWTs, but I have no idea how viable this option is.

This is not very viable at all. This is the reason SIWE exists in the first place. Add to that the complexity of doing such a thing over many different blockchain ecosystems. Too many actors to coordinate to be even close to feasible

Another alternative might be to wrap varsig, so the outmost varint would tell you how to format payload before signing. However if we have only two formatting options I'm not sure nesting is worth it and probably just adding few more variants to union would be easier. On the flip side adding those to multicodec table is probably going to be a harder sell.

Again, I think we can figure out canonicalization based on v.
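To make the idea concrete, here is a rough sketch of picking the bytes-to-sign encoding from v. Everything in it is an assumption for illustration: the two canonicalizers are placeholders (sorted JSON standing in for JWT payload encoding, and a key/value text rendering standing in for the SIWE ABNF message), and the function names are hypothetical. Only the "ucv1.0.0" and "caip122" discriminants come from the schema above.

```python
import json


def canonicalize_jwt_like(payload):
    # Placeholder: deterministic JSON, standing in for real JWT encoding.
    return json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")


def canonicalize_siwe_like(payload):
    # Placeholder: a real implementation would render the EIP-4361 message.
    lines = [f"{key}: {payload[key]}" for key in sorted(payload)]
    return "\n".join(lines).encode("utf-8")


# Maps the "v" discriminant to a canonicalization strategy.
CANONICALIZERS = {
    "ucv1.0.0": canonicalize_jwt_like,
    "caip122": canonicalize_siwe_like,
}


def bytes_to_sign(block):
    """Derive the bytestring to sign over, dispatching on the "v" field."""
    version = block["v"]
    if version not in CANONICALIZERS:
        raise ValueError(f"unknown version discriminant: {version!r}")
    # The signature itself is never part of what gets signed.
    payload = {key: value for key, value in block.items() if key != "s"}
    return CANONICALIZERS[version](payload)
```

The trade-off raised in this thread still applies: a verifier needs this table as out-of-band context, whereas encoding the strategy in s keeps it on the data itself.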

Gozala commented 1 year ago

5. The statement field is unclear to me. @oed, mind providing details on this? Without knowing anything but its name, I'm guessing it could be the equivalent of fct in UCANs.

From the link @oed posted https://docs.login.xyz/general-information/siwe-overview/eip-4361

statement (optional) is a human-readable ASCII assertion that the user will sign, and it must not contain '\n' (the byte 0x0a).

I have no strong feeling about this. I think it could go either inside facts or maybe we could add optional field in UCANs that would serve the same purpose. @expede do you have opinions on this ?

request-id (optional) is an system-specific identifier that may be used to uniquely refer to the sign-in request.

I would strongly suggest sticking it into fct when this field is needed. @oed does that sound reasonable ?

Imo, {with: uri, can: *} feels quite verbose here. Would prefer to keep it smaller.

Can you propose something that would work? No matter what we do, CACAO <-> UCAN would have to deal with the fact that UCANs have a can field and oftentimes nb with some more details (which as far as I can tell does not exist in ReCap).

Also, it would be cool if we could consolidate ReCap with UCAN. ReCap is meant as a way to represent more expressive resource+action within a SIWE message.

Would be cool, but there are only so many balls one can juggle. I would prefer to focus on CACAO x UCAN interop; increasing the scope of this will only make it harder to achieve.

Imo it would be great if we could deduce the canonicalization strategy from v.

I'd rather do it with s itself, but I'm amenable to this. The problem with putting it elsewhere has been that we then need to carry around extra context not present on the data itself.

Opened https://github.com/ucan-wg/ucan-cacao/issues/3 for this. Fyi, the DID Core spec is final now so should not be a moving target.

👍

I was referring to all the DID methods out there, because it would be hard to define enum for all of them while it's constantly changing.

I feel like this does not really address the differences. Specifically, it is not clear how one would, e.g., validate a UCAN with a SIWx proof.

Well you need to verify it using the SIWE ABNF + eip191

I think we are talking past each other here. What I mean is the following:

1. Can there be a CACAO token that claims X capabilities (or resources in CACAO terms) from a UCAN token?
2. How would a CACAO token claim a resource from { with: 'file:///some/resource', can: 'file/read' }?
- Because the UCAN there does not delegate all capabilities, just file/read.
- In other words, you need to somehow map capabilities to URIs for CACAO interop, and it should be a deterministic bidirectional mapping.

This is a good point. I guess the main problem is that SIWE doesn't have the concept of a fct, however this is something that can easily be addressed using ReCap.

I'm actually not sure how SIWx links to EthNFT, as I don't seem to see any field for it in the schema or in https://docs.login.xyz/general-information/siwe-overview/eip-4361

I am under the impression that you just link to the EthNFT URI in resources, is that accurate?

This is somewhat different from how I, at least, would express things in UCAN tokens. I would still have claimed a specific capability in the form of a {with, can} tuple and provided evidence using an IPLD link, either:

In the capability itself: { with, can, nb: { evidence: ethNFTLink } }
In the fcts.

In other words, I claim the ability to perform an action on some resource, and provide evidence that I'm allowed to do it. I suspect this will emerge as a more general pattern, with various types of evidence supporting a claim.

Let me know if fct seems acceptable here and if so, I'll take it off the list of things to resolve.

Gozala commented 1 year ago

@oed I am under the impression that you are trying to make SIWE a data model. What I'm suggesting, however, is to think of both JWT and SIWE as formats that our IR could be serialized into and deserialized from. I feel that would be a more effective path forward, with much less complexity than trying to fit variable data models into one. That way:

Library may not even deal with SIWE / JWT serialization at all.
Entirely independent libraries could be developed just to do JWT encode / decode or SIWE encode /decode.

oed commented 1 year ago

@oed I am under the impression that you are trying to make SIWE a data model. What I'm suggesting, however, is to think of both JWT and SIWE as formats that our IR could be serialized into and deserialized from. I feel that would be a more effective path forward, with much less complexity than trying to fit variable data models into one. That way:

Library may not even deal with SIWE / JWT serialization at all.

Entirely independent libraries could be developed just to do JWT encode / decode or SIWE encode /decode.

Note that right now we are sort of treating UCAN as the data model in ucan-ipld. I think your statement above makes sense, but I see no reason why the IPLD data model should lean more in the direction of either SIWE or UCAN?

Can you propose something that would work? No matter what we do, CACAO <-> UCAN would have to deal with the fact that UCANs have a can field and oftentimes nb with some more details (which as far as I can tell does not exist in ReCap).

How about something like this?

type UCAN struct {
...
  att {Resource:[Capability]}
...
}

type Capability struct {
  can Ability
  nb optional NonNormativeFields
} representation tuple

This would allow us to more easily map ReCap to this structure as well.
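As a sketch of what that reshaping could look like, the following groups today's flat att list by resource. The function name is hypothetical, and emitting [can] or [can, nb] pairs is one reading of the tuple representation above, not a settled encoding.

```python
def att_list_to_map(att):
    """Group UCAN-style capabilities by resource.

    Each capability becomes a [can, nb] tuple, or just [can] when nb is
    absent, following the tuple representation sketched above.
    """
    grouped = {}
    for cap in att:
        entry = [cap["can"]]
        if "nb" in cap:
            entry.append(cap["nb"])
        grouped.setdefault(cap["with"], []).append(entry)
    return grouped


# Hypothetical capabilities on a single resource.
att = [
    {"with": "file:///some/resource", "can": "file/read"},
    {"with": "file:///some/resource", "can": "file/write", "nb": {"max": 1}},
]
att_map = att_list_to_map(att)
```

Keying by resource is what brings the shape close to ReCap, since each resource then owns its list of abilities.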

I'd rather do it with s itself, but I'm amenable to this. The problem with putting it elsewhere has been that we then need to carry around extra context not present on the data itself.

If we want to do it with s we need to add another varint to varsig. Right now there's nothing in the currently registered signature types that implies either JWS or SIWE canonicalization.

1. Can there be a CACAO token that claims X capabilities (or resources in CACAO terms) from a UCAN token?

2. How would a CACAO token claim a resource from { with: 'file:///some/resource', can: 'file/read' }?

So SIWE doesn't really say anything about what you use the resources for. The only way you could really represent something like 1 or 2 above is to use ReCap.

I also want to make clear that CACAO is not meant as anything other than an IPLD representation of some object capability. SIWE is the first use case for CACAO. Imo, if we align ucan-ipld with CACAO I would use these terms interchangeably.

I'm actually not sure how SIWx links to EthNFT, as I don't seem to see any field for it in the schema or in https://docs.login.xyz/general-information/siwe-overview/eip-4361

The EthNFT use case is just something I came up with on the spot. I think your idea of using a fct makes sense :+1:

It seems like the biggest open question here is still how to deal with the fact that the date formats are incompatible. I don't have a good answer for this.

expede commented 1 year ago

@oed I haven't had a chance to go through the points above in detail yet, but some stray thoughts from thinking about my PR that expand a bit on what @Gozala said in a few places above:

What is the purpose of CACAO? I think I had it wrong in my mind. It's actually a container format to help distinguish between types, not a compatibility layer. You still need to build standards for ReCap <> UCAN <> SIWx <> zcap-ld <> EthNFT <> etc. It may be more valuable for the UCAN project to focus on UCAN <> SIWx for example, and then let CACAO wrap those for CASA purposes. Does that make sense?

I have a pretty packed schedule the next few days, but will try to go through @Gozala's points above in more detail.

oed commented 1 year ago

@expede I think that's mostly right. I see CACAO to be equivalent to ucan-ipld if we get this right!

ReCap <> UCAN is a separate issue, but I would love to see that happen.

EthNFT (or EthDAO etc.) can be covered by fct as per discussion above.

expede commented 1 year ago

@oed okay awesome that we're aligned there / my mental picture has caught up :)

Trying to get all of these formats into a single body reminds me a bit of the XKCD comic:

If there's good reason to pass around UCANs and others as CACAOs, I wonder if it makes more sense to treat the CACAO wrapper as signalling and let the bodies vary. Otherwise we're going to end up with the lowest common denominator across all capability types.

As a quick first pass:

// Pseudocode

interface Common {
  // The absolute *minimal* required fields for interop
  iss: DID
  aud: DID
  exp: Timestamp
  // ...a handful of others...
}

enum Payload {
  siwx: Common & SIWx,
  ucan: Common & UCAN, // ucan-ipld as it exists today
  recap: Common & ReCap,
  zcap: Common & ZCap,
  // ...
}

interface Header {
  t: CacaoKind
}

interface Cacao {
  h: Header
  p: Payload
  s: Signature
}

Which, from re-reading the CACAO spec, it may extend nicely to, since the h captures that there's a different type in the payload.

Something that we learned recently in UCAN is that we can treat different serializations (EIP-191, SIWx, Passkeys, etc) as signature types, which is the only place where this actually matters (so that we can validate the signature). This means we "only" need to define a SIWx-to-UCAN spec, and we'd get CACAO "for free".

ReCap <> UCAN <> SIWx <> zcap-ld <> EthNFT <> etc

In light of this, is there any disadvantage to not wanting to represent UCAN as a CACAO, and only ever going the other direction (CACAO to UCAN)?

Unfortunately converting back and forth between unix timestamps and ISO dates is not trivial.

I wonder if I can simplify the approach taken so far in this repo (@oed, you'll know better than me!). What if CACAO was treated as a target SIWx -> UCAN only? In this world, we have no reason to CACAOify UCANs, and can lift CACAOs into UCAN pretty easily by e.g. dropping the iat field and converting the TZ-aware timestamp into TZ-agnostic Unix timestamps. Or, put another way: if this is always in the direction UCAN -> CACAO(UCAN), we don't have to deal with round-tripping the TZ info.

Another challenge is that UCAN has strictly more information than a CACAO in its capabilities. Restricting to CACAO -> UCAN again makes this clean because we can cast a CACAO's abilities to an implicit can: */*, but the other way (UCAN -> CACAO) will need extra knowledge in the tooling to be able to understand a query param or similar.

Thoughts? 🙏

oed commented 1 year ago

If there's good reason to pass around UCANs and others as CACAOs, I wonder if it makes more sense to treat the CACAO wrapper as signalling and let the bodies vary. Otherwise we're going to end up with the lowest common denominator across all capability types.

This is the original intention behind CACAO. Although, ucan-ipld has a lot of improvements over it, with regards to how DIDs and signatures are represented.

converting the TZ-aware timestamp into TZ-agnostic Unix

Lifting the TZ info to a separate property was something that @ukstv explored and it proved not really to be workable. His notes:

Leap seconds, which lead to discontinuity and same representation of different seconds. Minor, but weird. https://alexwlchan.net/2019/05/falsehoods-programmers-believe-about-unix-time/

Given the conversation we had during the CASA call and the new varsig spec you are working on (roughly <canonicalization-alg-code><signature-alg-code><length><signature>), I would suggest we could even change CACAO into something like this:

type Cacao struct {
  signature Bytes // varsig
  payload UCAN | SIWx | ...
  iss DID
  aud DID
} representation tuple

However, if we can align ReCap + UCAN capabilities as per eth magicians thread, we could also include more params:

type Cacao struct {
  signature Bytes // varsig
  payload UCAN | SIWx | ...
  iss DID
  aud DID
  att nullable Capability
  prf nullable [&Cacao]
} representation tuple
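For reference, the rough varsig layout mentioned above (<canonicalization-alg-code><signature-alg-code><length><signature>) could be packed and unpacked as below. This is only a sketch under assumptions: it uses unsigned LEB128 varints as in multiformats, the function names are hypothetical, and the canonicalization code 0x01 in the usage example is made up. 0xd0e7 is the signature code quoted earlier in this thread.

```python
def encode_varint(n):
    """Encode a non-negative integer as an unsigned LEB128 varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(data, offset=0):
    """Decode a varint at offset; return (value, next_offset)."""
    value = 0
    shift = 0
    while True:
        byte = data[offset]
        offset += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, offset
        shift += 7


def encode_varsig(canon_code, sig_code, signature):
    """Pack canonicalization code, signature code, length, and raw signature."""
    return (
        encode_varint(canon_code)
        + encode_varint(sig_code)
        + encode_varint(len(signature))
        + signature
    )


def decode_varsig(data):
    """Unpack the layout produced by encode_varsig."""
    canon_code, pos = decode_varint(data)
    sig_code, pos = decode_varint(data, pos)
    length, pos = decode_varint(data, pos)
    signature = data[pos:pos + length]
    if len(signature) != length:
        raise ValueError("truncated signature")
    return canon_code, sig_code, signature


# 0x01 is a made-up canonicalization code; the signature bytes are dummies.
blob = encode_varsig(0x01, 0xD0E7, b"\xaa" * 64)
```

With this layout a verifier reads the first varint to decide how to rebuild the signed bytes, then the second to pick the signature algorithm.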

bumblefudge commented 1 year ago

One minor note on the timestamp format thing: the VC-WG at W3C has been debating this back and forth for years, for example, in this thread and many others. It might be worth checking in with the VC-JWT "work item" (the subset of that WG trying to iterate more on the JWT sections of the spec, with a particular emphasis on how to deterministically roundtrip between an essentially LD-flavored data model and JWT, including on this exact point about timestamps). If I cross paths with anyone active there this week, I'll ask for more bibliography or secret tribal knowledge on the timestamp roundtrip issue.

chunningham commented 1 year ago

In SIWx/CACAO we use RFC 3339 timestamps to maximise human readability while still maintaining machine readability. IMO it is enough that both formats (UCAN and SIWx) use timestamps which can be compared (e.g. by conversion to Unix time, assuming any edge cases are papered over by a reasonable allowance for clock skew upon verification), even if they are not of the same format in the token literals.
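A small sketch of that comparison, assuming both sides are reduced to Unix seconds at verification time. The helper names and the 60-second skew allowance are assumptions chosen for illustration, not values from either spec.

```python
from datetime import datetime

# RFC 3339 requires an offset, so both accepted layouts end in %z.
_RFC3339_FORMATS = ("%Y-%m-%dT%H:%M:%S.%f%z", "%Y-%m-%dT%H:%M:%S%z")


def rfc3339_to_unix(text):
    """Parse an RFC 3339 date-time into Unix seconds (as a float)."""
    for fmt in _RFC3339_FORMATS:
        try:
            return datetime.strptime(text, fmt).timestamp()
        except ValueError:
            continue
    raise ValueError(f"not an RFC 3339 date-time: {text!r}")


def is_within_bounds(now, nbf=None, exp=None, skew=60):
    """Check nbf <= now <= exp in Unix seconds, tolerating clock skew."""
    if nbf is not None and now + skew < nbf:
        return False
    if exp is not None and now - skew > exp:
        return False
    return True


# A SIWx-style string expiry and a UCAN-style integer expiry.
siwx_exp = rfc3339_to_unix("2022-10-01T14:00:00+02:00")
ucan_exp = 1664625600
```

Both tokens can then be checked with the same is_within_bounds call, even though their literals differ.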

Most relevant I think is compatibility between the ReCap and UCAN att objects. As long as we can transform between ReCap and UCAN att objects then it should be ok, the caveat being that currently one ReCap object must be decomposed into several UCAN att objects if it has more than one ability and defaults are probably not advisable. @expede's observation about choosing one format as the "canonical" data model seems wise to me, whereby as long as all formats can provide certain properties then they can be used together within the same delegation chain.
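The decomposition described here might look like the sketch below. It assumes the { [with]: { [can]: nb } } shape for ReCap objects that was mentioned earlier in the thread, and the function name is hypothetical.

```python
def recap_to_ucan_att(recap):
    """Split one ReCap-style object into a list of UCAN att objects.

    Assumes the shape {resource: {ability: nb}}; one UCAN capability is
    emitted per (resource, ability) pair, and empty nb values are omitted.
    """
    att = []
    for resource, abilities in recap.items():
        for ability, nb in abilities.items():
            cap = {"with": resource, "can": ability}
            if nb:
                cap["nb"] = nb
            att.append(cap)
    return att


# One resource with two abilities decomposes into two capabilities.
recap = {"example.resource.1": {"crud/read": {}, "crud/update": {"max": 1}}}
ucan_att = recap_to_ucan_att(recap)
```

Note that nothing is defaulted here: an ability that is absent from the ReCap object produces no capability at all.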

The biggest open question for me is where to fit the prf links in a CACAO, I would vote for inserting them into the ReCap capability objects. Particularly for us it's important that the message signed during the SIWE UX contains all the information, which requires prf links to appear literally.

IMO it would be best to define a CACAO type specific to ReCap which renders the att objects in IPLD for canonicity and convenience and working from that, as SIWx by itself doesn't allow for that level of structured information. I am in favour of changing aspects of the ReCap format to enable this kind of behavioural interop, however changing SIWE/eip4361 will be harder as it is gaining adoption and breaking changes at this stage would be tough.

expede commented 1 year ago

I would vote for inserting them into the ReCap capability objects

Something like this I guess?

{
  "example.resource.1": {
    "crud/read": {
      "prf": "QmABCDEF"
    },
    "crud/update": {
      "prf": "Qm12345"
    }
  }
}

oed commented 1 year ago

Wait, why is there a need to include prf in the att?

Imo, in ReCap we can just put prf top level:

{
  "tar": {
    "example.resource.1": {
      "crud/read": {}
    }
  },
  "prf": ["QmABCDEF"]
}

expede commented 1 year ago

@oed

The biggest open question for me is where to fit the prf links in a CACAO, I would vote for inserting them into the ReCap capability objects

That was just me sketching out the suggestion.

This is an idea for layout that has come up in UCAN before. The main upside is that it's really clear which proof is intended for what. The downsides include:

- You still have to loop through a list of proofs, just fewer of them at once
- You repeat the token prf a lot
- It's annoying if you have a revoked proof in one capability, but a proof that would have covered it in another one, but it's not in the right capability scope despite being in the UCAN

We chose to keep prf as a separate field. It comes up from time-to-time, though.
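To make the two layouts concrete, here is a small sketch (in Python, with a hypothetical function name `hoist_proofs`) that converts the per-capability layout sketched earlier in the thread into a top-level `prf` list. It assumes each ability carries at most one `prf` string, as in that sketch, and that any other keys on the ability are caveats to keep.

```python
from typing import Any, Dict, List


def hoist_proofs(nested: Dict[str, Dict[str, Dict[str, Any]]]) -> Dict[str, Any]:
    """Move per-ability "prf" entries into one deduplicated top-level list."""
    tar: Dict[str, Dict[str, Dict[str, Any]]] = {}
    prf: List[str] = []
    for resource, abilities in nested.items():
        tar[resource] = {}
        for ability, caveats in abilities.items():
            rest = dict(caveats)           # copy so the input is left untouched
            proof = rest.pop("prf", None)  # hypothetical per-ability proof link
            if proof is not None and proof not in prf:
                prf.append(proof)          # keep first-seen order, drop repeats
            tar[resource][ability] = rest
    return {"tar": tar, "prf": prf}
```

Going this way is lossless for validation, since a verifier can still try every proof against every capability. What is lost is the hint about which proof was intended for which capability.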

oed commented 1 year ago

Exactly, I was just saying that it makes sense to put prf as a separate field in ReCap as well.

expede commented 1 year ago

Yup, we're on the same page! Just explaining the reasoning for the thread :)

expede commented 1 year ago

Going back to the top

statement optional String // =statement

This could be held in the fct field

requestId optional String // =request-id

I think that this could also go into the facts.

Semantic question: what's the difference between a requestId and nonce?

but if there is a reason this won't work please tell us why ?

@Gozala you'd lose the timezone information, so it's lossy. UCAN also doesn't have centiseconds (e.g. the .52 in 1985-04-12T23:20:50.52Z)

We can map UCAN -> SIWE timestamps trivially (treat them as GMT). We can go strictly the other way as well (SIWE -> UCAN). We can't round trip SIWE -> UCAN -> SIWE.

Some options:

1. Only flow SIWE -> UCAN, never the other way around
   - Also solves for the lack of abilities, since we can infer "*"
2. Put the missing info into the fcts field, and make the translation protocol aware of this
3. Treat SIWE as a signature type, and include the missing info in the prepended bytes
   - <multibase><nbf-tz><nbf-cs><exp-tz><exp-cs><sig-bytes>
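A minimal sketch of why the round trip is lossy, and of how carrying the dropped pieces alongside the integer timestamp (as in the facts option above) would restore it. This is Python with hypothetical helper names, and the `"tz"` and `"frac"` keys are made up for illustration; nothing here is defined by UCAN or SIWE.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple

_RFC3339 = re.compile(
    r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(\.\d+)?(Z|[+-]\d{2}:\d{2})$"
)


def _zone(text: str) -> timezone:
    """Turn "Z" or "+hh:mm" / "-hh:mm" into a fixed-offset tzinfo."""
    if text == "Z":
        return timezone.utc
    sign = 1 if text[0] == "+" else -1
    return timezone(sign * timedelta(hours=int(text[1:3]), minutes=int(text[4:6])))


def siwe_to_ucan_time(text: str) -> Tuple[int, dict]:
    """Return whole Unix seconds plus the parts an integer timestamp cannot hold."""
    m = _RFC3339.match(text)
    if m is None:
        raise ValueError(f"not an RFC 3339 date-time: {text!r}")
    wall_clock, fraction, zone = m.group(1), m.group(2) or "", m.group(3)
    local = datetime.strptime(wall_clock, "%Y-%m-%dT%H:%M:%S").replace(tzinfo=_zone(zone))
    return int(local.timestamp()), {"tz": zone, "frac": fraction}


def ucan_to_siwe_time(seconds: int, extra: Optional[dict] = None) -> str:
    """Rebuild an RFC 3339 string; without extra, fall back to UTC and no fraction."""
    extra = extra or {}
    zone = extra.get("tz", "Z")
    local = datetime.fromtimestamp(seconds, tz=_zone(zone))
    return local.strftime("%Y-%m-%dT%H:%M:%S") + extra.get("frac", "") + zone


original = "1985-04-12T23:20:50.52+02:00"
seconds, extra = siwe_to_ucan_time(original)
lossy = ucan_to_siwe_time(seconds)             # same instant, different literal
restored = ucan_to_siwe_time(seconds, extra)   # byte-for-byte the signed literal
```

The lossy result names the same instant to the second, but it is not the string that was signed, which is what matters for reconstructing the SIWE message.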

oed commented 1 year ago

Some options:

1. How would we verify the signature if we can't go back to SIWE?
2. This is fine
3. Why not just put TZ info in fct as well?

chunningham commented 1 year ago

After closer reading of eip5573 I see that it includes a parentCapability: 'ba...' field in ext which I didn't recognise earlier 🤦. In that case I think a process can be defined to transform a set of ReCap objects into the prf and att fields of a UCAN, including merging. For example, this set of ReCap objects:

[{
    "tar": {
        "resource.1": ["read"],
        "resource.2": ["write"]
    },
    "ext": { "parentCaps": ["Qm1", "Qm2"] }
},
{
    "tar": {
        "resource.2": ["read"],
        "resource.3": ["delete"]
    },
    "ext": { "parentCaps": ["Qm3"] }
}]

becomes:

{
    "prf": ["Qm1", "Qm2", "Qm3"],
    "att": [{
        "with": "resource.1",
        "can": "read"
    }, {
        "with": "resource.2",
        "can": "read"
    }, {
        "with": "resource.2",
        "can": "write"
    }, {
        "with": "resource.3",
        "can": "delete"
    }]
}

This is sort of assuming that there will not be duplicate keys within a single recap object, and there are probably edge cases from merging other fields from recap ext into the ucan nb. Is it enough to be able to generate or extract the semantic content (probably by extracting or translating into a ucan body) when verifying, without requiring actually serialising into that format?
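The merge described above could look roughly like this in Python. The function name `recaps_to_ucan` is hypothetical, the `tar` / `ext.parentCaps` shape is taken from the example in this comment rather than from EIP-5573 itself, and other `ext` fields (which would need to map into `nb`) are ignored.

```python
from typing import Any, Dict, List, Set, Tuple


def recaps_to_ucan(recaps: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Merge ReCap-style objects into UCAN-style prf and att fields."""
    prf: List[str] = []
    pairs: Set[Tuple[str, str]] = set()
    for recap in recaps:
        # Collect proof links in first-seen order, without duplicates.
        for cid in recap.get("ext", {}).get("parentCaps", []):
            if cid not in prf:
                prf.append(cid)
        # One (resource, ability) pair per UCAN capability; the set merges repeats.
        for resource, abilities in recap.get("tar", {}).items():
            for ability in abilities:
                pairs.add((resource, ability))
    # Sort so the output is deterministic regardless of input order.
    att = [{"with": resource, "can": ability} for resource, ability in sorted(pairs)]
    return {"prf": prf, "att": att}
```

Sorting the pairs is a choice made here for determinism. A real mapping would also have to decide how proofs that were scoped to one ReCap object relate to capabilities that came from another.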

oed commented 1 year ago

see that it includes a parentCapability: 'ba...' field in ext which I didn't recognise earlier 🤦

@chunningham this is just an example, not a recommendation afaik. I'd really prefer if we can converge on the same representation in UCAN att and ReCap tar. Would really make compatibility easier.

I'm also advocating for the introduction of prf in ReCap.

expede commented 1 year ago

How would we verify the signature if we can't go back to SIWE?

We should always be able to reconstruct the SIWE from the signature.

Why not just put TZ info in fct as well?

We can. I should flag my bias that I have an intuition that using fct for data that's semantically meaningful to the rest of the credential seems like a bad precedent. I could be wrong.

In the spec, we say that fct should only be used for self-evident data, or info that isn't meaningful to the other fields. Nothing actually stops you from doing this, though. Maybe it's a silly restriction 🤷‍♀️ It is a code smell, and I wonder if we could solve this more cleanly somehow. But at the end of the day: something that works > something that doesn't.

<multibase><nbf-tz><nbf-cs><exp-tz><exp-cs><sig-bytes>

This is the idea that we riffed about on the CASA call: a multiformat for serialization + signature. I agree that it's not general enough here, though, because it's specifically the data that's not in a JWT. It's more general than UCAN/SIWE, but also we probably don't want to shove random data in the signature for converting between specific formats.

An example of a place where we've been playing with this pattern: in our early Passkey explorations, we can get a deterministic FIDO, but it may have a counter. You can't validate the signature without that counter. If we interpret it as part of the signature (just like the magic bytes in EIP-191), this signature is now portable.

oed commented 1 year ago

In the spec, we say that fct should only be used for self-evident data

Why are statement and requestId more self-evident than TZ info?

This is the idea that we riffed about on the CASA call: a multiformat for serialization + signature.

To me your example of encoding extra data as part of the signature is very different! I thought where we landed was something like this: <varint-canonicalization-alg><varint-signature-alg><signature-bytes>

To me adding data to the varsig seems off. The only real data it should encode is the bytes of the signature itself and which algos to use to verify the signature.

Gozala commented 1 year ago

To me adding data to the varsig seems off. The only real data it should encode is the bytes of the signature itself and which algos to use to verify the signature.

An argument could be made that alg includes canonicalization, so instead of two varint tags we could have one, but it is also true that having multiple codes for the same signing alg would be kind of odd.

For what it’s worth we’ve also allocated one umbrella code for all sigs so we could iterate on this under it.

Gozala commented 1 year ago

Ok so here’s one thing to consider: I would like to be able to format all UCANs as JWTs, not just some. And unless I’m overlooking something, that would require folding canonicalization into the signature algorithm, because that’s the only way I think JWT will allow us to do it.

Alternatively we could define varsig + canonicalization as some algorithm for JWTs.

I’m not all that familiar with the JWT side of things and would defer to @expede to guide us to what would be the most adequate approach.

That said, we could still map to <varint-canonicalization-alg><varint-signature-alg><signature-bytes> in the IPLD model, but let's consider how we map that to JWTs, please.

oed commented 1 year ago

An argument could be made that alg includes canonicalization, so instead of two varint tags we could have one, but it is also true that having multiple codes for the same signing alg would be kind of odd.

I'm afraid we could get a silly amount of codes to register with this approach.

expede commented 1 year ago

get a silly amount of codes to register with this approach.

I don't disagree. It is worth noting that a lot of other communities have standardized things like RS256 rather than the equivalent object {alg: "RSA", hash: "SHA256"}

Probably worth saying out loud: part of the reason why we're (potentially) specifying a new form here is because (AFAIK) there's no equivalent like JWK/JWS that includes canonicalization/payload encoding information, since that's usually at a different layer. @oed it's probably worth asking, since you have more experience with these forms than I do: is there such a field, or is it usually left for a different layer?

expede commented 1 year ago

also: https://github.com/ChainAgnostic/varsig/

Repo name subject to change, but this is a neutral place for UCAN, CACAO, ReCap and others to explore the idea :)

oed commented 1 year ago

@oed it's probably worth asking, since you have more experience with these forms than I do: is there such a field, or is it usually left for a different layer?

I can't really say I have the most complete view here, but some observations. ld-proofs use a proof property where all information about the signature (or other proof) is stored. I assume COSE is quite similar to JOSE. For blockchain transactions the alg is sometimes implied e.g. ethereum tx.

Gozala commented 1 year ago

I think I have shifted my position in favor of <varint-signature-alg><varint-encoding><sig-size><signature-bytes>, because I've encountered yet another instance of "what if the payload is encoded as B as opposed to A". Put differently, given that in IPLD codecs are often interchangeable, it's not unusual to encounter data encoded in either DAG-CBOR or DAG-JSON. Once data is signed it is no longer obvious in which format it was signed: most likely the same format as the envelope, but e.g. with ucan-ipld that is not the case. Also, encoding info of the data is often no longer present in the layer that operates on that data, so knowledge about how the payload was encoded before signing is unavailable and needs to be carried somehow.

I believe capturing data encoding in signature is a great way to address all the above
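As a sketch of what that layout could look like on the wire, here is a Python encoder and decoder for `<varint-signature-alg><varint-encoding><sig-size><signature-bytes>`. It assumes multiformats-style unsigned varints (7 bits per byte, least significant group first) for all three prefixes, including `sig-size`, which is an assumption rather than anything settled in this thread. The function names are hypothetical, and the codes used in the usage lines are only illustrative: `0xd0e7` is the signature code mentioned earlier in the thread and `0x71` is the multicodec code for DAG-CBOR.

```python
from typing import Tuple


def encode_varint(n: int) -> bytes:
    """Unsigned varint: 7 data bits per byte, high bit set on all but the last byte."""
    if n < 0:
        raise ValueError("varints are unsigned")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(data: bytes, offset: int = 0) -> Tuple[int, int]:
    """Return (value, next_offset) for the varint starting at offset."""
    value, shift = 0, 0
    while True:
        if offset >= len(data):
            raise ValueError("truncated varint")
        byte = data[offset]
        offset += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, offset
        shift += 7


def encode_varsig(sig_alg: int, encoding: int, signature: bytes) -> bytes:
    """Prefix the raw signature with its algorithm, payload encoding, and length."""
    return (
        encode_varint(sig_alg)
        + encode_varint(encoding)
        + encode_varint(len(signature))
        + signature
    )


def decode_varsig(data: bytes) -> Tuple[int, int, bytes]:
    """Split a varsig back into (sig_alg, encoding, signature bytes)."""
    sig_alg, pos = decode_varint(data)
    encoding, pos = decode_varint(data, pos)
    size, pos = decode_varint(data, pos)
    signature = data[pos:pos + size]
    if len(signature) != size:
        raise ValueError("signature shorter than declared size")
    return sig_alg, encoding, signature


blob = encode_varsig(0xD0E7, 0x71, b"\x01\x02\x03")  # placeholder signature bytes
sig_alg, encoding, signature = decode_varsig(blob)
```

A verifier would then re-encode the payload with the codec named by `encoding` before checking `signature` under `sig_alg`, which is the information the comment above says tends to get lost between layers.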

Moving this into https://github.com/ChainAgnostic/varsig/issues/3