input-output-hk / prism-did-method-spec


[Feature Request] Configurable SECURE_DEPTH #44

Closed iFergal closed 1 year ago

iFergal commented 1 year ago

Right now SECURE_DEPTH is set at 112 blocks for safety in case of a fork, so every create or update takes something like 35-45 minutes (based on what I see whenever I check an explorer).

This is definitely on the safer side and likely protects against large actors, but for the average user it might not be acceptable - and for an IoT device exposed to a potential vulnerability, it might be considered too long to wait before rotating away from vulnerable keys.

A possible approach to this problem is to set a lower default SECURE_DEPTH but allow the end user to optionally specify a SECURE_DEPTH override for a given DID update, in case they want something safer such as 112.
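A minimal sketch of how such an opt-in override could look; the request shape, field names, and clamping rule are hypothetical illustrations, not part of the PRISM spec:

```python
from dataclasses import dataclass
from typing import Optional

DEFAULT_SECURE_DEPTH = 112  # the protocol-wide value discussed above

@dataclass
class DidUpdateRequest:
    did: str
    signed_operation: bytes
    # Hypothetical field: per-operation override; None means "use the protocol default".
    secure_depth_override: Optional[int] = None

def effective_secure_depth(request: DidUpdateRequest,
                           protocol_default: int = DEFAULT_SECURE_DEPTH) -> int:
    """Depth a node/resolver would wait for before considering this update confirmed."""
    if request.secure_depth_override is None:
        return protocol_default
    # The override can only raise the confirmation depth, never weaken the default.
    return max(request.secure_depth_override, protocol_default)
```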

EzequielPostan commented 1 year ago

this is an interesting topic, thanks for bringing it up :)

To some extent, long form DIDs were actually motivated as a mitigation for these latency constraints, allowing users to use a DID without any delay. It is harder to find a nice set of trade-offs for mitigating the latency of update operations.

Let's expand on the idea of reducing SECURE_DEPTH. I will take it to an extreme limit to evaluate the impact; I understand you are not suggesting making it "too small", but latency wouldn't improve beyond this hypothetical extreme case. Today we have a block created every 20 seconds (expected average), meaning that changing the parameter cannot move us below that 20-second limit. The network also tends to frequently have forks on the last 1 or 2 blocks, so let's take 3 as a practical extreme low for SECURE_DEPTH. This would lead to an average delay of 1 minute to confirm an update.

That doesn't take network congestion into account: if too many transactions compete for a place in the next block, a transaction carrying an update operation may be delayed further. Assuming the above is correct, the delay for an update confirmation would be at minimum 60 seconds on average. There may be happier cases where multiple blocks occur in a shorter period of time, but applications would not be able to rely on such assumptions.
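For concreteness, the arithmetic above as a tiny script (only the 20-second average block time from this comment is assumed; congestion and the wait for the operation to enter a block are ignored):

```python
AVG_BLOCK_TIME_S = 20  # expected average block time mentioned above

def expected_confirmation_delay_minutes(secure_depth: int,
                                        avg_block_time_s: float = AVG_BLOCK_TIME_S) -> float:
    """Average wait for an operation to sit `secure_depth` blocks deep in the chain."""
    return secure_depth * avg_block_time_s / 60

print(expected_confirmation_delay_minutes(112))  # ~37.3 minutes, consistent with the 35-45 min observed
print(expected_confirmation_delay_minutes(3))    # 1.0 minute, the hypothetical extreme low
```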

A priori, some use cases may still find multi-second delays too long. This leads to the question: how many use cases have different latency requirements? Knowing that may help us reduce SECURE_DEPTH while keeping the protocol on the safer side.

On a separate point, letting users increase the desired level of confirmations looks interesting; we should take note of this idea as a tool. This does not mean we will implement it, but it may come in handy when evaluating other problems in the future.

Now, going a bit further, each use case may have different ways to mitigate latency. Some may just implement protocols on top of the DID method, e.g. replace DIDs instead of updating them; others may share the updates off-chain (in a custom way). There are other options one could consider; some DID methods compromise decentralization to mitigate this.

The latency trade-off is one we would like to explore, and we have some high-level thoughts. The input from use cases will definitely help shape the direction and the trade-offs to consider.

iFergal commented 1 year ago

The extreme low example is for sure an interesting way to look at things!

I do think key rotation is important (instead of migrating to new DIDs), particularly in IoT.

Sharing updates off-chain would work better, but that also makes me wonder about the importance of waiting at all. Forks in crypto are generally a big issue because of double spending: pay for a service, receive the service off-chain/in real life, then fork and you have your tokens to spend again.

But here it is a DID controller updating the DID with a signature, and I'm not sure what they gain from signing verifiable presentations with different keys, key A and key B (one key for each fork). The main thing is being able to rotate away from key A as soon as possible if it's compromised. But I really need to think about this point more; this is just a thought that entered my mind and I could be missing something obvious. :')

A kind of compromise approach for a resolver might be (when given a special parameter) to also indicate that an update to the DID is still pending confirmation, so the verifier can choose to ignore presentations from that DID until the update is confirmed.

This renders the IoT device useless during the update phase but it could be a lot better than a stolen key from a device going rogue for that length of time in an industrial environment. Essentially allowing the verifier to make the decision themselves.
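A minimal sketch of that verifier-side policy, assuming a resolver that exposed a hypothetical `updatePending` flag in its resolution metadata (the flag is illustrative, not part of the current method spec):

```python
def accept_presentation(resolution_metadata: dict, signature_is_valid: bool) -> bool:
    """Refuse any presentation while the subject DID has an update that has not yet
    reached SECURE_DEPTH confirmations; otherwise fall back to normal verification."""
    if resolution_metadata.get("updatePending", False):
        # The device is "updating": treat it as temporarily unusable rather than risk
        # accepting a presentation signed by a possibly stolen key.
        return False
    return signature_is_valid
```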

EzequielPostan commented 1 year ago

apologies for the delay

interesting response! Thanks for sharing :)

But I really need to think about this point more; this is just a thought that entered my mind and I could be missing something obvious. :')

writing is a great way to think out loud, so let me try to do the same and collaborate in the process

I'm not sure what they gain from signing verifiable presentations with different keys

I think this may be important for situations where auditability is desired. For those cases, it may be more problematic to allow rollbacks than forks per se. If real-life events are triggered by events posted on-chain, there may be cases where the chain rolls back but the real-life events do not. It is the case that one could reduce the rollback guarantees of the DID method; the question would be: what is obtained in exchange? Reducing latency is what we can observe, but it may still end up too long for the use case that motivated the reduction.

Other options similar to your approach that we have entertained in the past are:

- allowing depth selection during resolution, i.e. letting the resolver take a "resolution depth" parameter; and
- sharing the unstable part of the chain in the resolution metadata.

I am interested in the requirements from your use case. What is the maximum latency the system could tolerate? If the need is sub-second, it would be interesting to think of patterns beyond reducing SECURE_DEPTH and to evaluate the trade-offs of those patterns too.

I hope I could add some value to the topic. Thank you for collaborating, and apologies again for my delay.

iFergal commented 1 year ago

apologies for my even longer delay!

Thanks for your reply. :) Yes, auditability makes sense; though I guess this means that those who require both strict auditability and lower latency run into issues, as sharing updates off-chain has the same consequences.

It is the case that one could reduce the rollback guarantees of the DID method; the question would be: what is obtained in exchange? Reducing latency is what we can observe, but it may still end up too long for the use case that motivated the reduction.

Reducing latency is all I'm concerned with, but specifically from a safety perspective: in the scenario where authentication (or similar) keys are known to be stolen, a user wants to use their (still safe) MASTER_KEY to rotate away from the stolen key and limit the damage as much as possible. 112 blocks is a long time for a stolen key to keep "doing damage", especially in the IIoT use case again.

Ignoring any presentations from devices that are "updating" would likely be acceptable in this scenario considering a key has been stolen; but the question is how do other devices know a key has been stolen? What if this is just a routine key rotation for best practices?

I am interested in the requirements from your use case. How much latency is the max the system could tolerate?

I don't have specific use cases myself per se; rather I'm trying to see if we can support as much as possible - a 112-block wait feels like it could drive away some users, perhaps even just from a business perspective. That is why I think letting the user specify an increased depth parameter themselves gives them more control over what they consider safe.

The idea of sharing the unstable part of the chain in the resolution metadata is interesting, but as you say it might lack interoperability. It may also run into an issue with auditability.

EzequielPostan commented 1 year ago

my apologies for the long delay

auditability makes sense; though I guess this means that those who require both strict auditability and lower latency run into issues, as sharing updates off-chain has the same consequences.

indeed, the combination of strict auditability and low latency requirements may be problematic in nature. So far, auditability use cases have tended to be okay with higher latency, but this is not a rule we can count on for all use cases.

Ignoring any presentations from devices that are "updating" would likely be acceptable in this scenario considering a key has been stolen; but the question is how do other devices know a key has been stolen? What if this is just a routine key rotation for best practices?

This may be coordinated on the application side. If I understand the setting: when there is an incoming update, presentations may be required to already use the incoming key. In that way a presentation with the old key could flag a key compromise, while presentations with the incoming key could flag a simple rotation.
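A sketch of that application-side rule; the key identifiers and the notion of a "pending update key" are assumptions for illustration, since the real coordination mechanism is left open above:

```python
from enum import Enum
from typing import Optional

class Decision(Enum):
    ROUTINE_ROTATION = "accept: presentation already uses the incoming key"
    POSSIBLE_COMPROMISE = "reject: old key used while an update is pending"
    NORMAL_VERIFICATION = "no update pending: apply the usual checks"

def classify_presentation(presentation_key_id: str,
                          pending_update_key_id: Optional[str]) -> Decision:
    """If an update is in flight, only presentations signed with the incoming key
    are treated as a routine rotation; anything signed with another key is suspect."""
    if pending_update_key_id is None:
        return Decision.NORMAL_VERIFICATION
    if presentation_key_id == pending_update_key_id:
        return Decision.ROUTINE_ROTATION
    return Decision.POSSIBLE_COMPROMISE
```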

The idea of sharing the unstable part of the chain in the resolution metadata is interesting but as you say might lack interoperability. And also may run into an issue with auditability.

On the auditability side, it would still be a decision of the consumer of the DID method to act on unstable data; the latency issue would remain for them. However, allowing the resolver to take a "resolution depth" parameter could satisfy the cases that want auditability without overly constraining the use cases that are fine with unstable state. The meaning and guarantees of the different depths must be clear to users. This depth parameter is not too distant from historical queries #43

I would note, however, that these two alternatives (allowing depth selection during resolution, or sharing unstable parts of the chain in resolution metadata) still sit above multi-second latency.
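A sketch of what a depth-aware resolution step could look like, expressed as a pure function over anchored operations; the `AnchoredOperation` shape and the parameter name are hypothetical, and only the 112 default comes from the discussion above:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class AnchoredOperation:
    payload: dict          # the DID operation itself
    confirmations: int     # blocks currently on top of the block that carries it

def operations_visible_at_depth(operations: List[AnchoredOperation],
                                resolution_depth: Optional[int] = None,
                                default_secure_depth: int = 112) -> List[dict]:
    """Return only the operations a resolver would apply for the requested depth.
    resolution_depth=None keeps today's behaviour; a smaller value trades rollback
    safety for lower update latency, and the caller must understand that trade-off."""
    depth = resolution_depth if resolution_depth is not None else default_secure_depth
    return [op.payload for op in operations if op.confirmations >= depth]
```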

The reality of DID methods is that, although they share a common resolution interface, they differ in what they offer. Some don't allow updating a DID (like did:peer), others don't allow transferring the ownership of a DID (e.g. Sidetree-based methods), others are more centralized. Designing a new method is a game of picking trade-offs. We will keep working on ways to reduce latency while balancing other features. Maybe a resolution parameter as you propose could work; we will try to refine the idea. One detail that comes to mind is whether this parameter can easily be passed through the universal resolver, but we may be able to reuse something close enough already defined in the standard.

Do you think we should keep this issue open @iFergal? Or should I close it for now? We can always re-open it if we come up with more points on this topic

Thank you for the input, and my apologies again for the delay

EzequielPostan commented 1 year ago

I am closing the issue to keep a clean view of active ones. Feel free to re-open if needed.