ssbc / ssb2-discussion-forum


PPPPP Tangle Auth #24

Open staltz opened 1 year ago

staltz commented 1 year ago

:bomb: :fire: Problem: security

People are raising concerns that sharing your keypair to many devices is not good for security.

:bomb: Problem: unclear sign-in method

It's unclear when you should share the keypair, and when you should use something like broker auth.

When you're "logging in" with a new app, suppose a chess game, you probably don't trust the app developer that much to give your keypair to the app. You should probably use broker auth. But then, what kinds of apps can get the keypair and what kinds of apps should use broker auth? That line is hard to draw, and I bet many apps will end up just asking for your keypair, because it's simpler.

Now instead of spreading your keypair on "Manyverse" on many devices, you are sending it to all sorts of apps (closed or open source). Pretty bad for security.

:sun_behind_small_cloud: All tangles are multiauthor

One important realization I had is that the tangle data structure is by definition multiauthor. If it's single-author (and single-device), then it would end up being a linear sequence, so no need to have complex DAG algorithms. It only becomes a real DAG in the presence of concurrent (and delay-tolerant) authors.

Your "feed" (one kind of tangle) is authored by many devices.

A thread (another kind of tangle) is authored by many persons.

And so forth.

:bulb: Idea: write authorization for tangles

What if we add the restriction that keypairs are never shared? (Similar to #16) Then we solve both of the aforementioned problems, with an important tweak:

What if each tangle would declare which authors can write to it? Currently, a "feed tangle" can only be authored by the feed's keypair. But we could change this such that any authorized peer could write to your feed tangle! This is just a matter of tweaking the validation code so that it treats msgs from those authorized peers as valid.

---
title: Currently, on a feed tangle
---
graph RL

R["(Feed root)"]
A[Alice published]
B[Alice published]
C[Alice published]
D["(invalid)<br />Bob published"]:::red

C-->B-->A-->R
D-->A

classDef default fill:#bbb,stroke:#fff0,color:#000
classDef red fill:#f88,stroke:#fff0,color:#000

---
title: Proposed change, on a feed tangle
---
graph RL

R["(Feed root)"]
X["Alice published<br />'Bob has write access'"]
A[Alice published]
B[Alice published]
C[Alice published]
D["(valid)<br />Bob published"]:::green

C-->B-->A-->X-->R
D-->A

classDef default fill:#bbb,stroke:#fff0,color:#000
classDef green fill:#8f8,stroke:#fff0,color:#000

In practice, this could be done by publishing a special message on that tangle that declares public keys that now have "write access" to this tangle. This should work for all kinds of tangles!
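
A minimal sketch of that validation tweak, assuming a simplified msg shape (`{ id, author, prev, grantWrite }`) that is not the actual PPPPP format. The field names and the `validateTangle` helper are hypothetical, and signatures are left out entirely:

```javascript
// Hypothetical msg shape: { id, author, prev: [ids], grantWrite?: [pubkeys] }
// `msgs` must be ordered so that every msg comes after the msgs listed in its `prev`.
function validateTangle(rootAuthor, msgs) {
  const authorizedAt = new Map() // msg id -> Set of pubkeys authorized as of that msg
  const results = []
  for (const msg of msgs) {
    // What this msg "knows" is the union of what its valid ancestors declared.
    const known = new Set([rootAuthor])
    for (const prevId of msg.prev) {
      for (const key of authorizedAt.get(prevId) ?? []) known.add(key)
    }
    const valid = known.has(msg.author)
    if (valid && msg.grantWrite) {
      for (const key of msg.grantWrite) known.add(key)
    }
    // Invalid msgs do not pass any authorization on to their descendants.
    authorizedAt.set(msg.id, valid ? known : new Set())
    results.push({ id: msg.id, valid })
  }
  return results
}

// The "proposed change" diagram: X grants Bob write access, D is Bob's msg.
const proposed = [
  { id: 'R', author: 'alice', prev: [] },
  { id: 'X', author: 'alice', prev: ['R'], grantWrite: ['bob'] },
  { id: 'A', author: 'alice', prev: ['X'] },
  { id: 'D', author: 'bob', prev: ['A'] },
]
console.log(validateTangle('alice', proposed).find((r) => r.id === 'D').valid) // true
```

Note that in this sketch a grant only counts for msgs that descend from it, so a msg Bob publishes concurrently with X would still be rejected. Whether that is the desired rule is an open design question.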

This would mean that if you want to log in to your account on another device, you can just do auth like this:

sequenceDiagram
  participant A as My Laptop
  participant B as My Phone

  B->>A: shs connect
  B->>A: can I have write access on all your feeds?
  note over A: on each feed, publish a message<br />that declares "My Phone"'s pubkey<br />as having write access
  A-->>B: yes
  note over B: I can now publish on your feeds<br />and other peers will read your feed<br />as being one person

:bomb: New problem: network identity

By allowing other keypairs to publish messages on your feed, we solve the feed problem, but the other area where keypairs are used is in Secret Handshake, and to identify you when establishing connections.

Now we have a problem, because even if I follow you as @abc123, you may be using a different device with keypair @xyz456, which I won't recognize as friendly.

Proposed naive solution

The two contexts where connectivity needs to be handled are:

In rooms, it's possible to give the room a token issued by @abc123 that proves that @xyz456 is the "same person". Then, when the room tells its members that @xyz456 is online, it can also annotate that "oh yeah by the way, this is the same person as @abc123 and here is proof".

In LAN, remember there are UDP packets being broadcast to inform your multiserver address. This is a good place to include a proof that @xyz456 is the same person as @abc123.

Maybe we could generalize these two cases so they aren't treated separately. Maybe this could be done on the SHS layer of the stack. Maybe there's a better way too.
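
One way such a "same person" proof could look, as a rough sketch: the main keypair signs the device's public key, and anyone who knows the main public key can check it. The proof format and the `issueProof` / `verifyProof` helpers are hypothetical. A real token would probably also need an expiry or nonce to limit replay, which is left out here. The sketch uses Node's built-in Ed25519 support:

```javascript
const crypto = require('node:crypto')

// Serialize a public key to bytes so it can be signed.
function exportKey(publicKey) {
  return publicKey.export({ type: 'spki', format: 'der' })
}

// Hypothetical proof: the main keypair signs the device's public key.
function issueProof(mainPrivateKey, devicePublicKey) {
  return crypto.sign(null, exportKey(devicePublicKey), mainPrivateKey)
}

// A room member or LAN peer checks the proof against the main public key it follows.
function verifyProof(mainPublicKey, devicePublicKey, proof) {
  return crypto.verify(null, exportKey(devicePublicKey), mainPublicKey, proof)
}

const main = crypto.generateKeyPairSync('ed25519') // stands in for @abc123
const phone = crypto.generateKeyPairSync('ed25519') // stands in for @xyz456
const stranger = crypto.generateKeyPairSync('ed25519')

const proof = issueProof(main.privateKey, phone.publicKey)
console.log(verifyProof(main.publicKey, phone.publicKey, proof)) // true
console.log(verifyProof(main.publicKey, stranger.publicKey, proof)) // false
```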

:bomb: New problem: erasing

If the authorization data is in the content, we can't erase that msg. ("Erase" means to delete msg.content but keep msg.metadata and msg.sig).

Proposed naive solution

We would have to change the metadata to include this authorization data.
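
To make the difference concrete, here is a small sketch of "erase" as defined above, using a simplified `{ content, metadata, sig }` shape. The `grantWrite` field is a hypothetical name for the authorization data:

```javascript
// "Erase" drops the content but keeps metadata and sig.
function erase(msg) {
  return { content: null, metadata: msg.metadata, sig: msg.sig }
}

// Authorization data stored in the content (hypothetical `grantWrite` field).
const grantInContent = {
  content: { grantWrite: ['phone-pubkey'] },
  metadata: { type: 'post' },
  sig: 'sig-placeholder',
}

// Authorization data stored in the metadata instead.
const grantInMetadata = {
  content: { text: 'hello' },
  metadata: { type: 'post', grantWrite: ['phone-pubkey'] },
  sig: 'sig-placeholder',
}

console.log(erase(grantInContent).content) // null: the grant is lost
console.log(erase(grantInMetadata).metadata.grantWrite) // [ 'phone-pubkey' ]: the grant survives
```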

:bomb: New problem: omitting authorization

PPPPP sliced replication is such that for msgs at depth 100–200, I can send those plus the "certificate pool" between depths 0–100. The certificate pool is hopefully as small as possible, and just gives us the shortest path from 100 to 0.

However, any authorization msg is important and should not be dropped from replication. There may be several authorization msgs between depths 0–100, and we need to make sure they are replicated. These authorization msgs are not (by design) going to be in the shortest path from 100 to 0, and we need to make sure that all authorization msgs have a path to the root. This may suddenly make the "extended" certificate pool much larger, and hurt storage and bandwidth overhead.
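
A rough sketch of how the "extended" pool could be computed, assuming a simplified msg shape (`{ id, depth, prev, isAuth }`) where `isAuth` is a hypothetical flag marking authorization msgs. This is not the actual PPPPP replication code, just the shortest-path idea plus the extra paths for authorization msgs:

```javascript
// Breadth-first search along `prev` links until a msg with no `prev` (the root) is reached.
function shortestPathToRoot(byId, startId) {
  const cameFrom = new Map([[startId, null]])
  const queue = [startId]
  while (queue.length > 0) {
    const id = queue.shift()
    const msg = byId.get(id)
    if (!msg) continue
    if (msg.prev.length === 0) {
      const path = []
      for (let cur = id; cur !== null; cur = cameFrom.get(cur)) path.push(cur)
      return path.reverse() // ordered from startId down to the root
    }
    for (const prevId of msg.prev) {
      if (!cameFrom.has(prevId)) {
        cameFrom.set(prevId, id)
        queue.push(prevId)
      }
    }
  }
  return []
}

// Shortest path for the slice start, plus a path to the root for every older auth msg.
function extendedCertPool(byId, sliceStartId) {
  const pool = new Set(shortestPathToRoot(byId, sliceStartId))
  const sliceDepth = byId.get(sliceStartId).depth
  for (const msg of byId.values()) {
    if (msg.isAuth && msg.depth < sliceDepth) {
      for (const id of shortestPathToRoot(byId, msg.id)) pool.add(id)
    }
  }
  return pool
}

const tangle = [
  { id: 'R', depth: 0, prev: [] },
  { id: 'A1', depth: 1, prev: ['R'] },
  { id: 'G', depth: 2, prev: ['A1'], isAuth: true }, // authorization msg
  { id: 'A2', depth: 3, prev: ['G'] },
  { id: 'S', depth: 4, prev: ['A2', 'R'] }, // slice start, with a shortcut link to R
]
const byId = new Map(tangle.map((m) => [m.id, m]))
console.log(shortestPathToRoot(byId, 'S')) // [ 'S', 'R' ]
console.log([...extendedCertPool(byId, 'S')].sort()) // [ 'A1', 'G', 'R', 'S' ]
```

In this toy tangle the plain pool is two msgs, while the extended pool is four, which is the overhead concern described above.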

:bomb: New problem: revoking

Revoking write access is an entirely different and complex problem. We could postpone solving it, and just say that once you give write access to someone, you can't revoke it. Revoking seems doable (inspired by our work on excluding members from private groups) and doesn't need to be solved from day one.

:bomb: New problem: feedID and feed root msg

With this new design, we don't really need the "feed ID" as being the public key of a keypair. It can be any random bytes, to identify you. You just need to include the "first authorized peer" in that feed. This again has #16 vibes.

The design of the feed format could change now: we could treat all tangles literally the same way, with no special treatment for the "feed tangle" versus a "thread tangle" or other kinds of tangles. This is open; I need to sketch what this would look like in code.

:bomb: New problem: main keypairs vs other keypairs

In the tangle auth system described, the main pubkey (from the keypair that defines the feed ID) is different to all other "authorized" pubkeys. To discover a subfeed, you just need the main's pubkey and the "message type" string, then you can deterministically determine the feed root msg.

What this means is that you could authorize one of these "other" keypairs for write access in one feed, but they would never have the access to start new subfeeds, because to start a new subfeed you have to sign the (deterministically predictable) root msg with the main keypair.

:studio_microphone: Feedback

Thoughts about this? @arj03 @ahdinosaur @gpicron

arj03 commented 1 year ago

Some initial thoughts:

Yeah this is a hard problem. One that I never found a good solution for. Good defined as something that is orthogonal to other constraints and thus can be solved once and you can build on top of that. I think this mostly comes from the fact that you are working on distributed state and that has a bunch of edge cases.

One thought that came to mind is that in a tangle, once someone extends another key's tip, then that in a way confirms that that message was fine in this context. Maybe it is possible to use that information somehow to build a simpler model?

Related work:

staltz commented 1 year ago

It is indeed a hard problem. I've been sitting here looking at the ceiling trying to think WHY it's hard.

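I came down to a trilemma:

  • I am known everywhere by the same identifier
  • Keys are not shared
  • All devices and apps are equal

Choose 2. :smiling_face_with_tear: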

staltz commented 1 year ago

Actually, there is one way to have all three, but it's not pragmatic: your main keypair (whose pubkey identifies you) would live in your brain alone, and you would sign messages by hand by running the elliptic curve operations in your brain. Then we can have all 3 properties. :upside_down_face:

staltz commented 1 year ago

I would love for the trilemma to be false, and for there to be some magical way out of it.

Assuming there isn't, we have to make the tough call of dropping one of those properties. Let's take a deep dive into each of those worlds, and try to play out the far future in that world.

Identity Limbo

i.e. drop "I am known everywhere by the same identifier".

In this world, you would have a proliferation of keypairs, and public keys would be nearly useless as identifiers. We would have to come up with some other system of identifying people. Maybe piggybacking on DNS?

We need to figure out who you are, either when validating msgs that come from "you", or when trying to connect via SHS.

But we would have good security, and all devices and apps would be treated equally.

Account recovery would be weird. Say you had only one device/app and it exploded, then you buy a new device, install the app, and there's no backup recovery phrase to insert, since the new device/app would start its own keypair, and because you don't have the old device anymore, it would be impossible to "link" together the new and the old. You would have to re-onboard to the network.

Multi-device SSB is almost like this already, and we have just used manual linking in the bio to link identities together. This world could also work for PPPPP, and it would tie well with the principle of identities being relative to adjacent peers in the social web, as opposed to identities being globally unique/known (Twitter) or self-determined (via cryptographic keys). Identity would be tied to your surrounding community, and that could be a good thing.

On the other hand, in this world, feed tangles wouldn't make much sense and we could just go back to append-only feeds.

Scatter my keypair

i.e. drop "Keys are not shared"

This would compromise security, a lot. But for the sake of argument, let's pretend that wouldn't be a problem.

PPPPP is already designed with this in mind, and tangles would work fine, and the story around account login and recovery would be simple. The main problem would be handling SHS in a scenario where two devices connect to each other although they use the same keypair. Fun fact: SHS already allows you to connect like that! So the other challenges would be discoverability and disambiguation in rooms and LANs.

It could work, but it's hard to imagine it could survive the test of time.

Main device

i.e. drop "All devices and apps are equal".

Okay, we get a stable ID everywhere, and we're not sharing keys.

But we're introducing a clear hierarchy, and one of your devices/apps would be the king. That main device would be a single point of failure, and you would have to treat it more carefully. What comes to mind is how Signal does identity too, and there are downsides there. There's also the downside of losing some decentralization to this main device.

That said, let's think about it more positively. We don't need decentralization of your identity. We can have decentralization in other aspects of the network, such as server topology (and ephemerality). Having a stable ID is really nice. Oh, we wouldn't have the network identity problem of the same keypair being used by two different peers. We would still have to "prove" that we are the same person as the main ID, but that's doable.

Another positive thing is that centralizing your main ID in one device makes for a simple model for users to understand. It's also possible to move the main ID to another device, if you need to. Account recovery is simple, and you can do it on any device.

Heck, you could even have the main ID on two different devices simultaneously, without fork dangers. Of course, you shouldn't do that, and I think it'll be hard to tell people why they shouldn't, but at least if something goes wrong it's your own fault. I just hope 3rd party apps don't start to ask for your main ID recovery phrase.

Conclusion

I'm trying to make up my mind between Identity Limbo and Main Device, with a slight preference for Main Device, because identity is such a murky and weird idea in Identity Limbo. I feel like that holds some of its own monsters and complexity that we're not aware of, while Main Device is a simple idea without hidden monsters, and we're aware of the cost it takes on decentralization principles. In other words, I could be fine with Main Device.

ahdinosaur commented 1 year ago

yes please tangle permissions, i want this to be possible, this is very exciting. :pray:

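maybe some silly questions:

  • wouldn't the tangle identifier be the hash of the first message published in the tangle?
  • why can't this be the identity you are known by? (and could be given an alias with DNS, etc)
  • then wouldn't devices need to still identify in SHS and such as their keypair?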

Account recovery would be weird. Say you had only one device/app and it exploded, then you buy a new device, install the app, and there's no backup recovery phrase to insert, since the new device/app would start its own keypair, and because you don't have the old device anymore, it would be impossible to "link" together the new and the old. You would have to re-onboard to the network.

i do believe Keybase uses something like this, because they are very clear that if you lose access to all your added devices, you lose access to your account. so they recommend adding multiple devices.

staltz commented 1 year ago

Yay, more feedback!

yes please tangle permissions, i want this to be possible, this is very exciting. :pray:

I agree it's exciting and opens up a lot of experimentation.

wouldn't the tangle identifier be the hash of the first message published in the tangle?

This is already true in feed-v1 today. But:

why can't this be the identity you are known by? (and could be given an alias with DNS, etc)

Because you have many feed tangles. You have a feed tangle for post, another one for profile (a.k.a. about), another one for reaction (a.k.a. vote), and each of those has a different "first message published in the tangle" a.k.a. root msg.

then wouldn't devices need to still identify in SHS and such as their keypair?

Yes, and during (room and LAN) discovery, peers would have to prove that they are equivalent to their main keypair, by showing some signature signed by the main keypair.

Keybase

Interesting, I'll take a look at what they do, and think about this

staltz commented 1 year ago

Note to self: read this and see if there's something to be inspired by https://github.com/dxos/dxos/blob/main/docs/docs/design/halo-spec.md

ahdinosaur commented 1 year ago

why can't this be the identity you are known by? (and could be given an alias with DNS, etc)

Because you have many feed tangles. You have a feed tangle for post, another one for profile (a.k.a. about), another one for reaction (a.k.a. vote), and each of those has a different "first message published in the tangle" a.k.a. root msg.

oh i see! then what if we had a separate "identity" tangle, and every other feed tangle must refer to the latest tip of the "identity" tangle they are participating as.

then wouldn't devices need to still identify in SHS and such as their keypair?

Yes, and during (room and LAN) discovery, peers would have to prove that they are equivalent to their main keypair, by showing some signature signed by the main keypair.

but in a tangle world without main keypairs, wouldn't the identity tangle have a message saying "we added this keypair to the tangle", and then peers just need to present one of the keypairs added to the tangle?

ahdinosaur commented 1 year ago

i'll try to wander with some messages...


create a new identity feed

```js
{
  "content": {
    "init": {
      "devices": {
        "4mjQ5aJu378cEu6TksRG3uXAiKFiwGjYQtWAjfVjDAJW": 1
      },
    }
  },
  "metadata": {
    "hash": "QwrP7DAMHhHe71Qf87tXBf",
    "size": 71,
    "tangles": {},
    "type": "identity",
    "v": 1,
    "device": "4mjQ5aJu378cEu6TksRG3uXAiKFiwGjYQtWAjfVjDAJW"
  },
  "sig": "5abJdD6RRCsWXKJLaEKRhUb1HKh4aKPFteFRgUBfyJD4cFzo5MVaMdWbwM2CfpNRFSjR9NkczRL2LcSyQVThYnRr"
}
```

initialize post feed

```js
{
  "content": null,
  "metadata": {
    "hash": null,
    "size": 0,
    "tangles": {
      // identity feed
      "QwrP7DAMHhHe71Qf87tXBf": {
        "depth": 1,
        "prev": ["QwrP7DAMHhHe71Qf87tXBf"]
      }
    },
    "type": "post",
    "v": 1,
    "device": "4mjQ5aJu378cEu6TksRG3uXAiKFiwGjYQtWAjfVjDAJW"
  },
  "sig": "5abJdD6RRCsWXKJLaEKRhUb1HKh4aKPFteFRgUBfyJD4cFzo5MVaMdWbwM2CfpNRFSjR9NkczRL2LcSyQVThYnRr"
}
```

post as the new identity

```js
{
  "content": {
    "text": "hello world!",
  },
  "metadata": {
    "hash": "Cz1jtXr2oBrhk8czWiz6kH",
    "size": 23,
    "tangles": {
      // identity feed
      "QwrP7DAMHhHe71Qf87tXBf": {
        "depth": 1,
        "prev": ["QwrP7DAMHhHe71Qf87tXBf"]
      },
      // post feed
      "SRGfAAnxTzjN6mEDJ542hf": {
        "depth": 1,
        "prev": ["SRGfAAnxTzjN6mEDJ542hf"]
      }
    },
    "type": "post",
    "v": 1,
    "device": "4mjQ5aJu378cEu6TksRG3uXAiKFiwGjYQtWAjfVjDAJW"
  },
  "sig": "5abJdD6RRCsWXKJLaEKRhUb1HKh4aKPFteFRgUBfyJD4cFzo5MVaMdWbwM2CfpNRFSjR9NkczRL2LcSyQVThYnRr"
}
```

add a device to an identity

```js
{
  "content": {
    "add": {
      "devices": {
        "8kBhDXpZajdBRFLq8zophqCzbFsFzvuwBGoWj7TU9Loe": 1
      },
    }
  },
  "metadata": {
    "hash": "72MK9ETRNGKm7Jh8ryJswM",
    "size": 70,
    "tangles": {
      // identity feed
      "QwrP7DAMHhHe71Qf87tXBf": {
        "depth": 1,
        "prev": ["QwrP7DAMHhHe71Qf87tXBf"]
      },
    },
    "type": "identity",
    "v": 1,
    "device": "4mjQ5aJu378cEu6TksRG3uXAiKFiwGjYQtWAjfVjDAJW"
  },
  "sig": "5abJdD6RRCsWXKJLaEKRhUb1HKh4aKPFteFRgUBfyJD4cFzo5MVaMdWbwM2CfpNRFSjR9NkczRL2LcSyQVThYnRr"
}
```

post a message from the new device

```js
{
  "content": {
    "text": "yo i heard you like tangles!",
  },
  "metadata": {
    "hash": "K8tzhL8Sewr3mVr1dpaYa2",
    "size": 206,
    "tangles": {
      // identity feed
      "QwrP7DAMHhHe71Qf87tXBf": {
        "depth": 2,
        "prev": ["72MK9ETRNGKm7Jh8ryJswM"]
      },
      // post feed
      "SRGfAAnxTzjN6mEDJ542hf": {
        "depth": 2,
        "prev": ["NWmZDGa64kiY2cDaM3u8c2"]
      }
    },
    "type": "post",
    "v": 1,
    "device": "8kBhDXpZajdBRFLq8zophqCzbFsFzvuwBGoWj7TU9Loe"
  },
  "sig": "5abJdD6RRCsWXKJLaEKRhUb1HKh4aKPFteFRgUBfyJD4cFzo5MVaMdWbwM2CfpNRFSjR9NkczRL2LcSyQVThYnRr"
}
```

ahdinosaur commented 1 year ago

One thought that came to mind is that in a tangle, once someone extends another key's tip, then that in a way confirms that that message was fine in this context. Maybe it is possible to use that information somehow to build a simpler model?

i notice that when you refer to an identity tangle, you are referring to the state of the identity that you accept. if there's a new identity message adding a new device, when you make a new post message referring to that identity tip you are accepting the change in the permission state.
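
A sketch of that idea, loosely modelled on the example messages above (`init` / `add` with a `devices` map). The `id`, `prev` and `identityTip` fields and both helper functions are hypothetical simplifications, not the actual message format:

```javascript
// Collect every device pubkey declared at or before a given identity tip.
function devicesAtTip(identityMsgs, tipId) {
  const byId = new Map(identityMsgs.map((m) => [m.id, m]))
  const devices = new Set()
  const seen = new Set()
  const stack = [tipId]
  while (stack.length > 0) {
    const id = stack.pop()
    if (seen.has(id) || !byId.has(id)) continue
    seen.add(id)
    const msg = byId.get(id)
    const change = msg.content.init ?? msg.content.add
    for (const pubkey of Object.keys(change?.devices ?? {})) devices.add(pubkey)
    stack.push(...msg.prev)
  }
  return devices
}

// A post is acceptable if its device is in the identity state the post itself refers to.
function postAcceptedBy(identityMsgs, post) {
  return devicesAtTip(identityMsgs, post.identityTip).has(post.device)
}

const identityTangle = [
  { id: 'identity-root', prev: [], content: { init: { devices: { 'laptop-pubkey': 1 } } } },
  { id: 'identity-add-phone', prev: ['identity-root'], content: { add: { devices: { 'phone-pubkey': 1 } } } },
]

// A post from the phone that only refers to the root state is not covered...
console.log(postAcceptedBy(identityTangle, { device: 'phone-pubkey', identityTip: 'identity-root' })) // false
// ...but one that refers to the tip that added the phone is.
console.log(postAcceptedBy(identityTangle, { device: 'phone-pubkey', identityTip: 'identity-add-phone' })) // true
```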

arj03 commented 1 year ago

The tricky part is removal. Let's say you have 2 devices and one gets compromised, which device wins and is allowed to continue using the id? If you want a stable id, then you have to have some kind of tie-breaker: either a complicated protocol or, as staltz said, a main device. This could also be a master key that you keep offline for rare situations like this. You can't really do that for groups, which is why we needed https://github.com/ssbc/ssb-group-exclusion-spec.

ahdinosaur commented 1 year ago

Let's say you have 2 devices and one gets compromised

in PGP / GPG when you generate a key you also generate a certificate to verifiably revoke that key. :key:

for cases where your key is compromised, could you publish an identity/revoke message (from the compromised key) and point to the last "good" identity message from that key? would subvert if someone tried to change identity permissions after gaining access to your key. but probably has more edge cases than i can imagine.

staltz commented 1 year ago

(I'm back! Had to focus on life and work for a while)

Identity tangle

@ahdinosaur Yes, this is a good idea! It also crossed my mind that the identity tangle could be a ppppp-set, so it could have identities added and removed, and msgs pruned over time.

But I'm not sure how we're going to use identity tangles in practice. Does the identity tangle root msg become your "feed ID"? Okay, let's run with that idea for a moment.

(:bomb: Part of me is trying to avoid "looking up feed IDs" due to my negative experience with ssb-meta-feeds, so the identity feed isn't immediately exciting to me, but it could have some purpose if designed well. Let me set aside that feeling for now.)

---
title: Alice's identity tangle
---
graph RL

R[desktop pubkey adds<br />desktop pubkey]
A[desktop pubkey adds<br />phone pubkey]
Rid[Hash of the tangle<br />root is Alice's ID]:::weak

A--->R
Rid-.->R

classDef default fill:#bbb,stroke:#fff0,color:#000
classDef weak fill:#fff0,stroke:#fff0,color:#000

Important note at this point: these msgs would have pubkeys in the who field. Like usual.

Now, suppose the identity root msg hash is MTYQM89hvHuiVKaw8Ze7kc. This is "Alice's ID". How does this become useful?

Maybe Alice could start post feeds or things like that, where who is MTYQM89hvHuiVKaw8Ze7kc and type is post. This is a special case, though, because formerly the who used to always be a pubkey, now it's a msg hash.

(:bomb: I don't like the heterogeneous type for who, but let's handwave that out of the way for now.)

When Alice publishes a post msg on her post feed, it'll be signed and stuff, but peers who get that msg won't be able to immediately validate it. They will have to first fetch Alice's identity tangle MTYQM89hvHuiVKaw8Ze7kc, collect all the pubkeys there, and then try each pubkey on the sig. If it's validated, then great.

Let's test this design against the problems mentioned in the original comment in this issue.

Tangle auth

Well, we don't have tangle auth as originally described, because the post feed doesn't need to have special "write access" messages. It's all externalized in the identity tangle.

In that sense, the identity tangle can be used for purposes other than defining a group of devices. It can define a group of people AND their devices. So we can get tangle "write access" verification after all, if the pubkeys are externalized in some identity tangle.

:bomb: The downside is that if an identity tangle has a lot of pubkeys, say 100 of them, then you have to test each of those 100 when validating msg.sig.

Network identity

This seems okay! It's just a matter of replicating the identity tangle, discovering all the pubkeys, and then you allow all these pubkeys to be used in SHS whenever you're talking with "MTYQM89hvHuiVKaw8Ze7kc".

:bomb: We gotta be careful with chicken-and-egg situations though, like if you are trying to connect with MTYQM89hvHuiVKaw8Ze7kc for the first time ever, how do you replicate that identity tangle if it's only available by connecting to MTYQM89hvHuiVKaw8Ze7kc?

Erasing

Doesn't seem like an issue here because the write access info is not in the interesting feeds, it's externalized in the identity tangle.

main keypairs vs other keypairs

Doesn't seem like a problem either, because any pubkey in the identity tangle can create a new feed.

:bomb: Sig validation

Signature validation performance seems to be a problem in this design. We might have to find ways of making it faster, like perhaps having both msg.metadata.tangleIdentityRootMsgHash (ignore the name choice) and msg.metadata.pubkey, so you could just use msg.metadata.pubkey (formerly who) to validate the msg.sig and then use msg.metadata.tangleIdentityRootMsgHash to check that the pubkey belongs to the identity tangle.
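
To illustrate the difference in cost, here is a sketch comparing the two approaches. `verifySig` is a stand-in for real signature verification (a fake is used below so the number of calls can be counted), and both validator functions are hypothetical:

```javascript
// Approach 1: only the identity is known, so try every pubkey in the identity tangle.
function validateByTrial(msg, identityPubkeys, verifySig) {
  for (const pubkey of identityPubkeys) {
    if (verifySig(pubkey, msg)) return true
  }
  return false
}

// Approach 2: the msg names its pubkey, so do one membership check and one verification.
function validateWithPubkey(msg, identityPubkeys, verifySig) {
  const pubkey = msg.metadata.pubkey
  return identityPubkeys.has(pubkey) && verifySig(pubkey, msg)
}

// Fake verifier that counts how often it is called.
let calls = 0
const fakeVerify = (pubkey, msg) => {
  calls += 1
  return msg.sig === `signed-by:${pubkey}`
}

// An identity tangle with 100 pubkeys; the msg was signed by the last one.
const pubkeys = new Set(Array.from({ length: 100 }, (_, i) => `pubkey-${i}`))
const msg = {
  metadata: { pubkey: 'pubkey-99', tangleIdentityRootMsgHash: 'MTYQM89hvHuiVKaw8Ze7kc' },
  sig: 'signed-by:pubkey-99',
}

validateByTrial(msg, pubkeys, fakeVerify)
console.log(calls) // 100
calls = 0
validateWithPubkey(msg, pubkeys, fakeVerify)
console.log(calls) // 1
```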

ahdinosaur commented 1 year ago

@staltz, did you look at my example messages in https://github.com/ssbc/ssb2-discussion-forum/issues/24#issuecomment-1545730562? i'm curious what you think, because you proposed a slightly different approach and i'm wondering if that was intentional. i feel each message should be signed with a specific device pubkey (i rename who to device), rather than having who be the generalized identity tangle hash. the other important thing is for each non-identity message to reference the identity tangle, so they point to the latest "identity state" message.

staltz commented 1 year ago

@ahdinosaur i did see it, just didn't have time to comment everything. The example was useful!

I see what you were going for, but the msg.metadata.tangles doesn't work like that. Semantically, having that tangle field with the identity tangle in a post msg means that the post msg belongs to the identity tangle, which is not what we want. But indeed, you're right that it would somehow have to reference the identity tangle root and the known tips, so as to convey the state of the identity tangle at the time the post msg was created.

We can come up with a new syntax for that. I feel like we are stretching the meanings of these fields, which means we are probably going to need a slightly different feed format design. For now I think it's better to think about all these things in the abstract and then look for a feed format design that supports it.

gpicron commented 1 year ago

I don't have much time at the moment, so I won't be able to go into details or participate actively in the discussion. That said, I have already laid out the main ideas of what I'm working on in previous threads, and I will try to summarize them here. If it fits your needs, we can see how to join our efforts.

The plan is to have a separate app for managing identities and to use SSB as an opportunistic synchronization framework. Think of it a bit like a password manager app, but one that acts as a Decentralized Authority for your own identities. The main goals are: the user can have several disconnected identities, it is hard to link two identities, and offline caps.

For instance, I would like it to be able to offer an alternative that is as user-friendly as single sign-on with Google, GitHub, etc., but that prevents linking a user's activity across various websites, and that has the safety level of multi-factor authentication.

As for the general approach, the user has a Master Password that they choose themselves, but it must pass a minimum level of entropy (I use zxcvbn from Dropbox, with some tweaks, to evaluate the password strength).

This password is never stored anywhere. It is used to generate a 512-bit "master seed" with argon2d using "sensitive" settings (requiring 256MB and 3 rounds) to make it hard to brute-force (it takes about 60 seconds to generate the seed on a powerful computer). That seed is a kind of TOTP that changes every year.

From that master seed, I derive Identities (using a context string, the year of creation, and a random padding generated from the master seed). An identity is a stable identifier over time; it is a byte string. (I use encryption here because I need to be able to get the context string back from the identifier; the encryption key is derived from the master seed.)

The identity byte string is used to generate an identity seed (again with argon).

So the DB that is shared by all my devices is only the list of {identityByteString, {field, counter}} entries, encrypted with a key derived from the master seed. The purpose is that even if it is decrypted, it does not reveal much.

That seed is then used to generate various data and passwords.
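
A very loose sketch of the derivation chain described above, not gpicron's actual tool. Node has no built-in argon2 or blake, so this swaps in scrypt and HMAC-SHA-512 from `node:crypto`. Using the year as the salt is an assumption about how the seed "changes every year", and the encryption and random padding steps are omitted:

```javascript
const crypto = require('node:crypto')

// Master password -> 512-bit master seed.
// Swapped-in primitive: scrypt instead of argon2d. The year-based salt is an assumption.
function masterSeed(masterPassword, year) {
  return crypto.scryptSync(masterPassword, `master-seed:${year}`, 64) // 64 bytes = 512 bits
}

// Master seed + context string + year of creation -> identity seed.
// Swapped-in primitive: HMAC-SHA-512 instead of argon/blake.
function identitySeed(seed, contextString, yearOfCreation) {
  return crypto
    .createHmac('sha512', seed)
    .update(`${contextString}:${yearOfCreation}`)
    .digest()
}

const seed = masterSeed('correct horse battery staple', 2023)
const serverA = identitySeed(seed, 'server-a.example', 2023)
const serverB = identitySeed(seed, 'server-b.example', 2023)

// Same inputs always give the same seed, so nothing needs to be stored.
console.log(serverA.equals(identitySeed(seed, 'server-a.example', 2023))) // true
// Different contexts give unrelated-looking seeds.
console.log(serverA.equals(serverB)) // false
```

Because everything is derived deterministically from the Master Password, the context string and a year, this also shows why a password can be recovered in a few tries without the db.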

For instance, I personally use it as a password manager so far. It is purely command-line and lacks a lot of features (sync is not using SSB yet).

For servers that I manage, I have the following pattern:

For all these, I mainly use argon and blake.

My tool tells me to rotate account passwords and keys when I use it (every month).

One interesting aspect is that even without the db, I can recover a password in a few tries if I remember my master password, the context string (the server name in my db), and roughly the last year-month in which I refreshed my passwords/keys.
So I install the tool on most of my servers without any db (the db is actually only on my personal computer and office computer).

I have something similar for websites, but currently it is not such a good solution, because most of the time they require an email address for verification, so the link between identities is there, and the effort to create a new address each time is large. For that, I'm thinking about using a P2P collaboration scheme to anonymize email addresses (think Tor, but bridging SMTP to SSB private messages). That would let services achieve their valid goals (multi-factor authentication, a privileged push communication channel to their users) while preventing identity linking, with minimal impact on their current processes. Another option is simply that services run an SMTP/SSB bridge in their infrastructure, but that requires their collaboration and a wish to protect their users from identity-linking practices.

The main point is the Master Password. If someone gets access to it... there is no magic. I'm thinking of a scheme using a one-time-use revocation key, kept on paper in a safe, that would permit someone who knows both the Master Password and that key to inform all servers/services via SSB that the identities were stolen and must be revoked (and then proceed to recreate one and recover accounts). I'm currently digging through papers to find a commitment scheme that could permit that.

staltz commented 1 year ago

@ahdinosaur I started sketching "v2" of the feed format in light of the ideas we had here. (And as a good moment to rename some fields)

Seems like we could make a "group" a primitive in the protocol. An "account" (or "identity") is just a group of devices (or a group of public keys). Similarly, a "community" / "club" / "team" is just a group of public keys. This should also pave the way for private groups, since (in theory) this is just a matter of a group's tangle msgs being encrypted. Groups could be a built-in PPPPP feature!

So to make it a bit more clear:

With this, the main realization I had is that we could replace the msg.metadata.who with msg.metadata.group. Because e.g. to make it possible that any of your devices can start a feed, we need the feed root to be predictable. Other peers should need to know only the Group ID, not the pubkey which started the feed.

The rough structure would be:

const msg = {
  data,
  metadata,
  pubkey, // former `who`
  sig
}

Notice that the pubkey is outside of the metadata! Instead, we have the msg.metadata.group as the identifier. This has some very interesting implications. It means two of your devices can author (accidentally or not) the same msg, and it'll have the same Msg ID! For instance, two of your devices could have simultaneously started the post feed, but this should still yield the same msg.data (which is empty) and msg.metadata, hence the same Msg ID! When replicating, it doesn't matter which pubkey I got the message from. Any one of those pubkeys is equally authoritative!
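
A small sketch of why that works, assuming the Msg ID is a hash over `data` and `metadata` only. The `msgId` and `feedRoot` helpers are hypothetical, SHA-256 in hex is just a stand-in for whatever hash and encoding the real format uses, and the metadata is reduced to `group` and `type`:

```javascript
const crypto = require('node:crypto')

// Assumption: the Msg ID covers data and metadata, but not pubkey or sig.
function msgId(msg) {
  const preimage = JSON.stringify([msg.data, msg.metadata.group, msg.metadata.type])
  return crypto.createHash('sha256').update(preimage).digest('hex')
}

// A feed root has empty data, so it is fully determined by group and type.
function feedRoot(group, type, pubkey) {
  return { data: null, metadata: { group, type }, pubkey, sig: `sig-by-${pubkey}` }
}

const fromLaptop = feedRoot('MTYQM89hvHuiVKaw8Ze7kc', 'post', 'laptop-pubkey')
const fromPhone = feedRoot('MTYQM89hvHuiVKaw8Ze7kc', 'post', 'phone-pubkey')

// Different devices, different signatures, same Msg ID.
console.log(msgId(fromLaptop) === msgId(fromPhone)) // true
// A peer who only knows the group ID and the type can predict the root's ID too.
console.log(msgId(feedRoot('MTYQM89hvHuiVKaw8Ze7kc', 'post', 'unknown')) === msgId(fromLaptop)) // true
```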

:bomb: However. My main dilemma right now is: Does a msg include just the group ID, or does it include references to the group's current "state"?

ahdinosaur commented 1 year ago

thanks for all this @staltz, looks great. :relaxed:

If including the group's current state:

  • CON: remote peers cannot derive the tangle ID for Alice's posts

does the feed root need to reference the identity state? seems that could be a special case.

i will admit, the "deterministically predictable feed root" and "pubkey outside the metadata" feels wrong to me, but also i'm susceptible to wanting things to be done the "correct" way and terrible at cutting corners. i'm happy to try to accept these as-is and see how things go.

so in pursuit of the "correct" way i'm partial to including the group's current state in every message (except a feed root message). i do agree that identity tangles make sense to be replicated first, before replicating content tangles.

but of course my mind wonders what would happen if we flipped the script, what if the identity tangle "announced" the creation of feed tangles, where those msg IDs became the tangle IDs. no more need for deterministically predictable feed roots. the CON is the identity tangle is larger, also i'm not sure if this is the same as the meta-feeds approach that rubbed you the wrong way.

ahdinosaur commented 1 year ago

I see what you were going for, but the msg.metadata.tangles doesn't work like that. Semantically, having that tangle field with the identity tangle in a post msg means that the post msg belongs to the identity tangle, which is not what we want. But indeed, you're right that it would somehow have to reference the identity tangle root and the known tips, so as to convey the state of the identity tangle at the time the post msg was created.

i had a wonder about our tangles object and was wondering if they should be more semantic.

for example:

{
  "data": {
    "text": "yo i heard you like tangles!"
  },
  "metadata": {
    "data_hash": "K8tzhL8Sewr3mVr1dpaYa2",
    "data_size": 206,
    "data_type": "post",
    "tangles": {
      "identity": {
        "root": "QwrP7DAMHhHe71Qf87tXBf",
        "depth": 2,
        "prev": [
          "72MK9ETRNGKm7Jh8ryJswM"
        ]
      },
      "feed": {
        "root": "SRGfAAnxTzjN6mEDJ542hf",
        "depth": 2,
        "prev": [
          "NWmZDGa64kiY2cDaM3u8c2"
        ]
      },
      "thread": {
        "root": "RG3uXAiKFiwGjYQs6s4Adr",
        "depth": 2,
        "prev": [
          "otYDrKTZZ1ZgDVgTBeBZ6v",
          "5jdPWyyniKoeukdVoZVxUA"
        ]
      }
    },
    "pubkey": "8kBhDXpZajdBRFLq8zophqCzbFsFzvuwBGoWj7TU9Loe",
    "v": 1337
  },
  "sig": "5abJdD6RRCsWXKJLaEKRhUb1HKh4aKPFteFRgUBfyJD4cFzo5MVaMdWbwM2CfpNRFSjR9NkczRL2LcSyQVThYnRr"
}

then we could treat a reference to the identity tangle differently.

i had a chat with @mixmix and he mentioned semantics being important for how he uses tangles. i notice other specs (like meta feeds and group exclusion) using tangles have semantic references.

that being said, you could certainly "derive" the "meaning" of the tangle with enough traversing, but is there a benefit to not including the "meaning" in the message?

staltz commented 1 year ago

i will admit, the "deterministically predictable feed root" and "pubkey outside the metadata" feels wrong to me

It is possible that pubkey outside metadata would lead to new problems, so it's important to be careful with this new design.

but of course my mind wonders what would happen if we flipped the script, what if the identity tangle "announced" the creation of feed tangles, where those msg IDs became the tangle IDs.

Yeah, I wouldn't rule out that tactic. It might not be as bad as ssb-meta-feeds. I'll consider it.

i had a chat with @mixmix and he mentioned semantics being important for how he uses tangles.

Yes Mix had raised the same concern with me when I showed him this tangles design. But PPPPP tangles are doing something different to SSB tangles, and as soon as you introduce semantic names, it makes it possible to have two different roots for the same name, and this violates one of the tangle constraints, which is: only one root per tangle. This constraint is actually very important for backlink validation and sliced replication. The semantic names also make it impossible for a msg to be part of two different "threads" or two different "feeds" because there is just one name.

In short, PPPPP tangles are a way of grouping messages for the purpose of replication of a shared data structure (well, might as well just call this a CRDT because it's a replicated data type, RDT). In SSB, tangles are not replicateable, they are just ways of logically grouping msgs so to have consensus on their causal order. This is why PPPPP tangles need more "machine-friendly" identifiers than human-friendly identifiers. Similar to how in SSB you replicate @FCX/tsDLpubCPKKfIrw4gc+SQkHcaD17s7GI6i/ziWY=.ed25519 (machine-friendly) not "alice" (human-friendly), and the system would have severe bugs if you would change replication to be centered around human-friendly names.

staltz commented 1 year ago

does the feed root need to reference the identity state? seems that could be a special case.

Actually, this was a great suggestion, and it unblocked me with this design. So here I present what seems like it will work quite well for multi-device use cases (and ... multi-person feeds!):

Feed v2

Alice's identity

const msg0 = {
  data: {
    add: DESKTOP_PUBKEY,
  },
  metadata: {
    dataHash: '1800a9st',
    dataSize: 32,
    group: null,
    groupTips: null,
    tangles: {},
    type: 'identity',
    v: 2,
  },
  pubkey: DESKTOP_PUBKEY,
  sig,
}

const msg1 = {
  data: {
    add: PHONE_PUBKEY,
  },
  metadata: {
    dataHash: 'Dhc810cI1',
    dataSize: 32,
    group: null,
    groupTips: null,
    tangles: {
      [IDENTITY_MSG0_HASH]: {
        depth: 1,
        prev: [IDENTITY_MSG0_HASH],
      }
    },
    type: 'identity',
    v: 2,
  },
  pubkey: DESKTOP_PUBKEY,
  sig,
}

Alice's posts feed

const msg0 = {
  data: null,
  metadata: {
    dataHash: null,
    dataSize: 0,
    group: IDENTITY_MSG0_HASH,
    groupTips: null,
    tangles: {},
    type: 'post',
    v: 2,
  },
  pubkey: DESKTOP_PUBKEY,
  sig,
}

const msg1 = {
  data: {
    text: 'Hello world',
  },
  metadata: {
    dataHash: 'Cfo91ico5',
    dataSize: 10,
    group: IDENTITY_MSG0_HASH,
    groupTips: [IDENTITY_MSG1_HASH],
    tangles: {
      [POST_MSG0_HASH]: {
        depth: 1,
        prev: [POST_MSG0_HASH],
      }
    },
    type: 'post',
    v: 2,
  },
  pubkey: DESKTOP_PUBKEY,
  sig,
}

Commentary

group and groupTips are always null in identity tangle msgs. I don't know, should these fields be omitted in identity tangle msgs? It isn't pretty.

This new feed design is not the most pretty syntactically (group and groupTips are basically defining a tangle, but not in the tangles field), but prettiness is not that important. The reason why group shouldn't be inside tangles is because we don't need the depth (depth only helps when creating a new msg in the tangle, and in this case we can't and should not add msgs to the identity tangle while we are publishing on a common feed), and we are not declaring this msg to belong to the identity tangle. We are only referring to the state of the identity tangle.
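To make the role of group and groupTips concrete, here is a rough sketch of how a validator might use them: walk the identity tangle backwards from groupTips, collect every pubkey added along the way, and accept the feed msg only if its author is in that set. The helper names (`pubkeysAt`, `isAuthorized`) and the in-memory Map of identity msgs are assumptions for illustration, and feed roots (where groupTips is null) are deliberately left out of scope.

```javascript
// Hypothetical helper: collect the pubkeys added by every identity msg
// reachable from `groupTips` (the tips plus all of their ancestors).
function pubkeysAt(identityMsgs, groupTips) {
  const pubkeys = new Set()
  const visited = new Set()
  const queue = [...groupTips]
  while (queue.length > 0) {
    const id = queue.pop()
    if (visited.has(id)) continue
    visited.add(id)
    const msg = identityMsgs.get(id)
    if (!msg) throw new Error(`missing identity msg ${id}`)
    pubkeys.add(msg.data.add)
    for (const tangle of Object.values(msg.metadata.tangles)) {
      queue.push(...tangle.prev)
    }
  }
  return pubkeys
}

// A non-root feed msg is accepted only if its author was a member of the
// identity at the state the msg refers to.
function isAuthorized(identityMsgs, feedMsg) {
  const { groupTips } = feedMsg.metadata
  if (groupTips === null) throw new Error('feed roots are out of scope for this sketch')
  return pubkeysAt(identityMsgs, groupTips).has(feedMsg.pubkey)
}

// Identity tangle from the example above, with placeholder IDs and pubkeys.
const identityMsgs = new Map([
  ['ID0', { data: { add: 'DESKTOP' }, metadata: { tangles: {} } }],
  ['ID1', { data: { add: 'PHONE' }, metadata: { tangles: { ID0: { depth: 1, prev: ['ID0'] } } } }],
])

const phonePost = { pubkey: 'PHONE', metadata: { group: 'ID0', groupTips: ['ID1'] } }
const stalePhonePost = { pubkey: 'PHONE', metadata: { group: 'ID0', groupTips: ['ID0'] } }

console.log(isAuthorized(identityMsgs, phonePost)) // phone was added by ID1
console.log(isAuthorized(identityMsgs, stalePhonePost)) // phone not yet added at ID0
```

This also shows why depth is not needed here: the feed msg only points at a state of the identity tangle, it never extends it.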

Perhaps with more bikeshedding we can make this feed format pretty to the human eyes, but at present this v2 design seems like green light for me to build some prototypes and see how it works in the wild.

Solves our problems?

Here's how v2 fares with the originally mentioned problems:

:tada:

staltz commented 1 year ago

Small problem came up during implementation:

As per the current design, a pubkey can only start a group once. If they start another group, then it's going to end up with the same group ID. So we need to add some kind of nonce to the group tangle's (I'm renaming it from identity tangle to group tangle) root msg.

UPDATE: oh this is easy, just add msg.data.nonce. That should cause the msg.metadata.dataHash to always be unique when starting a group.
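A small sketch of the collision and the nonce fix. `groupRoot` and `sameContent` are hypothetical helpers, dataHash is omitted for brevity, and the nonce values are placeholders (a real implementation would presumably use random bytes). Since any hash maps identical content to identical output, comparing the serialized content is enough to show when two roots would end up with the same ID.

```javascript
// Hypothetical builder for the root msg of a group tangle
// (fields follow the v2 examples; dataHash is left out of this sketch).
function groupRoot(pubkey, nonce) {
  const data = nonce === undefined ? { add: pubkey } : { add: pubkey, nonce }
  return {
    data,
    metadata: { group: null, groupTips: null, tangles: {}, type: 'identity', v: 2 },
  }
}

// Identical content always hashes identically, whatever the hash function is.
const sameContent = (a, b) => JSON.stringify(a) === JSON.stringify(b)

// Without a nonce, a second group started by the same pubkey is
// indistinguishable from the first one.
const collides = sameContent(groupRoot('DESKTOP_PUBKEY'), groupRoot('DESKTOP_PUBKEY'))

// With distinct nonces, msg.data differs, so dataHash and the group ID differ too.
const distinct = !sameContent(
  groupRoot('DESKTOP_PUBKEY', 'nonce-a'),
  groupRoot('DESKTOP_PUBKEY', 'nonce-b')
)

console.log({ collides, distinct })
```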

staltz commented 1 year ago

Open question

How would encryption work for "groups"?

Say you follow someone known by the group ID XKKmEBmqKGa5twQ2HNSk7t, how do you encrypt a private message so only that person (i.e. that group of devices) can decrypt it?

staltz commented 1 year ago

:bulb: Thought: rename msg.metadata.tangles[tangleId].prev to .....tips. This would align the nomenclature with groupTips (i considered groupPrev as a name) but I think overall "tips" is easier to understand than "prev" from an implementation perspective. You're supposed to just put the "tips" (the extremities) of the DAG into this field, you're not supposed to think about what came "previously" (since technically the whole DAG came previously).

ahdinosaur commented 1 year ago

Open question

How would encryption work for "groups"?

Say you follow someone known by the group ID XKKmEBmqKGa5twQ2HNSk7t, how do you encrypt a private message so only that person (i.e. that group of devices) can decrypt it?

in my mind, the identity tangle (XKKmEBmqKGa5twQ2HNSk7t) would include a message that advertises a public key i can use for encrypting private messages. then the same "if a message is encrypted, try to decrypt with your available keys" as SSB.

i'm no expert but i reckon the SSB group specs have explored where we'd need to go:

in some of those specs i think private keys are sent to those added to the group using good ol' secret box encryption. so we'd need to either find a new way to distribute or derive shared keys amongst the group members, or we use our own form of secret box for this specific purpose.

ahdinosaur commented 1 year ago

:bulb: Thought: rename msg.metadata.tangles[tangleId].prev to .....tips. This would align the nomenclature with groupTips (i considered groupPrev as a name) but I think overall "tips" is easier to understand than "prev" from an implementation perspective. You're supposed to just put the "tips" (the extremities) of the DAG into this field, you're not supposed to think about what came "previously" (since technically the whole DAG came previously).

i was going to suggest groupTips be renamed to groupPrev so the vocab aligned, so yeah i support prev being renamed to tips :+1:

staltz commented 1 year ago

in my mind, the identity tangle (XKKmEBmqKGa5twQ2HNSk7t) would include a message that advertises a public key i can use for encrypting private messages. then the same "if a message is encrypted, try to decrypt with your available keys" as SSB.

Yeah, you're right, this isn't that hard after all. A device in the group tangle can announce a public key exclusively used for private messaging (not used for signing, etc) and then this device will share the keypair with other devices in the group. Yes, this has the vuln that whichever device leaks the private messaging keypair will compromise all the private messages, but this is actually the same threat model as SSB private groups (where there is a symmetric key shared to all members). I think it could work.

The only (minor) problem is that the group tangle will now publish two different kinds of information: a "Set" of pubkeys recognized for signing messages, and another "Set" of pubkeys recognized for private messaging. Having two in one tangle makes it a bit harder to perform PREDSL pruning, but maybe there is a way out of this. And maybe the group tangle won't need a lot of pruning after all (think, 2 devices is the most common case).

ahdinosaur commented 1 year ago

Having two in one tangle makes it a bit harder to perform PREDSL pruning, but maybe there is a way out of this.

okay, then what if these were two separate tangles? a tangle for the "Set" of pubkeys recognized for signing messages, and another tangle for the "Set" of pubkeys recognized for private messaging.

no reason comes to mind of why they can't be separate, the identity (pubkeys for signing messages) tangle is special because it affects permissions (the capability to write a message as the group), a slide-into-my-DMs (pubkeys for private messaging) tangle seems not special. the same edge cases that would apply (e.g. you were removed from a group but continue to publish new messages that point to the groupTips when you were still in the group) are true of any other tangle.

:woman_shrugging:

nonlinear commented 1 year ago

@staltz It's great you folks are finally tackling the multidevice issue... once ready, it opens up for a new era of app experimentation. it's so worth it.

I do have a security issue that arises with successful multidevice. it's probably solved somewhere, but critical. but first, some conceptual tools to map the problem. we have:

For now, identity and device are one and the same. multidevice goal is to decouple them. Correct?

Security issue: each device has of course identity attached to it. the more devices, the more vulnerable for attacks (another self claiming your identity, with no authority to clear the issue). Since we're serverless, we can't rely on passcodes, passwords, verifications to prove who is whom. Any attempt just moves problem up, not solving it.

Suggestions:

  1. the more I think about it, I believe we need an "activity" measurement... some way to detect activity, and if below a certain threshold, we deactivate or reduce autonomy of device.
  2. app attaches to device security features, nagging user if not using it.
  3. a master device that trumps all others. Probably based on activity so user doesn't have to tell system.

user scenario: successful lawyer is active user in many communities... they bless devices with their identity with abandon. they have many old androids lying around in drawers on their many properties... to ask them to do an inventory of all their devices is... almost impossible.

user scenario, moar: lawyer is admin in many groups. having his identity stolen spells disaster to them and many others. how to help them?

staltz commented 1 year ago

okay, then what if these were two separate tangles? a tangle for the "Set" of pubkeys recognized for signing messages, and another tangle for the "Set" of pubkeys recognized for private messaging.

@ahdinosaur You're right, it seems that we can just publish these separately. Can still bikeshed the naming of it, but it would most likely just be a "subfeed".

groupTips versus prev

I just realized that semantically we shouldn't rename these. prev (in tangles) contains tips plus lipmaa references, so we can't name it tips because lipmaa references aren't tips. We could rename groupTips => groupPrev, but the semantics here aren't right, because groupTips should not contain lipmaa references.

I guess they are different after all. :shrug:
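The distinction can be sketched in code: the tips of a tangle are the msgs that no other msg lists in its prev, while a prev array may also carry older back-references. `getTips` is a hypothetical helper, and the extra reference from C back to the root only stands in for a lipmaa link (the actual lipmaa positions are not computed here).

```javascript
// Hypothetical helper: the tips of a tangle are the msgs that no other msg
// lists in its `prev` for that tangle.
function getTips(tangleId, msgs) {
  const referenced = new Set()
  for (const msg of msgs.values()) {
    const tangle = msg.metadata.tangles[tangleId]
    if (tangle) for (const id of tangle.prev) referenced.add(id)
  }
  const tips = []
  for (const [id, msg] of msgs) {
    const inTangle = id === tangleId || tangleId in msg.metadata.tangles
    if (inTangle && !referenced.has(id)) tips.push(id)
  }
  return tips.sort()
}

// R is the root; A and B are concurrent; C follows A and also carries an
// extra back-reference to R (standing in for a lipmaa link).
const msgs = new Map([
  ['R', { metadata: { tangles: {} } }],
  ['A', { metadata: { tangles: { R: { depth: 1, prev: ['R'] } } } }],
  ['B', { metadata: { tangles: { R: { depth: 1, prev: ['R'] } } } }],
  ['C', { metadata: { tangles: { R: { depth: 2, prev: ['A', 'R'] } } } }],
])

console.log(getTips('R', msgs)) // [ 'B', 'C' ]
```

C's prev is ['A', 'R'], yet when C was written only A was a tip, which is why prev cannot simply be renamed to tips.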


For now, identity and device are one and the same. multidevice goal is to decouple them. Correct?

@nonlinear Correct :)

Security issue: each device has of course identity attached to it. the more devices, the more vulnerable for attacks (another self claiming your identity, with no authority to clear the issue).

We started this thread stating this security issue, and now we solved it. Devices will not share keys with each other, so if one device is compromised, only that device's keys are compromised. Your "self" is a group of devices, so the other devices could "downvote" or "remove" (whatever mechanism we come up with) the compromised device, meaning that the compromised device would be kicked out of your "self".

staltz commented 1 year ago

Alright, here's the code!

arj03 commented 1 year ago

This is looking really good. It seems that using the tips it should be possible to reason about having a group of devices (self) being part of a multi-person group.

staltz commented 1 year ago

@arj03 thanks!

You mean making a nested group? (A group that contains another group)

I guess one way to implement that is to add a group ID in msg.data.add in the identity tangle. The data part could tell what is the type of the thing being added. msg.data.type = 'ed25519' for the normal case and msg.data.type = 'group' for nested groups. But we'd also have to encode the groupTips here...

arj03 commented 1 year ago

@staltz :selfie: thanks

Maybe you can encode the removal policy in the identity init. That way an identity for me could have the master key removal policy. While a group of our two multi device identities could have the one we came up with for groups.

staltz commented 1 year ago

@arj03 Yes, something along those lines. For me this use case is not super important to design right now, but it's good to have an idea of how it could be designed.

I also think there could be removal policy in the identity root.

Another shower thought is: while previously we discussed with @ahdinosaur about having one "identity tangle" and another "private messaging keys tangle", I think we could revert to having a single tangle that defines everything about this "identity"/person/entity and then have custom prune algorithms for that (if needed at all! I think in the short term we might not even need pruning for this tangle, since it's going to be small and not often changing).

So the identity tangle msgs would have msg.data as either:

This should also neatly allow for more key schemes in the future. Also, may consider if the network identity is the same as the signing pubkey or whether we should have separate network identity keys. I would default to reusing a keypair for both signing and network identity purposes, just need to come up with a convention for type that makes that explicit, perhaps {add: $PUBKEY, type: 'sign-and-shs-ed25519', nonce?: string}.

staltz commented 1 year ago

PS: another thing in my mind is trying to settle on a name, either "identity" or "group", but having both names is going to be confusing. I think I prefer "identity" because it's less likely to be ambiguous with other concepts. And the "identity tangle" makes more sense than a "group tangle".

staltz commented 1 year ago

PS2: one thing I realized that this current design allows is public tangles. This is basically a feed tangle where the feed root has group = null. This creates a root where the only non-null value is msg.metadata.type. One application is to produce a "pubs registry" tangle where anyone can publish to, and anyone could know this tangle's ID and replicate it.

nichoth commented 1 year ago

(I hope this isn't adding noise.) Thanks for working on this format BTW. It overlaps a little bit with work I'm doing.


either "identity" or "group", but having both names is going to be confusing.

This is something that has confused me honestly. My naïve understanding is that an identity is a single person (with multiple devices), and a group is a collection of people. So they are neatly supersets — group is a collection of identities, and identity is a collection of devices.

ahdinosaur commented 1 year ago

another thing in my mind is trying to settle on a name, either "identity" or "group", but having both names is going to be confusing

i'm :+1: on "identity", but another option is "agent", which is what Value Flows uses to describe an individual or a group.

ahdinosaur commented 1 year ago

So the identity tangle msgs would have msg.data as either:

i love what you're thinking, and i'm sorry but i have to give some bikesheddy feedback: can we design msg.data types such that unions (when there are multiple different types of valid msg.data contents for a single msg.metadata.type) are tagged unions? is much easier to reason about and implement in TypeScript, Rust, etc. :purple_heart:

(how to parse enums using Rust's serde)

so for example:

in this case there are 2 layers of union: action types -> key types.

anyways, rant over, cheers :blush:

Powersource commented 1 year ago

How would encryption work for "groups"?

Say you follow someone known by the group ID XKKmEBmqKGa5twQ2HNSk7t, how do you encrypt a private message so only that person (i.e. that group of devices) can decrypt it?

maybe worth looking at how tribes1 po-boxes worked?

staltz commented 1 year ago

@mixmix Suggested for identity tangles: to add a "consent" system similar to https://github.com/ssbc/fusion-identity-spec because you don't want anyone to randomly add your pubkeys to nazi identity tangles. So the new device can sign an attestation that "yes i want to belong to identity tangle known by the ID XYZABC" when the old device wants to publish the msg on the identity tangle.

staltz commented 1 year ago

@ahdinosaur I'm finally at the point where I'm bikeshedding/designing the identity tangle data. What do you think about the following?

Examples of msg.data:

Detailed types

interface Msg {
  data: IdentityData
  metadata: {
    dataHash: ContentHash
    dataSize: number
    identity: 'self' // MUST be the string 'self'
    identityTips: null // MUST be null
    tangles: {
      [identityTangleId: string]: {
        depth: number // maximum distance (positive integer) from this msg to the root
        prev: Array<MsgHash> // list of msg hashes of existing msgs, unique set and ordered alphabetically
      }
    }
    domain: string // alphanumeric string, at least 3 chars, max 100 chars
    v: 2
  }
  pubkey: Pubkey
  sig: Signature
}

type IdentityData =
  | { action: 'add'; add: IdentityAdd }
  | { action: 'del'; del: IdentityDel }

type IdentityAdd = {
  key: Key
  nonce?: string // nonce required only on the identity tangle's root
  consent?: string // base58 encoded signature of the string `:identity-add:<ID>` where `<ID>` is the identity's ID, required only on non-root msgs
}

type IdentityDel = {
  key: Key
}

type Key =
  | {
      purpose: 'sig' // digital signatures
      algorithm: 'ed25519' // libsodium crypto_sign_detached
      bytes: string // base58 encoded string for the public key being added
    }
  | {
      purpose: 'subidentity'
      algorithm: 'tangle' // PPPPP tangle
      bytes: string // subidentity ID
    }
  | {
      // WIP!!
      purpose: 'box' // asymmetric encryption
      algorithm: 'x25519-xsalsa20-poly1305' // libsodium crypto_box_easy
      bytes: string // base58 encoded string of the public key
    }
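As a rough illustration of how these tagged unions might be consumed, the sketch below folds a list of IdentityData values into the set of currently valid keys. `currentKeys` is a hypothetical helper; it assumes the values arrive in a causal (topologically sorted) order and does not attempt to resolve a concurrent add and del of the same key, nor does it check nonce or consent.

```javascript
// Hypothetical reducer over IdentityData values, assumed to be in causal order.
function currentKeys(dataList) {
  const keys = new Map()
  // A key is identified by its purpose, algorithm, and bytes together.
  const keyId = ({ purpose, algorithm, bytes }) => `${purpose}:${algorithm}:${bytes}`
  for (const data of dataList) {
    switch (data.action) {
      case 'add':
        keys.set(keyId(data.add.key), data.add.key)
        break
      case 'del':
        keys.delete(keyId(data.del.key))
        break
      default:
        throw new Error(`unknown action: ${data.action}`)
    }
  }
  return [...keys.values()]
}

// Placeholder key material, not real base58 public keys.
const desktop = { purpose: 'sig', algorithm: 'ed25519', bytes: 'DESKTOP_PUBKEY' }
const phone = { purpose: 'sig', algorithm: 'ed25519', bytes: 'PHONE_PUBKEY' }

const remaining = currentKeys([
  { action: 'add', add: { key: desktop, nonce: 'placeholder-nonce' } },
  { action: 'add', add: { key: phone, consent: 'placeholder-signature' } },
  { action: 'del', del: { key: phone } },
])

console.log(remaining.map((k) => k.bytes)) // [ 'DESKTOP_PUBKEY' ]
```

Dispatching on the action tag first, then on the key's purpose, keeps each branch small, which is the main appeal of the tagged-union shape.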

staltz commented 1 year ago

Dang, now I really feel like renaming "identity" to "account". I think it better reflects what it is, and on the UI level we will be talking about "accounts" anyway, not "identities". Further, "identity ID" is just really weird.

Powersource commented 1 year ago

maybe relevant for bikeshed https://sunbeam.city/@powersource/110768411610769697