ssbc / rooms2

Design doc for the next edition of SSB Room servers
https://ssbc.github.io/rooms2

Spurious ephemeral contact #19

KyleMaas opened this issue 3 years ago (status: open)

KyleMaas commented 3 years ago

Continuing the conversation from https://github.com/arj03/ssb-browser-demo/issues/186

This was actually a very interesting case. It surfaced a bunch of assumptions about how SSB works with random connections, both before and going forward. To sum up: this is by design in EBT, and rooms 2.0 will have the same model, where tunneled authentication requires mutual follows. The really important thing here, I think, is to make that clear to the user. That would solve a lot of problems. It is actually a "hack" that it works right now for non-EBT transfers, and we might tighten that up in the future so you can't do that.

Originally posted by @arj03 in https://github.com/arj03/ssb-browser-demo/issues/186#issuecomment-783251401

Hm. That's...troubling. That means that, for onboarding new users to SSB, rooms can never truly obsolete pubs. Otherwise it's forcibly invite-only and can never be made accessible to the general public. And there would be no way for potential users to look at the content posted to SSB to decide if it was worth joining.

The same would even be true for folks who had to reintegrate with the network after something like forking their feed. Unless they had an outside-of-SSB channel for communicating with someone that they should follow them, they'd be locked out. Which means that SSB can't realistically run in a vacuum like it otherwise could.

It seems like a very strange decision for something that's otherwise very well-designed for the free and open flow of information.

Originally posted by @KyleMaas in https://github.com/arj03/ssb-browser-demo/issues/186#issuecomment-783301074

Thinking about this more, requiring mutual follow also breaks the use case of a person travelling off the grid and getting updates from anyone and everyone they come in contact with. The ability to support spurious ephemeral connectivity is a huge plus for SSB - otherwise, there's far less reason to do chained message signing (instead of an independent per-message signature), since you have to establish a "trust" relationship with anyone you want to get messages from anyway.

Originally posted by @KyleMaas in https://github.com/arj03/ssb-browser-demo/issues/186#issuecomment-783336623

I think it would be best to have this discussion in the rooms2 repo. I'm sure Henry and André have input on this.

Originally posted by @arj03 in https://github.com/arj03/ssb-browser-demo/issues/186#issuecomment-783338619

The gist of this is: SSB is designed for off-grid use with only occasional contact with the outside world. By requiring mutual follow to replicate anything, it means that there is no such thing as a spurious read-only contact when coming off-grid. It means you can only contact the network through people you have a trust relationship with. This may be fine under normal circumstances but breaks down in several possible scenarios:

  1. You are travelling the world and want to get updates on what's going on whenever you stop (seaport, airport, coffee shop, etc.). Because replication would require a mutual follow, you would have to establish a trust relationship with another person to be able to replicate, even if you both follow some of the same people.
  2. You and another person both follow a third person, but the other person does not want to replicate your friends' feeds. They may not want to follow you and absorb the baggage that comes along with that, but you want to follow them. Because they don't want to follow you, you cannot replicate their feed (or anyone they follow) directly.
  3. A new user is thinking about joining the network but is unsure. They want to take a peek into the content of the network and see if they like it. They join a room, see people in it, connect to one, and...nothing. They're new, so no one follows them, but in this scenario they can't even see anything on the network. They assume the network is empty/unused and leave.
  4. You accidentally fork your feed and need to start a new feed. You're now in the "new user" scenario. Without some kind of an off-network contact to ask them to follow you, you normally wouldn't be able to post anything that anyone would see. But with a requirement for mutual follow, you can't even see what's going on. You're now dependent on some sort of external (and potentially long-range) communications mechanism.

This also means that rooms cannot truly replace pubs. There will always need to be pubs to onboard new users or people who have had to replace their feed. And these will continue to have a tendency to quickly become massive messes just like open-invite pubs with tons of follows do now. But with a mutual follow requirement, they stay the same "evil" but become even more "necessary".

In short, a "follow" is an implied trust relationship. You trust the other person to not shovel garbage into your database. I see the requirement to establish a mutual trust relationship with a stranger to be able to view publicly-posted messages to be a huge step backward. Why should they have to trust me for me to be able to read? I'm hoping maybe this can start a dialog on how to support these use cases, because I would really hate to lose the ability to safely replicate off-grid from strangers.

staltz commented 3 years ago

Hey @KyleMaas, thanks for thinking about these things. I don't have a lot of time now, but I'll try to come back to this thread soon.

The gist of my argument is that SSB works best as a "pull network" (this is a term we've used several times in SSB; you can search for it to read more about it), in other words, an invite-only network. We have attempted open invites (see e.g. easy-ssb-pub) that allow any random person to get data from almost anyone, and it backfired: it brought spam, randos, uninteresting content, abuse vectors, etc. We intentionally do not want to support such use cases again, and I think we'll operate just fine with a pull network only. I myself initially joined SSB by asking for an invite. New social networks like Clubhouse are also growing solely through invites, which shows this can scale fast. Gmail did the same in the past. Anyway, I firmly believe in the pull network idea.

It doesn't undermine off-grid use cases (you can gossip-replicate with other off-grid friends, or just mutually follow on the spot), and rooms aren't meant to replace pubs (although I would like rooms to replace pubs for onboarding). Even pubs shouldn't be open invite, and most pubs nowadays do mutual follows anyway, so they would work with EBT and other assumptions of mutual follows.

I want to invite you to think about pros and cons. For instance:

> I see the requirement to establish a mutual trust relationship with a stranger to be able to view publicly-posted messages to be a huge step backward.

We have heard from many users in the SSB community that exactly that would be a huge step forwards. One person's easy access to data is another person's easy exposure to abuse.

KyleMaas commented 3 years ago

Thank you for your consideration on this subject.

> The gist of my argument is that SSB works best as a "pull network" (this is a term we've used several times in SSB; you can search for it to read more about it), in other words, an invite-only network. We have attempted open invites (see e.g. easy-ssb-pub) that allow any random person to get data from almost anyone, and it backfired: it brought spam, randos, uninteresting content, abuse vectors, etc. We intentionally do not want to support such use cases again, and I think we'll operate just fine with a pull network only. I myself initially joined SSB by asking for an invite. New social networks like Clubhouse are also growing solely through invites, which shows this can scale fast. Gmail did the same in the past. Anyway, I firmly believe in the pull network idea.

That demonstrates exactly the problem I see with requiring mutual follows. I think we're talking about the same problem from two different angles, because I'm arguing against open invites, and what makes me nervous about requiring mutual follows is that I see that requirement as causing a side effect of essentially requiring open-invite pubs to exist.

Everyone new to the network should be in a position to not be trusted. You don't know if they're trying to force spam, abuse, etc. onto the network. Likewise, when connecting to a new peer, you may not know if you actually want to follow their content, but would like to replicate anything they have from others who you do follow/trust. But as I understand it, the requirements for Tunneled Authentication mean that you wouldn't even be able to establish a connection with another peer without both of you following each other.

Consider the following use case from the Tunneled Authentication page:

> The user that received the denied connection can then see this fact in their SSB app, and then they can make a conscious choice to either (1) follow the origin peer, or (2) connect to the origin peer (if (3) from the previous paragraph existed), or both.

If you had the ability to replicate from them without following, the SSB client application could show "You have received an incoming connection request from X, here are their last 10 messages that we've temporarily replicated without saving to the database. Do you trust them enough to follow them?" Without untrusted replication, you would have to blindly trust them first to see if they are worth trusting. The trust model is backwards.

With mutual follows required, onboarding a user that you do know (a real-life friend, for example) would have to be done interactively to do it without using an open invite pub (or at least in a bunch of steps with potential long lag time between). The way I read the spec right now, if you were trying to invite a friend, you could not invite them until after they generated their key, because they have to be followed first. Even assuming you blindly trusted incoming connections within a time period and assumed it was probably your friend, their connection would first have to fail so you'd get a record of it and could follow them. Without a secondary interactive communications medium, the new user gets no data and is left to stare at a blank screen until they can get their feed followed by their inviting friend.

> I want to invite you to think about pros and cons. For instance:

> > I see the requirement to establish a mutual trust relationship with a stranger to be able to view publicly-posted messages to be a huge step backward.

> We have heard from many users in the SSB community that exactly that would be a huge step forwards. One person's easy access to data is another person's easy exposure to abuse.

I would argue that by posting broadcast public messages, if anyone follows you who's connected to a pub, you've already lost that battle to keep your "public" data secret. Or if any pub on the network increases their hop count to replicate further out from them and happens to pull in your content. One way or another, anyone on the network can follow you without your consent or control, and that data can get to them indirectly whether you follow them back or not. So why should rooms 2.0 be different?

> It doesn't undermine off-grid use cases (you can gossip-replicate with other off-grid friends, or just mutually follow on the spot), and rooms aren't meant to replace pubs (although I would like rooms to replace pubs for onboarding). Even pubs shouldn't be open invite, and most pubs nowadays do mutual follows anyway, so they would work with EBT and other assumptions of mutual follows.

This is exactly what I'm saying. Open-invite pubs are, in my opinion, worse for the network in terms of spam, abuse, etc. than if you limited the amount of trust that was given out to users willy-nilly. By allowing held-at-arm's-length replication without trust, the network can be made much more robust for legitimate users (who are far more likely to already have their content widely replicated and thus available to new untrusted peers) without having to break that trust model. In an ideal case, that would nearly eliminate the need for pubs, especially open-invite pubs, which I think nearly everyone agrees are bad for the network. If you could do partial replication with folks you don't trust, you could still replicate data from those you do trust. You could mistrust the messenger but still get trusted and verified data.

My point is that trust should be earned, not given freely. A mutual follow prerequisite requires trusting someone first to determine if they're trustworthy. It requires giving out more trust more freely than I'm comfortable with. That is the core of my concern.

staltz commented 3 years ago

> The way I read the spec right now, if you were trying to invite a friend, you could not invite them until after they generated their key, because they have to be followed first. Even assuming you blindly trusted incoming connections within a time period and assumed it was probably your friend, their connection would first have to fail so you'd get a record of it and could follow them. Without a secondary interactive communications medium, the new user gets no data and is left to stare at a blank screen until they can get their feed followed by their inviting friend.

You can invite them without them having generated a key, but you'll see a connection attempt from "some ID", and you'll have to manually accept or deny that connection. The secondary communications medium is assumed to exist one way or another (to get an invite, for instance), so you can either accept that incoming connection by confirming the ID in a secondary channel, or by just assuming that the timing makes sense.
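A rough sketch of the "timing makes sense" option, assuming the client keeps a log of denied connection attempts: it narrows that log down to attempts that arrived within a window after the invite was created and hands those IDs to the user to confirm, rather than accepting them automatically. `DeniedAttempt`, `likely_invitees` and the one-hour window are hypothetical, not part of the rooms2 spec or any SSB library.

```python
from dataclasses import dataclass


@dataclass
class DeniedAttempt:
    feed_id: str      # SSB feed ID of the peer whose connection was denied
    timestamp: float  # when the attempt was denied, in seconds


def likely_invitees(denied, invite_created_at, window_s=3600.0):
    """Hypothetical heuristic: feed IDs denied within window_s seconds after
    the invite was created, most recent first, without duplicates."""
    in_window = [
        a for a in denied
        if invite_created_at <= a.timestamp <= invite_created_at + window_s
    ]
    in_window.sort(key=lambda a: a.timestamp, reverse=True)
    result = []
    for attempt in in_window:
        if attempt.feed_id not in result:
            result.append(attempt.feed_id)
    return result


# Usage with made-up feed IDs: only attempts in the first hour count.
log = [
    DeniedAttempt("@friend.ed25519", 120.0),
    DeniedAttempt("@stranger.ed25519", 9000.0),
]
print(likely_invitees(log, invite_created_at=0.0))  # ['@friend.ed25519']
```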

> My point is that trust should be earned, not given freely. A mutual follow prerequisite requires trusting someone first to determine if they're trustworthy. It requires giving out more trust more freely than I'm comfortable with. That is the core of my concern.

I'm not concerned about that, and I think a trust-is-given-not-earned approach is commendable; it's more welcoming.

KyleMaas commented 3 years ago

> > The way I read the spec right now, if you were trying to invite a friend, you could not invite them until after they generated their key, because they have to be followed first. Even assuming you blindly trusted incoming connections within a time period and assumed it was probably your friend, their connection would first have to fail so you'd get a record of it and could follow them. Without a secondary interactive communications medium, the new user gets no data and is left to stare at a blank screen until they can get their feed followed by their inviting friend.

> You can invite them without them having generated a key, but you'll see a connection attempt from "some ID", and you'll have to manually accept or deny that connection. The secondary communications medium is assumed to exist one way or another (to get an invite, for instance), so you can either accept that incoming connection by confirming the ID in a secondary channel, or by just assuming that the timing makes sense.

But then it's not an automatically-accepted invite that you can give a friend and have them join at their leisure like you can do with DHT connections, for example. It requires a secondary approval by the invite generator after they've already given their approval, and it requires a connection denial to show the request. It introduces more steps and lag time. A secondary communications medium is required one way or the other, yes, but a secondary interactive communications medium is required with this if you want to avoid the "new user gets a blank screen" outcome. I can't just give a coworker an invite code and say "join when you get home".

> > My point is that trust should be earned, not given freely. A mutual follow prerequisite requires trusting someone first to determine if they're trustworthy. It requires giving out more trust more freely than I'm comfortable with. That is the core of my concern.

> I'm not concerned about that, and I think a trust-is-given-not-earned approach is commendable; it's more welcoming.

Which is why open-invite pubs exist.

staltz commented 3 years ago

> Which is why open-invite pubs exist.

For people who don't mind exposure to randos, yes. But we'd like to have safer defaults. There's always a tradeoff between privacy and convenience; there's never a clear win that doesn't sacrifice anything.

KyleMaas commented 3 years ago

> > Which is why open-invite pubs exist.

> For people who don't mind exposure to randos, yes. But we'd like to have safer defaults. There's always a tradeoff between privacy and convenience; there's never a clear win that doesn't sacrifice anything.

Indeed. But that's what I mean by people earning trust instead of it being given out freely. And that's why I feel like requiring a mutual follow to connect breaks the trust model. An open-invite pub trusts everyone freely and blindly. If we can do partial replication without fully following/trusting, then there can be varying degrees of trust:

  1. "I don't trust this person at all and don't care what data they have." -> You've probably blocked them. Block replication of anything when they connect.
  2. "I don't trust this person, but I might trust people they follow. Let's ask them for gossiped data." -> Partial replication of mutual follows. Get good (and verified) data you want from an untrusted intermediary.
  3. "I don't trust this person yet, but I'd like to see what they have to say." -> Display partial replication of their latest posts. Replicate mutual follows as well, to get good data you wanted anyway.
  4. "I do trust this person and everyone they follow." -> Follow them and replicate their follows like normal, up to your hop count.

If you require mutual following, there is no middle ground between (1) and (4). And for the most part, that has to be arranged for ahead of time.
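The four levels form an ordered scale, and that ordering is what makes the middle ground expressible. A minimal sketch of how a client might encode it as a local policy, assuming each level strictly extends the permissions of the one below; the `Trust` enum, `allowed_actions` and the action names are hypothetical, not from any SSB library.

```python
from enum import IntEnum


class Trust(IntEnum):
    """The four levels above, ordered from least to most trusted."""
    BLOCKED = 1      # (1) replicate nothing through this peer
    GOSSIP_ONLY = 2  # (2) accept feeds we already follow, relayed by them
    PREVIEW = 3      # (3) also show their latest posts without storing them
    FOLLOWED = 4     # (4) normal replication up to our hop count


def allowed_actions(level):
    """Replication actions a client would permit at a given trust level.
    Action names are illustrative only."""
    actions = set()
    if level >= Trust.GOSSIP_ONLY:
        actions.add("fetch_followed_feeds")
    if level >= Trust.PREVIEW:
        actions.add("preview_their_posts")
    if level >= Trust.FOLLOWED:
        actions.update({"store_their_feed", "replicate_their_follows"})
    return actions


# Each level adds to the one below it.
for level in Trust:
    print(level.name, sorted(allowed_actions(level)))
```

Under a strict mutual-follow requirement, only `BLOCKED` and `FOLLOWED` are reachable for a peer that connects to you.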

I see where you're coming from, but I feel there is room for nuance. There can be a sliding scale of trust, and control can be given to the user with a sane default on the client. It doesn't need to be all-or-nothing extremism to be able to function in a safe and effective manner for everyone involved.

staltz commented 3 years ago

Case (2): do you mean that they are friends of friends? I can see an option where tunneled authentication allows friends and friends of friends, where this can be configured.

Case (3): you may want their data, but they may not want you to have their data.

KyleMaas commented 3 years ago

I very much appreciate the thoughtful discussion on this.

> Case (2): do you mean that they are friends of friends? I can see an option where tunneled authentication allows friends and friends of friends, where this can be configured.

Let's say you follow person A, B, and C. You connect to a peer X who doesn't follow you but does follow person A and C. You ask peer X for any information they have on anyone you follow - A, B, or C. They send you what they have for person A and C. In this case, at no point do you replicate peer X's data.
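Kyle's example boils down to intersecting two feed lists: the feeds you follow and the feeds peer X holds. A sketch, assuming each side can see the latest sequence number the other holds per feed (EBT exchanges per-feed sequence numbers in a similar spirit, but the dictionaries and `partial_replication_request` here are hypothetical, not its wire format):

```python
def partial_replication_request(my_follows, local_latest, peer_latest):
    """Build {feed_id: first_seq_wanted} for an untrusted peer.

    my_follows:   feed IDs we follow
    local_latest: {feed_id: latest sequence number we hold}
    peer_latest:  {feed_id: latest sequence number the peer advertises}
    Feeds we do not follow (including the peer's own) are never requested.
    """
    request = {}
    for feed in my_follows:
        theirs = peer_latest.get(feed)
        if theirs is None:
            continue  # the peer holds nothing for this feed
        ours = local_latest.get(feed, 0)
        if theirs > ours:
            request[feed] = ours + 1  # ask only for messages we are missing
    return request


# Kyle's example: we follow A, B, C; peer X holds A and C (and its own feed).
req = partial_replication_request(
    my_follows={"A", "B", "C"},
    local_latest={"A": 10, "C": 3},
    peer_latest={"A": 12, "C": 5, "X": 50},
)
print(req)
```

Because SSB messages are signed by their authors and chained to the previous message, what arrives for A and C can be verified regardless of who relayed it, which is what getting "trusted and verified data" from an untrusted messenger relies on.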

> Case (3): you may want their data, but they may not want you to have their data.

And what if they do? I'm saying this should be user-configurable with a client setting, not baked into the protocol.

KyleMaas commented 3 years ago

> Let's say you follow person A, B, and C. You connect to a peer X who doesn't follow you but does follow person A and C. You ask peer X for any information they have on anyone you follow - A, B, or C. They send you what they have for person A and C. In this case, at no point do you replicate peer X's data.

One point I should clarify about this use case: I simplified it to a single-hop example. In reality, you could not infer from this that peer X follows person A or person C. They may only have that data because peer X follows person D, who follows both person A and person C, and peer X has by extension pulled both into their database. For that matter, they may have followed person D at one time but have since blocked them and simply never removed A's and C's messages from their database. They may have their hops set to 100 and have pulled in every message on earth. So you can't derive information about whom peer X follows from this. It just means that the data has somehow been replicated into their database.

staltz commented 3 years ago

> And what if they do? I'm saying this should be user-configurable with a client setting, not baked into the protocol.

It's actually quite hard to "bake into the protocol" the design decisions. Most or almost all design decisions can easily be forked in a compatible way.

Specifically here with tunneled authentication, it's a "client protocol", so it's enforced by room clients, not the room server. It's entirely possible for clients to allow connections from any peer, there is little we can do with protocol design to prevent them from doing that.

The reason why this room design doc encourages mutual follows is that allowing open connections from any peer will open the door to several kinds of problems, especially because aliases will make it easy for anyone on the internet to connect with you directly, by giving you a public URL that could be posted anywhere. That's different from rooms 1.0. Not only are there abuse risks, there's also bandwidth and overload problems. Connecting directly means that your peer (potentially a mobile phone) will act as "server", uploading many megabytes worth of data to any peer that asks for it. It could suddenly slow down your phone or even crash it. Think of a use case where Famous Person puts an alias on their webpage, and then thousands of strangers consume that alias, connect with Famous Person over the room, and then each one of those thousands will ask for megabytes worth of data. It's a bad default to allow strangers to connect to you. Not the case with pubs as they are usually built for uptime and can reboot. You don't want your phone's battery dying or your desktop computer crashing.

> Let's say you follow person A, B, and C. You connect to a peer X who doesn't follow you but does follow person A and C. You ask peer X for any information they have on anyone you follow - A, B, or C. They send you what they have for person A and C. In this case, at no point do you replicate peer X's data.

If I'm the person initiating the connection with X, I'm not the one enforcing the "mutual connection" requirement, it's X that checks for that. In other words, the peer receiving the connection attempt is the one who checks if a follow exists, and if not, cancels the would-be connection. It's perfectly possible that X would not have such enforcement logic, and would allow anyone to connect.
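A minimal sketch of that receiver-side check, assuming the receiving client knows its own follow list and records denied attempts so the app can surface them later. The `TunnelGate` class and its `allow_unfollowed` switch (standing in for a client that chooses not to enforce the follow check) are hypothetical, not the rooms2 or ssb-room API; note that nothing here runs on the initiating side.

```python
import time
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass
class TunnelGate:
    """Receiver-side check for incoming tunneled connections (hypothetical)."""
    follows: Set[str]
    allow_unfollowed: bool = False  # a client that skips the follow check
    denied_log: List[Tuple[str, float]] = field(default_factory=list)

    def on_incoming(self, origin_id: str, now: Optional[float] = None) -> bool:
        """Accept if we follow the origin (or enforcement is off); otherwise
        deny and record the attempt so the app can show it to the user."""
        if self.allow_unfollowed or origin_id in self.follows:
            return True
        self.denied_log.append((origin_id, time.time() if now is None else now))
        return False


gate = TunnelGate(follows={"@alice.ed25519"})
print(gate.on_incoming("@alice.ed25519"))          # True
print(gate.on_incoming("@bob.ed25519", now=42.0))  # False
print(gate.denied_log)                             # [('@bob.ed25519', 42.0)]
```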

Again, it's about the defaults and the intended use case.

staltz commented 3 years ago

Also worth mentioning that in Manyverse we did user experience research on the topic of onboarding (which informed the design of alias URLs in this repo), and we discovered that users consider it important to manually allow/reject connection attempts, for privacy and safety. https://gitlab.com/staltz/manyverse/-/wikis/UX-project:-%22Welcome-to-the-Manyverse%22

KyleMaas commented 3 years ago

> > And what if they do? I'm saying this should be user-configurable with a client setting, not baked into the protocol.

> It's actually quite hard to "bake into the protocol" the design decisions. Most or almost all design decisions can easily be forked in a compatible way.

> Specifically here with tunneled authentication, it's a "client protocol", so it's enforced by room clients, not the room server. It's entirely possible for clients to allow connections from any peer, there is little we can do with protocol design to prevent them from doing that.

> The reason why this room design doc encourages mutual follows is that allowing open connections from any peer will open the door to several kinds of problems, especially because aliases will make it easy for anyone on the internet to connect with you directly, by giving you a public URL that could be posted anywhere. That's different from rooms 1.0. Not only are there abuse risks, there's also bandwidth and overload problems. Connecting directly means that your peer (potentially a mobile phone) will act as "server", uploading many megabytes worth of data to any peer that asks for it. It could suddenly slow down your phone or even crash it. Think of a use case where Famous Person puts an alias on their webpage, and then thousands of strangers consume that alias, connect with Famous Person over the room, and then each one of those thousands will ask for megabytes worth of data. It's a bad default to allow strangers to connect to you. Not the case with pubs as they are usually built for uptime and can reboot. You don't want your phone's battery dying or your desktop computer crashing.

Fair point. Makes good sense. As long as the user is given the option, and it can be configured without breaking clients connecting to them, it mostly solves my concern. How about a bandwidth limit for replicating to others unless you follow them?
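One way to answer the bandwidth question is a token bucket shared by every peer you do not follow, so thousands of strangers hitting the same alias draw from one budget instead of one budget each. A sketch with made-up numbers; `TokenBucket` and `may_upload` are hypothetical names, and a real client would apply the limit at the stream layer rather than per call.

```python
class TokenBucket:
    """Token bucket: holds up to `capacity` bytes of credit, refilled at
    `rate` bytes per second."""

    def __init__(self, rate, capacity, now=0.0):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = now

    def try_send(self, nbytes, now):
        # Refill according to elapsed time, capped at capacity.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False


def may_upload(peer_id, nbytes, now, follows, stranger_bucket):
    """Followed peers are uncapped here; all other peers share one bucket."""
    if peer_id in follows:
        return True
    return stranger_bucket.try_send(nbytes, now)


strangers = TokenBucket(rate=100.0, capacity=1000.0)  # made-up numbers
follows = {"@friend.ed25519"}
print(may_upload("@s1.ed25519", 800, 0.0, follows, strangers))  # True
print(may_upload("@s2.ed25519", 800, 0.0, follows, strangers))  # False
print(may_upload("@s2.ed25519", 800, 6.0, follows, strangers))  # True
```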

> > Let's say you follow person A, B, and C. You connect to a peer X who doesn't follow you but does follow person A and C. You ask peer X for any information they have on anyone you follow - A, B, or C. They send you what they have for person A and C. In this case, at no point do you replicate peer X's data.

> If I'm the person initiating the connection with X, I'm not the one enforcing the "mutual connection" requirement, it's X that checks for that. In other words, the peer receiving the connection attempt is the one who checks if a follow exists, and if not, cancels the would-be connection. It's perfectly possible that X would not have such enforcement logic, and would allow anyone to connect.

> Again, it's about the defaults and the intended use case.

If that's the case, and it's regularly tested so that it can be done without clients crashing, then I could see this working.

Looking through the UX project documents, one thing that both onboarding use cases expressed was the desire to download and receive a link they could use later. This is great, and is one of the concerns I had above. Maybe I don't understand this, but if mutual following is enforced, I at least see a need for a non-authenticated means of connection to be able to redeem an invite code.

Other use cases from case studies that I can see breaking if mutual follow is enforced:

  1. "if they follow a festival that he likes too" -> Can't find out about this if you can't look at that user's public posts, which you wouldn't see unless you already follow that person.
  2. "if they like similar posts as he would" -> Again, can't do this if you can't see their public vote-type posts.
  3. "if their profile picture is attractive" -> No way to know unless you see their public posts.
  4. "if their bio lists that they work in a good position for his network" -> Can't pull "about" messages, so no bio or name available.
  5. "if they have shared attractive pictures or pictures attaining to interesting cultural events or people on their profile page" -> Same problem. No public information until both you follow them and they approve you to follow them.
  6. "interests that she cares about" -> No way to know this prior to following.
  7. "if they follow other interesting people or interesting friends of hers" -> No way to know this.
  8. "if their bio lists mutual interests" -> Can't pull a bio.
  9. "If she feels safe because they have at least shared about 5 things in their profile page which say something about who they are and what they like" -> No way to determine this.
  10. "if they not only share pictures about friends but interesting or artsy things" -> Again, inaccessible.

Which means 10/12 of their cases where they would like to follow others during onboarding are potentially blocked by a default setting if mutual follow enforcement is the default.

It also breaks "Social tiers" (3) on page 10, since it's exceedingly unlikely that a "famous person" or anyone in the Dunbar's 500+ is going to follow everyone who wants to follow them.

Also breaks "(Semi-)public organizations" on page 14, since for example my local food pantry could not ask for help or request a type of food unless they also mutually followed everyone who might be able to help them.

Is my local food pantry going to run their own pub that everyone has to individually join, with the static IP requirement, maintenance, and moderation that implies? Almost certainly not. But if they're running a client like Patchwork on a computer which has a decent internet connection, it gives them the ability to run a public profile with essentially zero additional technical expertise and without requiring users to join an open-invite pub somewhere to be able to connect with them.

So, I guess where I'm at is that I would love to see provisions for clients to have several settings:

  1. "Let unauthenticated connections read my public feed" -> I would prefer to see all clients default this to "on" but let it be turned off, since disabling this breaks an awful lot of common expectations.
  2. "Let unauthenticated connections replicate other feeds I hold data for" -> The altruistic option. It makes sense to me for mobile clients to default to "off" for the reasons you stated, but for the sake of network resiliency, I'd much prefer to see desktop clients default to "on".
  3. Bandwidth limit for people you follow.
  4. Bandwidth limit for people you don't follow (only available if (1) or (2) are on).

But present the options to the user (even if it's an Advanced setting) and let them decide.
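The four proposed settings fit into a small client-side policy object. A sketch, assuming settings (1) and (2) decide which feeds an unauthenticated peer may receive and (3)/(4) pick the byte-rate cap; every name and the 50,000 bytes/s figure are hypothetical, with defaults following the preferences stated above (public feed readable everywhere, relaying off on mobile and on for desktop).

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class ReplicationSettings:
    """The four proposed settings; field names and numbers are made up."""
    public_feed_readable: bool = True  # (1) strangers may read my feed
    relay_other_feeds: bool = False    # (2) strangers may fetch feeds I hold
    followed_limit_bytes_per_s: Optional[int] = None    # (3) None = no cap
    stranger_limit_bytes_per_s: Optional[int] = 50_000  # (4) needs (1) or (2)


MOBILE_DEFAULTS = ReplicationSettings(relay_other_feeds=False)
DESKTOP_DEFAULTS = ReplicationSettings(relay_other_feeds=True)


def serveable_feeds(s: ReplicationSettings, my_id: str, held: Set[str],
                    peer_followed: bool) -> Set[str]:
    """Feeds we are willing to send to a peer under settings s."""
    if peer_followed:
        return set(held) | {my_id}  # normal replication
    out = set()
    if s.public_feed_readable:
        out.add(my_id)
    if s.relay_other_feeds:
        out |= set(held) - {my_id}
    return out


def upload_limit(s: ReplicationSettings, peer_followed: bool) -> Optional[int]:
    """Bytes/s cap for a peer: None means uncapped, 0 means nothing at all."""
    if peer_followed:
        return s.followed_limit_bytes_per_s
    if not (s.public_feed_readable or s.relay_other_feeds):
        return 0
    return s.stranger_limit_bytes_per_s


print(serveable_feeds(MOBILE_DEFAULTS, "@me", {"@a", "@b"}, False))  # {'@me'}
print(sorted(serveable_feeds(DESKTOP_DEFAULTS, "@me", {"@a", "@b"}, False)))  # ['@a', '@b', '@me']
```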