Soft-logout should be the default on 401 errors.

ara4n commented 3 years ago

Currently soft-logout (https://matrix.org/docs/spec/client_server/r0.6.1#soft-logout) defaults to false on 401 errors, for backwards compatibility. This means that if someone accidentally puts their Matrix CS API behind HTTP auth (as presumably per https://news.ycombinator.com/item?id=27907696), then clients may see the 401 and hard logout, destroying local user data. Instead, I suggest clients should only hard logout if the response explicitly says soft_logout: false. (Alternatively, hard_logout: true.)
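A minimal sketch of the proposed default, assuming the client receives the raw body of a 401 response as text. The names `LogoutAction` and `classify_401` are hypothetical and not taken from any Matrix SDK; only the `M_UNKNOWN_TOKEN` error code and the `soft_logout` field come from the spec. Treating a non-JSON body (such as an HTTP auth challenge page from a reverse proxy) as a soft logout is also an assumption of this sketch.

```python
import json
from enum import Enum


class LogoutAction(Enum):
    SOFT_LOGOUT = "soft"  # ask the user to re-authenticate; keep local data and keys
    HARD_LOGOUT = "hard"  # the server explicitly says the session is gone


def classify_401(body_text: str) -> LogoutAction:
    """Decide how to react to an HTTP 401 under the proposed default.

    Hypothetical helper: hard logout happens only when the server sends
    M_UNKNOWN_TOKEN with soft_logout explicitly set to false.
    """
    try:
        body = json.loads(body_text)
    except ValueError:
        # Not JSON at all, e.g. an HTML page from a proxy doing HTTP auth.
        return LogoutAction.SOFT_LOGOUT

    if not isinstance(body, dict):
        return LogoutAction.SOFT_LOGOUT

    explicit_hard = (
        body.get("errcode") == "M_UNKNOWN_TOKEN"
        and body.get("soft_logout") is False
    )
    return LogoutAction.HARD_LOGOUT if explicit_hard else LogoutAction.SOFT_LOGOUT
```

Under this sketch, a missing `soft_logout` field means soft logout, which is the reverse of the current backwards-compatible default.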

mnemnion commented 3 years ago

If I may venture a further opinion, no server action should be allowed to destroy user data.

This is an obvious security vulnerability, and it undermines the very purpose of a client, which is to serve the user. Speaking as a user, I never want my data deleted without explicitly requesting it, and then confirming that I actually want it.

kragen commented 3 years ago

So, to clarify, I think this is a serious security vulnerability in the Element client that is (apparently?) required by the spec. As an ethical matter, the client should be a user agent, acting on behalf of the user, and in particular safeguarding the integrity and confidentiality of the user's data against anything the server can do.† That is, there should be no possible course of action by the server that destroys local user data, especially encryption keys, without the user's permission. Even if you disagree on ethical grounds‡, it should be clear that I want to use a client that is designed to safeguard my data, not one that peremptorily wipes it in response to some kind of server misbehavior, whether accidental or intentional.

Obviously this is not the way SPAs on the web work; even though it's running in an environment under user control, an SPA (like Element on the web) is a server agent, into which the server can inject new executable code at any time and routinely does. The user can also inject new executable code into it at any time, so neither the server nor the user can trust it to safeguard anything from the other party; it is only safe for cases where there is no conflict of interest between the server and the user, and neither the server nor the browser is malfunctioning. That's why doing cryptography in the browser instead of on the server is at best of minimal value, even today, just as it was in 2010, 2011, and 2013.

Since Element/Riot started out as an SPA, it makes sense that it takes this server-agent posture. But I am not running Element as an SPA; I have it installed on my phone via F-Droid. It should behave as a user agent: a program that acts (agere) on behalf of the user. Other Matrix clients such as Gomuks have never been SPAs.

The sequence of events from the user experience perspective

1. I had active chat sessions that were only live on Element on my phone which had important information in them, like the address I needed to go to later that day. Some of them were end-to-end encrypted.
2. Our company Matrix server broke, but I didn't know this. (I just saw that Element was failing to connect to it.) Our extremely competent but overstressed sysadmin began fixing it, but I didn't know this yet.
3. Element on my phone suddenly stopped showing me my chats; instead, it showed me a login screen, evidently having forgotten my password. There was no user interface option visible to see the chats. My login attempts with what I thought was the password (yes, I know, I should be using a password manager, but at least I'm using a unique password) failed.
4. I tried logging in on the web interface. The web interface loaded but my password failed authentication. At this point, I started to wonder if I'd been fired without notice, and they just hadn't yet gotten around to disabling my email. This is, to be clear, not the fault of anything about Matrix; it would have been helpful if our sysadmin had sent out an email to inform us that the server was broken, but I can't really blame her; she was very busy at that moment! I suspect that at this point she had reinstalled the server and had not yet restored the databases from backup.
5. I found out from another coworker that our Matrix server was down.
6. I had to guess at the address where I had to go that afternoon, which I had thought I had safely on my phone but which Element had apparently wiped. Fortunately, I remembered enough of the address that I made my meeting.
7. After the Matrix server was back up, I was able to log in (and it turned out I had the correct password all along), using Gomuks, which I had not been using for the previous few weeks. Gomuks was not able to authenticate with any existing session because Element was not logged in, so it could not decrypt any of the end-to-end-encrypted chats.
8. I logged in with Element on my phone. But evidently Element had erased my end-to-end encryption keys, because I was no longer able to decrypt the past months of conversations I had participated in. I lost months of design discussions and meeting notes about the project we're working on.
9. Some of my coworkers had not been so persistent about trying to log back in to the server during the time when it was malfunctioning, so they did have logs of the conversations, and they were kind enough to share them with me. However, Element is not very good at copying and pasting long chunks of past conversations; evidently you can only copy and paste a screenful at a time. So it was too much trouble to get the full logs.

The sequence of events from the protocol perspective

1. My client, Alice, was communicating with clients Bob and Charlie over the Matrix protocol, with the messages being relayed through the Matrix server Mungojerrie, and some of them authenticated and encrypted through end-to-end encryption between Alice, Bob, and Charlie.
2. Mungojerrie broke and sent Alice some crap, then stopped relaying messages entirely for a while.
3. Alice responded to this crap by deleting all of her previous records of conversations with clients Bob and Charlie, and also deleting the encryption keys that Bob and Charlie were using to authenticate her identity.
4. Mungojerrie got fixed, started relaying messages again and providing Alice with historical messages.

Step 3 should never happen, regardless of the nature of Mungojerrie's malfunction in step 2, and regardless of what specific crap Mungojerrie sent.
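The invariant can be sketched as a client-side store in which no server response path reaches the wipe operation. This is an illustrative model only, not Element's or any SDK's actual design; `LocalStore`, `on_server_error`, `wipe`, and `WipeNotConfirmed` are hypothetical names, and the choice to drop only the access token on a 401 is an assumption.

```python
from dataclasses import dataclass, field
from typing import Optional


class WipeNotConfirmed(Exception):
    """Raised when a wipe is requested without explicit user confirmation."""


@dataclass
class LocalStore:
    """Hypothetical model of Alice's local state."""
    messages: list = field(default_factory=list)
    e2e_keys: dict = field(default_factory=dict)
    access_token: Optional[str] = None

    def on_server_error(self, status: int) -> None:
        # Whatever Mungojerrie sends, the most Alice does is forget the
        # access token and ask the user to log in again. Message history
        # and encryption keys are never touched on this path.
        if status == 401:
            self.access_token = None

    def wipe(self, confirmed_by_user: bool) -> None:
        # The only path that erases local data, reachable only from the UI.
        if not confirmed_by_user:
            raise WipeNotConfirmed("local data is erased only at the user's request")
        self.messages.clear()
        self.e2e_keys.clear()
        self.access_token = None
```

With this shape, a malfunctioning or malicious server can force a re-login but cannot cause the deletion in the sequence above.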

† Except to the extent that it's necessary to compromise these properties to get the system to work. For example, you might want the server not to know who you're talking to, when you're actively using the client, or where you're connecting from, but these pose significant practical difficulties in implementation and probably other major tradeoffs, like Pond's unpredictable and high latency, and potential usability problems. But there is, I think, no such difficulty in this case.

‡ It is reasonable to disagree with such a general ethical statement. For example, if I think there's a reasonable chance that my abusive stalker ex-boyfriend has just stolen my phone and knows how to unlock it, remotely erasing my chat history from that phone, and revoking my keys, is a perfectly ethical thing to do, and it's ethical to provide a facility to do that. Such information security decisions always involve a balance of interests and risks. However, the choice to use a Matrix server to store and forward messages should not implicitly also delegate to that server the authority to wipe the client's data. In fact, it's not clear that the Matrix client itself is the place to put that remote-wipe functionality; for people who are in that sort of high-risk situation, a separate remote-wipe phone app that uninstalls the Matrix client entirely is probably a better option.

matrix-org / matrix-spec

Soft-logout should be the default on 401 errors. #864
