matrix-org / matrix-spec

The Matrix protocol specification
Apache License 2.0

Media in the content repo is not authed #870

Closed (kethinov closed this issue 3 months ago)

kethinov commented 7 years ago

For example, this was shared in a private three-person chat, but anyone can view it: https://matrix.org/_matrix/media/v1/download/matrix.org/bSRWdHBFqtVzowZDhwRGbzDq

Most people I've recruited into Matrix are Google Hangouts refugees looking for an open platform. On Hangouts, you cannot view the web URL of an image in this way unless you're authenticated with the server and the user has shared it with you in a chat.

Would it be possible to move past security through obscurity at some point? Or, failing that, at least expire the images after a week or so?

This is concerning because it would be rather trivial for someone to write a simple app that queries random alphanumeric strings to harvest images people have shared in private conversations.

Autre31415 commented 7 years ago

+1

mphara8437 commented 7 years ago

This is no more security through obscurity than any other key-based authentication mechanism; it is called URL-based authentication. The key in your example, bSRWdHBFqtVzowZDhwRGbzDq, is 24 characters long and uses upper- and lower-case letters, giving 52^24 possible keys, which is more than 128 bits (about 137 bits).

But let's work through your concern.

Imagine we write your trivial app and start it running...

Assuming the CDN can store 1 PB (petabyte) and the average image size is 1 KB, that's a trillion images (10^12, or 1,000,000,000,000).

Let's assume that you have a really high-speed Internet link and the CDN will let you make 10^8 (100,000,000) queries per second. tcpdump says that a single query is 4.7 KB, so we're doing 470 GB of traffic every second, and apparently both your link and the server can handle 3760 Gbps.

Let's say that no one notices that the server is being hit by a denial-of-service attack six times larger than anything ever seen before, and they let you keep going for 10 years (60*60*24*365*10 seconds).

52^24/10^12/10^8/(60*60*24*365*10)

At this point we can determine that you have a 1 in 4,844,775,310,744 chance of getting a random Cat pic...
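The arithmetic above can be re-derived with a few lines of Python. This is only a sketch that recomputes the comment's figures; the inputs (a 52-letter alphabet, uniformly random IDs, 10^12 stored images, 10^8 queries per second for ten years) are the thought experiment's assumptions, not measured properties of any real deployment.

```python
import math

# Inputs are the thought experiment's assumptions, not measured values.
ALPHABET_SIZE = 52                  # upper- and lower-case ASCII letters
ID_LENGTH = 24                      # characters in the example media ID
STORED_IMAGES = 10**12              # 1 PB of storage / 1 KB per image
QUERIES_PER_SECOND = 10**8
SECONDS = 60 * 60 * 24 * 365 * 10   # ten years, ignoring leap days

keyspace = ALPHABET_SIZE ** ID_LENGTH        # exact integer
bits = ID_LENGTH * math.log2(ALPHABET_SIZE)  # entropy of a uniformly random ID

# Chance that one random guess lands on *any* stored image.
p_hit_per_guess = STORED_IMAGES / keyspace
total_guesses = QUERIES_PER_SECOND * SECONDS

# Expected number of hits over the whole attack; because it is tiny, it is
# also (very nearly) the probability of at least one hit.
expected_hits = p_hit_per_guess * total_guesses

# The comment's "1 in N" figure is the reciprocal of the same quantity.
one_in = keyspace / STORED_IMAGES / QUERIES_PER_SECOND / SECONDS

print(f"ID entropy: {bits:.1f} bits")
print(f"expected hits over ten years: {expected_hits:.2e}")
print(f"odds: roughly 1 in {one_in:,.0f}")
```

It should report about 136.8 bits of entropy and odds of roughly 1 in 4.84 × 10^12, in line with the figure quoted above. Note that this only bounds blind guessing; it says nothing about URLs that leak through logs, which is the concern raised later in the thread.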

Meanwhile, you have a better chance of getting struck by lightning while drowning, at 1 in 183 million.

Personally, I would be more concerned about someone walking up to the server and stealing it, or the server getting hacked due to a bug somewhere, which is why you should be using encrypted chat.

This is what an image looks like when it is sent to a group using encrypted chat:

https://matrix.org/_matrix/media/v1/download/matrix.org/qctIqdoPymLbqdNpOkWZGtvo

If you grab this file (which was a JPEG of a cat), you will notice that it is encrypted.

kethinov commented 7 years ago

It's still less secure than Hangouts et al., though, because it only requires correctly guessing one key rather than two or more.

To access a privately shared image via Hangouts, you'd have to gain access to a whole account that has been granted permission to view the image, so you'd have to know both the username and the password, which is much harder to randomly guess.

Moreover, some accounts are configured with 2FA, further increasing the security.

This implementation is far from that, and I think addressing this would be worth doing at some point.

mphara8437 commented 7 years ago

My understanding of your concern was that the media IDs generated by Synapse left users of Synapse open to a brute-force keyspace attack using a simple app (an understandable concern).

The Matrix specification does not define the media ID keyspace, so the keyspace could easily be enlarged to increase security, if required.

However a keyspace attack against the Synapse content repository API implementation is already infeasible, so no change is necessary.

Synapse is the reference implementation for the Matrix specification and adding user authentication to the content repository API would require a change to the Matrix Specification.

To propose changes to the Matrix specification, see the following:

https://github.com/matrix-org/matrix-doc/blob/master/CONTRIBUTING.rst

PS: If you are concerned about privacy, use encryption.

kethinov commented 7 years ago

Encryption is nice, but if I have your file, I could deploy infinite time and resources to brute-force that encryption. What you want is to make it as hard as possible for me to get your file in the first place, then encrypt it on top of that. That's why people are so reluctant to hand over their phones or laptops to border patrol even when they use full-disk encryption. Physical security matters perhaps even more than encryption.

As such, what concerns me here is that it's so easy to gain physical access (in a sense) to random people's files just by guessing a single key, rather than having to get at least two independent secrets right. Other image sharing services also use long, unique keys to access the image itself, but in addition you need to present valid account credentials, and that account must have been given explicit permission to view that image.

I do think it would be prudent to add those additional layers of security here.

taurhine commented 7 years ago

I totally agree with kethinov. I can imagine that deploying fail2ban on the server to monitor 404 errors would slow down an attacker, but it still does not solve the main issue.
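As a rough illustration of the fail2ban idea, here is a minimal Python sketch that bans an IP after too many 404s inside a sliding time window. The class name, threshold, and window are hypothetical; real fail2ban matches log lines against filters and bans through firewall actions rather than working like this in-process.

```python
import time
from collections import defaultdict, deque


class NotFoundTracker:
    """Hypothetical fail2ban-style check: too many 404s from one IP in a window."""

    def __init__(self, max_misses=20, window_seconds=60.0):
        self.max_misses = max_misses
        self.window = window_seconds
        self.misses = defaultdict(deque)  # ip -> timestamps of recent 404s
        self.banned = set()

    def record(self, ip, status, now=None):
        """Record one response; return True if the IP is (now) banned."""
        if ip in self.banned:
            return True
        if status != 404:
            return False
        now = time.monotonic() if now is None else now
        recent = self.misses[ip]
        recent.append(now)
        # Forget misses that have fallen out of the sliding window.
        while recent and now - recent[0] > self.window:
            recent.popleft()
        if len(recent) >= self.max_misses:
            self.banned.add(ip)
            return True
        return False
```

As the comment says, this only slows a guesser down; a URL obtained some other way still works on the first request.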

uhoreg commented 6 years ago

dup of https://github.com/matrix-org/synapse/issues/1403

richvdh commented 6 years ago

See also https://github.com/matrix-org/matrix-doc/issues/701 for the spec issue here.

benqrn commented 6 years ago

It is highly unlikely that someone could guess the media URL; the key in each media link is long enough to prevent guessing. The more likely attack vector would be obtaining the URL directly somehow: perhaps it is accidentally posted into a channel, someone who already has the link shares it without permission, your browser has a toolbar that is scraping your URL entries without your knowledge, or some other person in the channel has malware on their machine that is sending away data collected from a channel they share with you.

turt2live commented 6 years ago

Crossposting for the purposes of visibility (source):

I don't think this has been answered somewhere, so asking here in hopes people have ideas: How would federated media work?

In theory the server could start signing requests to download media, although that doesn't really guarantee that the person making the request is allowed to do so (i.e. is in the room). With the upcoming introduction of users being linked to key-like objects, we could possibly use those to sign the requests; however, there's nothing to stop a server lying about which user is requesting the media.

Then there's the question of users potentially wanting specific media to be publicly accessible. The primary use case is the IRC bridge, which pastebins long messages.

ara4n commented 6 years ago

So this comes up on a regular basis, especially from corporate security folks who don't like the idea that a URL leaked in HTTP logs (or proxy logs) etc could then be simply curl'd by any random user to access the content. It's not a matter of the chances of guessing the URL correctly (or the chances of being hit by lightning) but instead whether an attacker who does manage to get the URL automagically gets access to the content too.

One thing we could do is to auth access to the content itself, but this means tracking the event(s) that the content is referenced by and in turn which users have access to those events and so can view the content. This is a potentially nasty leak of metadata for e2e attachments which we don't currently have otherwise. (It's possible we might need this for quotas as per https://github.com/matrix-org/synapse/issues/3339, but hopefully not). It's also quite heavy for the media repo to have to check auth rules for a room for every piece of content that is viewed (and is a bit unfortunate if the media repo is otherwise independent of the room server).

An alternative naive solution could be to just track a random bearer token alongside each mxc:// URL for each piece of content, stored in the event and in the repo. Clients would then submit this bearer token as Authorization: Bearer <secret> whenever they query the repo, meaning that URLs can't be simply copy-pasted around the place unless the auth token is also provided. This might be enough, in practice?
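A minimal sketch of the per-content bearer secret idea, assuming an in-memory store: the media ID (the part that ends up in mxc:// URLs) and the secret are generated separately, and a download must present the secret as `Authorization: Bearer <secret>`. `MediaRepo`, `StoredMedia` and their methods are hypothetical names, not Synapse or spec APIs.

```python
import hmac
import secrets
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class StoredMedia:
    content: bytes
    secret: str  # random per-upload secret; would also be carried in the event


class MediaRepo:
    """Toy in-memory media repo with a per-media bearer secret (hypothetical)."""

    def __init__(self) -> None:
        self._store: Dict[str, StoredMedia] = {}

    def upload(self, content: bytes) -> Tuple[str, str]:
        media_id = secrets.token_urlsafe(18)  # the part that appears in mxc:// URLs
        secret = secrets.token_urlsafe(32)    # never part of the URL itself
        self._store[media_id] = StoredMedia(content, secret)
        return media_id, secret

    def download(self, media_id: str, authorization: Optional[str]) -> bytes:
        item = self._store.get(media_id)
        if item is None:
            raise KeyError(media_id)
        prefix = "Bearer "
        if not authorization or not authorization.startswith(prefix):
            raise PermissionError("missing bearer secret")
        presented = authorization[len(prefix):].encode()
        # Constant-time comparison avoids leaking the secret through timing.
        if not hmac.compare_digest(presented, item.secret.encode()):
            raise PermissionError("wrong bearer secret")
        return item.content
```

With this scheme a URL copied out of an HTTP or proxy log is not enough on its own; the secret, which would travel inside the event, is also required.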

I think there was also another solution involving HMACs (which I think is how we did it pre-Matrix?), but I can't remember how that worked. @erikjohnston any idea?

Edit: we could of course also mandate that the user has a valid access_token for the server too when they are accessing the media repo, although that doesn't lock access to any particular piece of content.

ara4n commented 6 years ago

@turt2live did you have any ideas on how this should/could work?

turt2live commented 6 years ago

Not too much beyond the verbose spiel above (which ends with "I have no idea"). In any case, we should consider having a way for users/bridges/bots to say "this is supposed to be unauthed" via the API for things like the IRC bridge.

How insane would it be to always end to end encrypt media regardless of room?

turt2live commented 6 years ago

On second thought, encrypting everything doesn't really help. The authorization token probably makes the most sense, although I'm curious how the HMAC approach would work.

uhoreg commented 6 years ago

For bridges, I suspect that users will end up having to request the file using a URL from the bridge, and the bridge would have to do the auth dance. Maybe we could add an endpoint that returns a time-limited download URL that the bridge can 302 the user to, so that it won't have to proxy the whole file. This would also allow checking that the original event hasn't been redacted.
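One way such a time-limited download URL could work is an HMAC-signed link with an expiry timestamp, which the media repo can verify without storing per-link state. This is a sketch under assumptions: the endpoint path, the signing key, and the `expires`/`sig` parameter names are all hypothetical, and the homeserver is assumed to check for redaction before issuing a link.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical: a signing key known only to the homeserver.
SERVER_KEY = b"hypothetical-homeserver-signing-key"


def _sign(media_id: str, expires: int) -> str:
    msg = f"{media_id}:{expires}".encode()
    return hmac.new(SERVER_KEY, msg, hashlib.sha256).hexdigest()


def make_temporary_url(base: str, media_id: str, ttl: int = 300,
                       now: Optional[float] = None) -> str:
    """Issue a download URL that stops working after `ttl` seconds.

    A homeserver would first check that the referencing event is not redacted.
    """
    issued = time.time() if now is None else now
    expires = int(issued + ttl)
    query = urlencode({"expires": expires, "sig": _sign(media_id, expires)})
    return f"{base}/{media_id}?{query}"


def check_temporary_url(url: str, now: Optional[float] = None) -> bool:
    parsed = urlparse(url)
    media_id = parsed.path.rsplit("/", 1)[-1]
    params = parse_qs(parsed.query)
    try:
        expires = int(params["expires"][0])
        sig = params["sig"][0]
    except (KeyError, ValueError):
        return False
    current = time.time() if now is None else now
    if current > expires:
        return False
    return hmac.compare_digest(sig.encode(), _sign(media_id, expires).encode())
```

Because the link expires, a copy that leaks out of a bridge stops working after `ttl` seconds, and redacting the event simply prevents new links from being issued.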

MurzNN commented 6 years ago

Maybe investigate how this is done in Hangouts?

ara4n commented 6 years ago

Alternatively, the bridge could deliberately expose the URL with a ?secret=... query string rather than an Authorization header if it's intended to be accessible by the general public. (In addition, we /could/ track in the media repo DB whether a given MXC should be world-readable, or whether it should require an access_token for access in addition to the secret.)
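A sketch of how a media repo might accept the secret from either an Authorization header or a `?secret=` query string, with a per-item world-readable flag that skips the check entirely. The function names and the flag are hypothetical; the optional access_token requirement mentioned above is left out for brevity.

```python
import hmac
from typing import Mapping, Optional
from urllib.parse import parse_qs, urlparse


def extract_secret(url: str, headers: Mapping[str, str]) -> Optional[str]:
    """Find a media secret in an Authorization header or a ?secret= parameter."""
    # Real HTTP header names are case-insensitive; this plain dict lookup is not.
    auth = headers.get("Authorization", "")
    if auth.startswith("Bearer "):
        return auth[len("Bearer "):]
    values = parse_qs(urlparse(url).query).get("secret")
    return values[0] if values else None


def may_download(url: str, headers: Mapping[str, str],
                 world_readable: bool, expected_secret: str) -> bool:
    """Hypothetical policy: world-readable media skips the secret check."""
    if world_readable:
        return True
    presented = extract_secret(url, headers)
    if presented is None:
        return False
    return hmac.compare_digest(presented.encode(), expected_secret.encode())
```

The query-string form is what a bridge would hand out for deliberately public media; clients inside Matrix would use the header so that the secret does not end up in URLs.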

erikjohnston commented 6 years ago

It's worth noting that we probably want to support opening media in a separate window, e.g. to view large images or PDFs, and I don't think you can make the browser add auth headers in those cases.

ara4n commented 6 years ago

There are ways of fixing that, e.g. have the client download the content itself with the right headers and then expose it to the user as a blob URL, which can then be viewed in separate windows/tabs.

ara4n commented 6 years ago

I think there was also another solution involving HMACs (which I think is how we did it pre-Matrix?), but I can't remember how that worked. @erikjohnston any idea?

Turns out that the way we used to do it was to never send access_tokens in requests at all, but send an HMAC(method, url, access_token) and then use the access_token as a shared secret, so that a leaked URL wouldn't leak an individual user's access_token. I assume we didn't do this for Matrix because calculating that HMAC would be too onerous for trivial HTTP clients, hence passing raw access_tokens around. In practice it doesn't buy us anything in this instance, as the resulting URL could still be passed blindly around anyway; we might as well create a new random secret for each URL and use that instead.
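A sketch of the pre-Matrix scheme as described: instead of the token itself, the client sends HMAC(method, url) keyed by its access token. SHA-256 and the method-newline-URL message layout are assumptions; the comment does not say which hash or encoding was used.

```python
import hashlib
import hmac


def sign_request(method: str, url: str, access_token: str) -> str:
    """HMAC over method and URL, keyed by the access token (never sent itself)."""
    message = f"{method.upper()}\n{url}".encode()
    return hmac.new(access_token.encode(), message, hashlib.sha256).hexdigest()


def verify_request(method: str, url: str, signature: str, access_token: str) -> bool:
    # The server would also need to know *which* user's token to check against,
    # e.g. from a user ID sent alongside the signature (an assumption here).
    expected = sign_request(method, url, access_token)
    return hmac.compare_digest(signature.encode(), expected.encode())
```

The same (URL, signature) pair verifies every time it is presented, so a leaked signed URL can still be replayed; that is why the comment concludes that a fresh random secret per URL gives the same protection.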

richvdh commented 6 years ago

(cf https://github.com/matrix-org/matrix-doc/issues/1043 for "access tokens suck")

user318 commented 6 years ago

What if each user got their own unique link to the media, or perhaps a common link plus a personal auth token based on their ID? When accessing media, the server could check that the token is correct for that user and that the user is authenticated.

uhoreg commented 6 years ago

In reply to @ara4n:

alternatively, when the bridge could deliberately expose the URL with a ?secret=... querystring rather than an Auth header if it's intended to be accessible by the general public.

The reason I suggested having the bridge do the auth dance, rather than forwarding the secret in the query string, was so that a file that's redacted on the Matrix side would become inaccessible to bridged users.

(In addition, we /could/ track whether a given MXC should be world-readable or not in the media repo DB, or whether it should require an access_token for access (in addition to the secret))

I would just say that a file can be uploaded with a token or without a token. If it's uploaded with a token, then downloads need to be authed; if it's uploaded without a token, then it's a free-for-all.

In reply to @user318

What if each user would get its unique link to media or may be a common link with personal auth token, based on his id.

That doesn't really work with end-to-end encrypted files, as the server doesn't get to see what file IDs are visible to what users.

user318 commented 6 years ago

That doesn't really work with end-to-end encrypted files, as the server doesn't get to see what file IDs are visible to what users.

I do not actually know how it works with e2e. I thought that files were embedded there as base64-encoded messages, not stored as media.

uhoreg commented 6 years ago

Messages have a size limit, so you can't store files within the message itself. You also don't want to send the whole file to everyone until they request it. e2e file events are basically just pointers to an encrypted blob in the media store, along with the decryption key.
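To make the "pointer plus decryption key" description concrete, here is a sketch that builds the content of an encrypted m.image event. The field names follow my reading of the spec's EncryptedFile object (`url`, `key` as a JWK, `iv`, `hashes`, `v`) and should be checked against the current spec; the AES-256-CTR encryption itself is not shown, so the ciphertext, key, and IV are simply passed in.

```python
import base64
import hashlib
import json
import os
from typing import Any, Dict


def unpadded_b64(data: bytes, urlsafe: bool = False) -> str:
    """Base64 without '=' padding (URL-safe alphabet for the JWK 'k' field)."""
    encoded = base64.urlsafe_b64encode(data) if urlsafe else base64.b64encode(data)
    return encoded.decode("ascii").rstrip("=")


def encrypted_image_content(mxc_url: str, ciphertext: bytes, key: bytes,
                            iv: bytes, filename: str) -> Dict[str, Any]:
    """Content of an encrypted m.image event: a pointer to the blob plus its key."""
    return {
        "msgtype": "m.image",
        "body": filename,
        # Encrypted attachments use "file" instead of a plain top-level "url".
        "file": {
            "url": mxc_url,  # the media repo only ever sees the ciphertext
            "key": {
                "kty": "oct",
                "key_ops": ["encrypt", "decrypt"],
                "alg": "A256CTR",
                "k": unpadded_b64(key, urlsafe=True),
                "ext": True,
            },
            "iv": unpadded_b64(iv),
            # Lets the recipient check the downloaded blob before decrypting.
            "hashes": {"sha256": unpadded_b64(hashlib.sha256(ciphertext).digest())},
            "v": "v2",
        },
    }


if __name__ == "__main__":
    # Placeholder bytes stand in for a real AES-256-CTR ciphertext.
    content = encrypted_image_content(
        "mxc://example.org/abc", b"not really ciphertext",
        key=os.urandom(32), iv=os.urandom(16), filename="cat.jpg")
    print(json.dumps(content, indent=2))
```

Only the ciphertext lives at the mxc:// URL; the key and IV travel inside the (itself encrypted) event, which is why fetching such a URL directly yields only an encrypted blob.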

ara4n commented 6 years ago

I've written a spec proposal for solving this over at https://github.com/matrix-org/matrix-doc/issues/701, review welcome on the googledoc.

dr1 commented 6 years ago

Is matrix-org/synapse#1263 going to be taken care of with this change as well? I'm only seeing concerns about GDPR erasure, which I presume means when someone deactivates and deletes their account. Right now it's fairly easy to have a tragedy if an inappropriate attachment link goes out over a bridge.

cuongnv commented 5 years ago

Is there any news? Has anyone tried to re-implement this API to solve the problem?

nunoperalta commented 5 years ago

Reading this thread, it appears most people mentioned brute force attacks or someone providing the URL to other people.

What I'm really concerned about is that Google or other search engines might somehow end up indexing these images, because they are, after all, public URLs.

If someone posts the URL in public (like the OP of this thread), the image may potentially become indexed.

This issue is an important one that needs to be resolved, especially for a project that gives encryption and privacy a high priority :)

ataraxus commented 5 years ago

Just realised this issue exists. It's quite embarrassing to argue for Matrix on privacy grounds, especially with the advent of GDPR, and then see this issue... If someone can share and post media links wherever they want, it's quite a problem.

Also, the argument about the probability of guessing the file URL is bogus; we have so many examples of unprotected Amazon buckets whose IDs get scanned by security researchers and other people.

Of course, one could also guess a token, but brute-force filters on 403s are more common than on 404s.

menturion commented 5 years ago

When will this considerable security issue be fixed?

rzr commented 4 years ago

Is matrix-org/synapse#7009 released ?

Let me crosslink to this discussion:

https://mastodon.social/@rzr/104116637044903278

clokep commented 4 years ago

Is matrix-org/synapse#7009 released ?

No, it should be released in the next version.

babolivier commented 4 years ago

Is matrix-org/synapse#7009 released ?

Unfortunately not yet. We're working towards getting Synapse 1.13.0 out of the door as quickly as possible since it's now pretty much overdue. Note that matrix-org/synapse#7009 will not add authentication to media, which would require a spec change - MSC2461 has been open for that purpose. What matrix-org/synapse#7009 does is to prevent browsers from leaking media URLs through referrer headers.

turt2live commented 4 years ago

^ MSC701 (https://github.com/matrix-org/matrix-doc/issues/701) is another MSC on the matter with a slightly wider scope.

damluk commented 3 years ago

Four years and still open. If I want to run a private, unfederated homeserver, I have to live with the fact that media files are world-readable? It is not even possible to add reverse-proxy authentication on top if clients do not support it.

ShadowJonathan commented 3 years ago

"Four years and still open" because this is a hard problem to solve. It's not easily solvable: once you have allowed a user to have a permanent reference to a piece of media, you can no longer have complete control over it. You cannot guarantee that your chat is the only one referencing that media, nor can you realistically enforce that in what is a generic media store.

I suggest looking at the MSCs linked in this thread, as that's where hard problems like these get the most discussion. Synapse is an implementation of a Matrix homeserver, not a deliberation ground for how to solve larger problems within the Matrix ecosystem.

damluk commented 3 years ago

"Four years and still open" because this is a hard problem to solve, it's not easily solvable in the sense that you can have complete control over media once you have allowed a user to have a permanent reference to it, you cannot guarantee that your chat is the only one referencing to that media, nor can you limit it so - realistically - for this being a generic media store.

It is four years because MSC701 wants to fix too much at once. The hard part is federation; the rest is more or less straightforward. Synapse could have a feature analogous to require_auth_for_profile_requests, which is meant for unfederated setups.

Besides, a Synapse admin can restrict access to media files already. They simply need to delete the file via the admin API or manually from the media_store_path. Clients have to deal with a 404 then. From what I have seen, they don't break because of it.

vince2010091 commented 3 years ago

Hi, is there any hope that this will be fixed some day? Meanwhile: url_preview_enabled: false. Regards

davralin commented 3 years ago

[Quotes mphara8437's 2017 comment above in full.]

Isn't this writeup (from 2017) enough of a conclusion that if this is a concern to you, you should use encrypted chats?

And if not, then this is a fundamental difference in how things work, which means it will require serious effort and time to fix properly, with little incentive to do so because encrypted chats render it all moot anyway?

Meaning, if you want this implemented - you probably need to do that yourself.

vince2010091 commented 3 years ago

[Quotes mphara8437's 2017 comment above in full.]

Hello,

Except if you have access to the browser history, server logs, proxy logs, etc. Then no brute force is needed, and you have direct access to the data with a simple GET.

Putting secrets in GET requests is bad practice.

Avatars are affected too (https://***/_matrix/media/r0/thumbnail/***/xxxxxxxxxxxxxx?width=400&height=400&method=crop). If they can be cat pictures, they can also be personal data like a face.

For software that claims privacy and security, it is strange not to authenticate or require an access token for such requests.

Regards,

clokep commented 3 years ago

I'm going to move this to the matrix-doc repo since this would need to be specced before synapse can implement anything.

clokep commented 3 years ago

And now that we've transferred it it seems that matrix-org/matrix-spec-proposals#3796 is the duplicate for this.

davralin commented 3 years ago

Avatars are concerned too (https://***/_matrix/media/r0/thumbnail/***/xxxxxxxxxxxxxx?width=400&height=400&method=crop If they can be cats pictures, they also can be personnal data like a face

For a soft that claim privacy and security, this is weird to not authenticate or require access token for such requests.

A conversation in a public space is still public, even if the conversation is between three people.

If the conversation should be secret, or the participants always want privacy, they can choose to encrypt all communication.

That renders the media URL useless, as all you can get from the link is an encrypted blob, as pointed out with the linked cat picture.

In many ways the conclusion is simple:

There's no reason to trust the servers implementation (or lack thereof) of anything if there's E2EE involved anyway...

richvdh commented 2 years ago

And now that we've transferred it it seems that matrix-org/matrix-spec-proposals#3796 is the duplicate for this.

matrix-org/matrix-spec-proposals#3796 is a proposal to fix it; this is the canonical issue.

n0toose commented 2 years ago

The assertions in this thread seem to assume that, and, please correct me if I am wrong:

This "too negligible for most people to actually communicate it properly" approach makes me personally feel uneasy, even if I were more likely to get struck by lightning, considering that there are opportunities (in the future) to bring the chance of anyone ever receiving anything down to absolute zero.

FlyveHest commented 1 year ago

A comment on this: as far as I can see, this will break media shared across bridges, unless those bridges relay the binary data directly.

But in turn, that defeats the purpose of protecting the media, since it will be directly available on another platform, maybe without the original poster being aware of it.

Iruwen commented 1 year ago

Correct me if I'm wrong, but this makes Matrix a great file-sharing host. Just create an anonymous account and an unencrypted, non-public room, upload whatever you want in chunks as big as the server allows, then let the world know the URLs so they can be consumed by tools like JDownloader. With some more effort on the client side, public access to encrypted chunks is even more perfidious. And the server operator is probably liable for any illegal content (hello DMCA takedowns, or worse).

axelsimon commented 11 months ago

@Iruwen In most (many?) legal regimes, you are only liable for things you know to be hosting, and become liable once you've been informed of the case (and often, the material must also be "manifestly illegal" or similar). Simply having something "bad" on your server doesn't automatically make you liable.

A lot of large services (such as Youtube) will automatically take something down as soon as they are notified that it could be problematic, because that's when their legal liability starts. But most of the time they don't care to check whether it is actually problematic, especially for copyright matters & fair use/dealing (hence DMCA takedown requests being weaponised).

rltas commented 11 months ago

Until this is resolved, I added a Lua script to my nginx reverse proxy that only allows media access for IP addresses that have successfully accessed the /capabilities or /sync endpoints, which seem to be two authenticated endpoints that are reliably accessed first.
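The same gating logic, sketched in Python rather than nginx Lua (this is not the commenter's script): remember IPs that recently got a 200 from /sync or /capabilities, and only let those IPs fetch /_matrix/media/ paths. The class name, the one-hour TTL, and treating a 200 response as proof of authentication are assumptions.

```python
import time
from typing import Dict, Optional


class RecentAuthGate:
    """Allow media requests only from IPs that recently made an authenticated call."""

    AUTHED_ENDPOINTS = ("capabilities", "sync")

    def __init__(self, ttl_seconds: float = 3600.0) -> None:
        self.ttl = ttl_seconds
        self.last_authed: Dict[str, float] = {}  # ip -> time of last authed success

    @staticmethod
    def _now(now: Optional[float]) -> float:
        return time.monotonic() if now is None else now

    def observe(self, ip: str, path: str, status: int,
                now: Optional[float] = None) -> None:
        """Call for every proxied response; remembers IPs whose authed calls succeeded."""
        route = path.split("?", 1)[0].rstrip("/")
        endpoint = route.rsplit("/", 1)[-1]
        if (route.startswith("/_matrix/client/")
                and endpoint in self.AUTHED_ENDPOINTS and status == 200):
            self.last_authed[ip] = self._now(now)

    def allow_media(self, ip: str, path: str, now: Optional[float] = None) -> bool:
        if not path.startswith("/_matrix/media/"):
            return True  # the gate only applies to media requests
        seen = self.last_authed.get(ip)
        return seen is not None and self._now(now) - seen <= self.ttl
```

Caveats: everyone behind a shared NAT address inherits another user's access, and other homeservers fetching media over federation may be blocked unless they are exempted.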