matrixbot opened 9 years ago
Jira watchers: @ara4n
We just had a minor disaster with this happening (the MXC URL was bridged to IRC, so redacting the content on Matrix was achieving nothing). This should be trivial to fix...
We should make sure the content isn't completely deleted, for moderation purposes.
I wonder whether a good enough compromise would be for HSes to purge redacted data after a few days (Windows Recycle Bin style), albeit with the option of configuring the retention per HS. The idea that sensitive data can be left visible to HS admins (and clogging up disk space) indefinitely after being redacted feels undesirable and unintuitive.
This came up in #matrix:matrix.org earlier today - as an HS admin, I would really really like to be able to configure my HS to purge redacted content. At the very least, I don't want my HS to continue to serve requests for it from the media repo.
Specifically in reference to illicit content, continuing to serve it from my HS could put me in a really tough spot, legally. And if it's redacted and therefore not easy to find in the first place...
Of course, you have to be careful that the mxc content isn't referred to by a different event (possibly including an encrypted event).
This is absolutely needed to keep homeserver storage relatively small. I set mine up on a VPS and its storage is growing constantly; it'll become a problem within a few months. At the same time, we should protect some content, and maybe some events, from deletion, such as user and room avatars. It would not be nice to suddenly lose them after a maintenance cycle.
Related: https://github.com/matrix-org/synapse/issues/1287.
Is https://github.com/matrix-org/synapse/issues/2369 a duplicate of this one?
One tricky point is that we can't just have the server delete the media on event redaction as in encrypted rooms it does not know what the attached mxc:// url is.
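To illustrate the point (a minimal sketch with made-up identifiers): in an unencrypted room the event content carries the mxc:// URL in the clear, while in an encrypted room it is buried inside the Megolm ciphertext, so the server cannot map a redaction back to a media file.

```python
# Sketch with made-up IDs: what the homeserver sees in each case.

# Unencrypted room: the mxc:// URL is visible in the event content.
plain_event = {
    "type": "m.room.message",
    "content": {
        "msgtype": "m.image",
        "body": "cat.png",
        "url": "mxc://example.org/abc123",  # server could purge this on redaction
    },
}

# Encrypted room: the mxc:// URL lives inside the ciphertext (in the
# plaintext's content.file.url), which the server cannot read.
encrypted_event = {
    "type": "m.room.encrypted",
    "content": {
        "algorithm": "m.megolm.v1.aes-sha2",
        "ciphertext": "AwgAEnACgAkLmt6q...",  # opaque to the server
        "session_id": "X3Oo5...",
        "sender_key": "curve25519-key...",
    },
}
```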
the redacting client can do it though.
> the redacting client can do it though.
Only if we propose a way to delete media across federation (https://github.com/matrix-org/matrix-doc/issues/790)
I think it would make sense to solve this at least for the single HS case, by allowing a redacting client to delete the media it uploaded to its HS. Then later we could build on top of that to add a way for this to work over federation.
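A rough sketch of what that could look like as a client-server call; the endpoint path, token, and media ID below are all hypothetical (no such endpoint existed in the spec at the time of writing):

```python
import requests

HS = "https://example.org"     # the uploader's own homeserver (placeholder)
TOKEN = "syt_placeholder"      # the uploading user's access token (placeholder)
MEDIA_ID = "abc123"            # the media to delete (placeholder)

# Hypothetical endpoint: let a user delete media they themselves uploaded.
resp = requests.delete(
    f"{HS}/_matrix/client/v1/media/example.org/{MEDIA_ID}",  # illustrative path only
    headers={"Authorization": f"Bearer {TOKEN}"},
)
resp.raise_for_status()
```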
I :+1: @ara4n's suggestion https://github.com/matrix-org/synapse/issues/1263#issuecomment-270975273 - having a server-configurable retention time makes sense and is a sweet spot between many needs:
journalctl for systemd has a nice --vacuum-time option which might serve as inspiration. Perhaps the same idea of time-based vacuuming can be applied to redacted media: for people who want redacted event media purged immediately, the retention time can be "0 seconds", and for people who want more moderator control, it can be configured higher.
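As a minimal sketch of that idea (the store API below is hypothetical, standing in for whatever table tracks redacted media):

```python
import time

RETENTION_SECONDS = 7 * 24 * 3600  # per-HS config; 0 would mean "purge immediately"

def vacuum_redacted_media(store) -> None:
    """Delete media whose redaction is older than the configured retention."""
    cutoff = time.time() - RETENTION_SECONDS
    for media_id, redacted_at in store.list_redacted_media():  # hypothetical API
        if redacted_at <= cutoff:
            store.delete_media(media_id)  # remove the file and any cached thumbnails
```

Run periodically (cron-style), this gives exactly the journalctl-like behaviour: the retention value is the only knob.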
This would seem related to https://github.com/matrix-org/synapse/issues/3479#issuecomment-716694290 - the API for on-demand deletion specifies:
POST /_synapse/admin/v1/media/<server_name>/delete?before_ts=<before_ts>
So one could imagine something like delete?vacuum_redacted_expiry=10s as a possible shape for such an API, and similarly for purging remote caches. Of course, the server should clean up redacted media by itself, not only through API calls.
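For reference, calling the existing before_ts deletion endpoint quoted above might look like this (server name and token are placeholders):

```python
import time
import requests

SYNAPSE = "https://matrix.example.org"  # homeserver base URL (placeholder)
ADMIN_TOKEN = "syt_admin_placeholder"   # admin access token (placeholder)

# Delete local media older than 30 days; before_ts is in milliseconds.
before_ts = int((time.time() - 30 * 24 * 3600) * 1000)
resp = requests.post(
    f"{SYNAPSE}/_synapse/admin/v1/media/matrix.example.org/delete",
    params={"before_ts": before_ts},
    headers={"Authorization": f"Bearer {ADMIN_TOKEN}"},
)
print(resp.json())  # e.g. {"deleted_media": [...], "total": ...}
```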
What's the state of this? It feels bad to find out that you have basically no way of deleting media you uploaded to a homeserver that's not your own.
@locness3 Note that a malicious homeserver is free to retain whatever it wants anyway and you won’t be able to tell (through technical means, barring pervasive DRM-equivalent mechanisms like Intel SGX), so this issue is purely about cooperating servers. While having some way to delete things would be good for giving users some peace of mind, relying on it for things you wanted to keep truly private probably won’t be a good idea even once it exists.
Do I understand correctly that there is no reason for well-behaved homeservers to attempt removing child sexual abuse material from their media repositories, because it's always possible that a malicious homeserver doesn't wish to do that and thus the CSAM would still be in the Matrix federation forever?
@Mikaela that's a good sharp question. I would agree that the target for fixing this issue should be well-behaved servers. I think that what @alexshpilkin points out is more of a communication issue around that function, such that an individual redacting content knows what is at stake (this could be a well-meaning person who accidentally uploaded a copy of their passport).
@Mikaela The comment above is correct: I was solely objecting to the implication that this function guarantees more than it actually does; that is, I wanted to say that there will always be
> no way of deleting media you uploaded to a homeserver that's not your own
and to some extent it’s even a feature (there’s no way of deleting information you uploaded to a brain that’s not your own, either). AFAIU the Matrix designers used “redact” instead of “delete” in the first place in an attempt to avoid users assuming that the operation provides stronger confidentiality than it actually does (too bad they seem to have abandoned that terminology in Element).
That doesn’t mean that a streamlined way to ask a cooperating homeserver to delete things would not be useful—I fully acknowledge that many things are best-effort and still useful, such as delivery of data over the Internet :)
> AFAIU the Matrix designers used “redact” instead of “delete” in the first place in an attempt to avoid users assuming that the operation provides stronger confidentiality than it actually does (too bad they seem to have abandoned that terminology in Element).
No, it was probably not chosen because of the guarantees; it was probably chosen because it's EXACTLY what happens. Ever see a spy movie where a document is partially blacked out? Well, that's because it's a redacted copy, and this is exactly what we do in Matrix. We nuke part of the event: we don't delete it, we partially redact it.
As for the point that @Mikaela brings up: as far as I am concerned, malicious servers are to be completely excluded from this debate. Why? Because redactions already face this EXACT same issue, and therefore, since we have redactions for events, we can have redactions for media; the same issues apply and have already been concluded to be acceptable.
> and to some extent it’s even a feature (there’s no way of deleting information you uploaded to a brain that’s not your own, either).
I find this reasonable when it comes to communicating between homeservers; however, if you're signed up on a homeserver that you do not own (matrix.org, as a completely random example :/ ), not being able to delete media you uploaded to it can make you feel out of control.
> Of course, you have to be careful that the mxc content isn't referred to by a different event (possibly including an encrypted event).

> One tricky point is that we can't just have the server delete the media on event redaction as in encrypted rooms it does not know what the attached mxc:// url is.
Wouldn't it be possible to copy the file in the media repo and generate a new mxc URL when forwarding to an encrypted room, and to use a hard-link-style reference counter when forwarding to an unencrypted room (since the HS can keep track of unencrypted mentions of the mxc URL without privacy issues)? For mxc URL redactions in encrypted rooms, the client has to inform the server (not doing so leaves files without any pointers to them). And deduplication-wise, I think the huge space gain from actually being able to delete unused content would easily make up for the duplicated storage in the few edge cases where a file is forwarded to an encrypted room.
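A minimal sketch of the hard-link-style counting described above (in-memory and hypothetical; a real implementation would live in the media repository's database):

```python
class MediaRefCounter:
    """Track which events point at each mxc:// media ID."""

    def __init__(self) -> None:
        self.refs: dict[str, set[str]] = {}  # media_id -> referencing event IDs

    def add_ref(self, media_id: str, event_id: str) -> None:
        self.refs.setdefault(media_id, set()).add(event_id)

    def drop_ref(self, media_id: str, event_id: str) -> bool:
        """Drop one reference (e.g. on redaction); True means the file is orphaned."""
        events = self.refs.get(media_id, set())
        events.discard(event_id)
        return not events  # no pointers left -> safe to purge file and caches
```

For encrypted rooms the server never learns the media_id, so, as proposed above, the redacting client would have to trigger the drop_ref step itself.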
Why hasn't this been implemented yet? What if I, as a user, want to delete an attachment? Simply removing the message won't do it? That's very strange.
You can always inline smaller attachments as base64 within the body of the message or in the URI anchor, or upload them to external servers, so you do have a workaround.
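For what it's worth, the inlining workaround might look like this (file name and MIME type are placeholders):

```python
import base64

# Embed a small file as a base64 data URI in the message body instead of
# uploading it to the media repository at all.
with open("note.txt", "rb") as f:  # placeholder file
    payload = base64.b64encode(f.read()).decode()
body = f"data:text/plain;base64,{payload}"  # goes straight into the message text
```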
Also, I guess you should only attach files with sensitive content to E2EE rooms. Such files will be encrypted separately, and after the message is removed, its key will be lost, making the file contents unavailable.
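Loosely, that per-file encryption works like this sketch (simplified from the Matrix encrypted-attachment scheme of AES-256-CTR plus a SHA-256 hash of the ciphertext; not the exact wire format):

```python
import hashlib
import os

from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

def encrypt_attachment(plaintext: bytes):
    key = os.urandom(32)              # fresh AES-256 key per file
    iv = os.urandom(8) + b"\x00" * 8  # random 64-bit IV, zeroed 64-bit counter
    encryptor = Cipher(algorithms.AES(key), modes.CTR(iv)).encryptor()
    ciphertext = encryptor.update(plaintext) + encryptor.finalize()
    sha256 = hashlib.sha256(ciphertext).hexdigest()
    # The key and iv travel inside the (E2EE) event; redact the event and the
    # key is gone, so the ciphertext left in the media repo is unreadable.
    return ciphertext, key, iv, sha256
```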
@bkil
> Also, I guess you should only attach files with sensitive content to E2EE rooms.
Yes, but that limits what I can do. I do not know all the features of the Matrix network, but I am sure it is worth avoiding the mistakes of Telegram and other messengers, which store media forever, even after the message is deleted. The point is that this creates a deceptive confidence among users that the media they have uploaded will not remain on the server. On the contrary, the media remains on the server as dead weight.
It's a bit of a disastrous thinko that we can redact events which point to stuff in the media repo, and that content is subsequently preserved even though the event is nuked. We should rm it and its caches too (assuming the HS is honouring redactions).
(Imported from https://matrix.org/jira/browse/SYN-216)
(Reported by @ara4n)