matrix-org / matrix-spec

The Matrix protocol specification
Apache License 2.0

Behaviour of `allow_remote` on media endpoints is unclear #1767

Open deepbluev7 opened 3 months ago

deepbluev7 commented 3 months ago

Link to problem area:

https://spec.matrix.org/v1.10/client-server-api/#get_matrixmediav3downloadservernamemediaid

Issue

For `allow_remote`, the spec says:

Indicates to the server that it should not attempt to fetch the media if it is deemed remote. This is to prevent routing loops where the server contacts itself. Defaults to true if not provided.

Some people interpret this to mean that cached remote media should still be returned. (That, however, would leak data about which media has been viewed by users on that server.)

The matrix-media-repo project implements this behaviour.

However, Synapse doesn't return ANY remote media if allow_remote is false: https://github.com/matrix-org/synapse/blob/be65a8ec0195955c15fdb179c9158b187638e39a/synapse/rest/media/download_resource.py#L83

Conduit seems to follow what the matrix-media-repo does: https://gitlab.com/famedly/conduit/-/blob/b11855e7a1fc00074a13f9d1b9ab04462931332f/src/api/client_server/media.rs#L113

Dendrite seems to follow Synapse's behaviour: https://github.com/matrix-org/dendrite/blob/b9abbf7b20b4faaffe754c4a1ea4d5f0e7bd72b9/mediaapi/routing/routing.go#L146

In general this difference doesn't cause problems, since servers usually only ask the original server about the media (by checking the server name in the media ID). However, if a client wants to check whether a piece of media is already stored on the server, it would not be able to do so on a server that returns no remote media when allow_remote is false. On the other hand, being able to verify the existence of remote media on a server does expose some data about what users on that server have looked at. Synapse's behaviour is better from a data-minimisation perspective in that case.
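
To make the two readings concrete, here is a minimal sketch in Python. It is not taken from Synapse, Dendrite, Conduit or matrix-media-repo; the `MediaRepo` class, its fields and the `serve_cached_remote` switch are all hypothetical names made up for illustration, and a `None` return simply stands in for a 404.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MediaRepo:
    """Toy media repository; every name in this class is hypothetical."""
    server_name: str
    local: dict = field(default_factory=dict)         # media_id -> bytes
    remote_cache: dict = field(default_factory=dict)  # (origin, media_id) -> bytes

    def download(self, origin: str, media_id: str, allow_remote: bool = True,
                 serve_cached_remote: bool = False) -> Optional[bytes]:
        """Return media bytes, or None to stand in for a 404 response."""
        if origin == self.server_name:
            # Local media is unaffected by allow_remote.
            return self.local.get(media_id)
        if allow_remote:
            # A real server would fetch from the origin on a cache miss;
            # this sketch only consults the cache.
            return self.remote_cache.get((origin, media_id))
        if serve_cached_remote:
            # Reading 1: do not fetch, but still serve already-cached copies.
            return self.remote_cache.get((origin, media_id))
        # Reading 2: never return remote media when allow_remote is false.
        return None


repo = MediaRepo("example.org")
repo.remote_cache[("other.example", "abc")] = b"cached"

# Same request (allow_remote=false), different outcome per reading.
lenient = repo.download("other.example", "abc", allow_remote=False,
                        serve_cached_remote=True)
strict = repo.download("other.example", "abc", allow_remote=False)
```

Under the first reading (the one attributed above to matrix-media-repo and Conduit) the request reveals that someone on the server has already fetched the media; under the second (attributed to Synapse and Dendrite) the response is the same whether or not the media is cached.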

richvdh commented 3 months ago

Agreed, this could be clarified. And for the reasons you suggest, this seems like a bug in Conduit and m-m-r.