matrix-org / matrix-spec

The Matrix protocol specification
Apache License 2.0

Index content repository content consistently by hash (SPEC-140) #34

Open matrixbot opened 9 years ago

matrixbot commented 9 years ago

In pre-Matrix systems we always indexed and referred to binary content by its hash. This automatically factored out duplicate content in server-side content repos. If we did this in Matrix, we'd also avoid filling up disk space with hundreds of identical cat GIFs; it's unclear why we don't.

(Imported from https://matrix.org/jira/browse/SPEC-140)

(Reported by @ara4n)
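
For illustration, here is a minimal sketch of the idea: a store keyed by the hash of the bytes, so identical uploads collapse into one stored copy. The class name `HashIndexedStore`, the in-memory dict, and the choice of SHA-256 are all assumptions made for the example; the issue does not say which hash or storage layout was used.

```python
import hashlib


class HashIndexedStore:
    """Toy in-memory content store keyed by the SHA-256 digest of the bytes.

    Hypothetical example class; not taken from any Matrix implementation.
    """

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        """Store data and return its hex digest, which serves as the key."""
        digest = hashlib.sha256(data).hexdigest()
        # setdefault leaves an existing entry alone, so a repeated upload
        # of the same bytes does not create a second copy.
        self._blobs.setdefault(digest, data)
        return digest

    def get(self, digest: str) -> bytes:
        """Return the bytes stored under the given digest."""
        return self._blobs[digest]

    def __len__(self) -> int:
        """Number of distinct blobs currently stored."""
        return len(self._blobs)


store = HashIndexedStore()
cat_gif = b"GIF89a pretend these are cat GIF bytes"

# One hundred uploads of the same bytes all map to the same key.
keys = [store.put(cat_gif) for _ in range(100)]
other_key = store.put(b"a different file")
```

After these calls the store holds two blobs rather than 101, because the key is derived from the content and not from the upload.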

matrixbot commented 9 years ago

Jira watchers: @ara4n

matrixbot commented 9 years ago

Links exported from Jira:

relates to SYN-401

turt2live commented 6 years ago

This is also possible as an implementation detail. For instance, matrix-media-repo (a project of mine) achieves this by storing a one-to-many relationship of files to MXC URIs. This allows mxc://t2bot.io/whatever1 and mxc://t2bot.io/whatever2 to point to the same file on disk (which is indexed by hash). It also goes the extra mile and tries to re-use MXC URIs on upload.
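
A rough sketch of the one-to-many mapping described above, assuming an in-memory layout: several MXC URIs point at one hash-indexed file, and an upload can optionally re-use an existing URI. This is not matrix-media-repo's actual code; the `MediaRepo` class, its method names, the `reuse` flag, the SHA-256 choice, and the `example.org` server name are all hypothetical.

```python
import hashlib
import secrets


class MediaRepo:
    """Hypothetical media repo: many MXC URIs may point at one stored file."""

    def __init__(self, server_name: str) -> None:
        self.server_name = server_name
        self.files_by_hash: dict[str, bytes] = {}  # digest -> file bytes
        self.hash_by_mxc: dict[str, str] = {}      # MXC URI -> digest

    def _new_mxc(self) -> str:
        # Random media ID; the real ID format is up to the implementation.
        return f"mxc://{self.server_name}/{secrets.token_urlsafe(16)}"

    def upload(self, data: bytes, reuse: bool = True) -> str:
        """Store data (once per distinct content) and return an MXC URI."""
        digest = hashlib.sha256(data).hexdigest()
        if reuse:
            # Return the earliest existing URI for this content, if any.
            for mxc, known_digest in self.hash_by_mxc.items():
                if known_digest == digest:
                    return mxc
        self.files_by_hash.setdefault(digest, data)
        mxc = self._new_mxc()
        self.hash_by_mxc[mxc] = digest
        return mxc

    def download(self, mxc: str) -> bytes:
        """Resolve an MXC URI to its digest, then to the stored bytes."""
        return self.files_by_hash[self.hash_by_mxc[mxc]]


repo = MediaRepo("example.org")
gif = b"GIF89a pretend these are cat GIF bytes"

first = repo.upload(gif, reuse=False)   # new URI, file stored
second = repo.upload(gif, reuse=False)  # second URI, same stored file
third = repo.upload(gif)                # re-uses an existing URI
```

Because deduplication happens behind the URI, clients and other servers see ordinary MXC URIs and nothing in the wire protocol has to change, which is what makes this workable as an implementation detail.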

richvdh commented 6 years ago

Agreed, I'd favour keeping it as an implementation detail.

richvdh commented 6 years ago

didn't mean to close it yet!