mastodon / mastodon

Your self-hosted, globally interconnected microblogging community
https://joinmastodon.org
GNU Affero General Public License v3.0
46.35k stars 6.8k forks

Ability to re-use uploaded media files in a new toot + prevent duplicates #2317

Open Exagone313 opened 7 years ago

Exagone313 commented 7 years ago

Hi,

When using memes, such as videos or images, you often want to use them multiple times, like emojis. Currently you have to re-upload the file (or link to it, but then you lose the preview), and server storage grows. I don't think the server checks whether the file already exists (but tell me if I'm wrong). My suggestion is to give each user access to their former uploads as an alternative to uploading. Also, the server could check for duplicates among all uploaded media files, across users, to optimize storage space (a SHA-384/SHA-512 checksum + media type + dimensions should be enough to avoid collisions). This is harder to implement than the first suggestion.
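To make both suggestions concrete, here is a minimal in-memory sketch in Python. It is not how Mastodon stores media; `MediaKey`, `MediaStore`, and their methods are hypothetical names, and the caller is assumed to already know the media type and dimensions of the file.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass(frozen=True)
class MediaKey:
    """Hypothetical deduplication key: checksum + media type + dimensions."""
    sha512: str
    media_type: str
    width: int
    height: int


def media_key(data: bytes, media_type: str, width: int, height: int) -> MediaKey:
    return MediaKey(hashlib.sha512(data).hexdigest(), media_type, width, height)


@dataclass
class MediaStore:
    """In-memory stand-in for the server's storage and database (assumption)."""
    blobs: dict = field(default_factory=dict)    # MediaKey -> bytes, stored once
    by_user: dict = field(default_factory=dict)  # user -> list of MediaKey

    def upload(self, user, data, media_type, width, height) -> bool:
        """Record the upload; return True only if new bytes were stored."""
        key = media_key(data, media_type, width, height)
        is_new = key not in self.blobs
        if is_new:
            self.blobs[key] = data
        uploads = self.by_user.setdefault(user, [])
        if key not in uploads:
            uploads.append(key)
        return is_new

    def former_uploads(self, user):
        """Per-user list of earlier uploads, offered instead of re-uploading."""
        return list(self.by_user.get(user, []))
```

With this sketch, a second `upload` of the same bytes (by the same user or by another one) returns `False` and stores nothing new, while `former_uploads` gives each user their own list to pick from.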


Gargron commented 7 years ago

+💯 on the deduplication feature request. I would love to have that. Sadly, it's not very easy.

sull commented 7 years ago

I was just thinking about this issue myself, as I have been testing the media attachments API with the same file several times. I use AWS on my instance, but regardless, I like to be mindful of wasting resources like bandwidth and storage.

This is a tough one to do elegantly. Ideally, if the file is exactly the unaltered file previously uploaded, the file hash can be compared before upload and the user presented with the matched file to confirm whether it should be re-used (or just skip the confirmation and use it).

If it's the same file but the hash is different because of some modification to the file (not visually noticeable), then the only way is to rely on the user's memory of what they uploaded and let them sift through their upload history if they are the type to care (in most cases it would be faster to just upload the file again, unless file search works fast and accurately).

I think ultimately, this is a feature (file management) that might be best as a 3rd party plugin but maybe the lightweight hash check is worth implementing (store all file hashes in db and do client-side comparison pre-upload).
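A rough sketch of that lightweight hash check, in Python for brevity. `HashIndex` and `attach` are hypothetical names, not an existing Mastodon API, and a dict stands in for the database table of file hashes.

```python
import hashlib


class HashIndex:
    """Hypothetical server-side table mapping file hashes to media IDs."""

    def __init__(self):
        self._ids = {}     # sha256 hex digest -> media ID
        self._next_id = 1

    def lookup(self, digest):
        return self._ids.get(digest)

    def register(self, digest):
        media_id = self._next_id
        self._ids[digest] = media_id
        self._next_id += 1
        return media_id


def attach(index, data):
    """Return (media_id, uploaded); uploaded is False when a file is re-used."""
    digest = hashlib.sha256(data).hexdigest()  # computed before any upload
    existing = index.lookup(digest)
    if existing is not None:
        return existing, False                 # exact match: skip the upload
    return index.register(digest), True        # a real upload would happen here
```

Only byte-identical files match. A re-encoded copy of the same image gets a different digest and is uploaded again, which is the limitation described above.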

Exagone313 commented 7 years ago

This may be a minor issue, but one could then check whether an image has been uploaded by another user (by sending a hash from client-side JS; the server response tells you whether you have to upload or not). Even if the server lets the user upload anyway to try to prevent this, a timing attack may still be used. As Mastodon is not really suited to privacy, this may not be an issue at all.

TwistedLucidity commented 7 years ago

Tools like "fdupes" could be run periodically to resolve duplicates (creating links would seem to be the obvious way, see this StackExchange). Perhaps suggest a crontab for admins to use? It does not solve the network transfer issue, though.
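As an illustration of what such a periodic job does, here is a small Python sketch that replaces byte-identical files with hard links. It is not fdupes itself, `link_duplicates` is a hypothetical helper, and it assumes media lives on a single local filesystem (hard links do not apply to object storage such as S3).

```python
import hashlib
import os
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of a file, read in chunks to keep memory use flat."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def link_duplicates(root: Path) -> int:
    """Replace duplicate files under root with hard links; return how many."""
    seen = {}      # digest -> first path found with that content
    replaced = 0
    files = sorted(p for p in root.rglob("*") if p.is_file() and not p.is_symlink())
    for path in files:
        digest = file_digest(path)
        original = seen.get(digest)
        if original is None:
            seen[digest] = path
            continue
        if path.samefile(original):
            continue   # already a hard link to the first copy
        tmp = path.with_name(path.name + ".dedup-tmp")
        os.link(original, tmp)   # create the link under a temporary name
        os.replace(tmp, path)    # then swap it in over the duplicate
        replaced += 1
    return replaced
```

Run from cron, something like this frees disk space after the fact; the duplicate bytes have still been transferred over the network by then.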

Or, could the server recognise a media URL when pasted in and respond with a patch for the preview? Still leaves the issue of the user having to sort through their media to find the URL.

ldidry commented 6 years ago

May I bump this issue?

tlhall commented 6 years ago

When I read Exagone313's issue, I thought he was talking about something simple: when you copy and paste a link to a media asset on the server from an existing status into a new status, have the new status treat it as if it had been uploaded, rather than treating it as just any old link. Maybe prefix @ to the link? Global dedup would be good, but maybe later? ldidry - I think you just click the smiley face in the upper right-hand corner of the lead post, then click the thumbs-up icon.

ldidry commented 6 years ago

@tlhall I know how to do that; I posted to show that this issue is, IMHO, important. You're right about something simple, it would be a really simple thing. But prefixing with @ may be confusing, since that is what identifies accounts. Maybe a %?
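For what it's worth, here is a hedged Python sketch of the "treat a pasted media link as an attachment" idea without any prefix character: the server simply recognises URLs that point at its own media. `INSTANCE_HOST`, `MEDIA_PATH_PREFIX`, and `split_status` are assumptions made up for the example, not Mastodon's actual URL layout or code.

```python
import re
from urllib.parse import urlparse

INSTANCE_HOST = "example.social"   # assumption: this instance's domain
MEDIA_PATH_PREFIX = "/media/"      # assumption: path under which uploads are served

URL_PATTERN = re.compile(r"https?://\S+")


def split_status(text):
    """Return (remaining_text, media_urls) for a draft status."""
    media_urls = []

    def replace(match):
        url = match.group(0)
        parsed = urlparse(url)
        is_local_media = (
            parsed.hostname == INSTANCE_HOST
            and parsed.path.startswith(MEDIA_PATH_PREFIX)
        )
        if is_local_media:
            media_urls.append(url)  # re-use as an attachment
            return ""               # and drop the raw link from the text
        return url                  # any other link stays in the text

    remaining = URL_PATTERN.sub(replace, text)
    return " ".join(remaining.split()), media_urls
```

A % or @ prefix would only be needed if users should be able to choose between a plain link and a re-used attachment; that choice is left out of the sketch.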

asterismo commented 1 year ago

What is the status of this feature request? Did this ever get implemented?

trwnh commented 1 year ago

@asterismo nope, still open

prior art: misskey drive lets you do this