jvsteiner / s3-image-uploader

MIT License

Feature requests: Deletion, media resizing, S3 directory creation based on current working directory of upload #30

Status: Open. Elijer opened this issue 3 months ago.

Elijer commented 3 months ago

Hey! First off, I love this plugin, and if anything it works too well. It's sort of changed my life. I can now express what I'm working on and also easily upload photos and videos to my blog that show it in IRL terms. That has let me blog at something much closer to the speed of thought and experience than I thought possible. I've been uploading media like crazy. So far I'm still paying 5 cents a month for S3, but I can see that going up a lot.

The main concerns I see are around:

1) The deletion of unused media. If I upload an image and then realize it was wrong, or it needs to be resized or updated, I delete the link, and the file just hangs around in my S3 bucket, unreferenced and unused. I don't see anything built in that automatically deletes unused media from the bucket. That doesn't surprise me, since automatic deletion probably comes with at least a few decisions to make. For one, what if an image is referenced twice in a vault? A listener would need to check whether a reference to some media was removed and whether that media is still referenced anywhere else. It should only delete the file from S3 if the reference was removed and no other reference remains.

2) Media resizing. This is related, since resizing is probably the most common reason I've deleted and replaced media so far. It would be nice to have a catch-all media upload setting that resizes images that are too large, or even just prevents them from being uploaded.

3) S3 directory creation / metadata. This is a floating idea, but I wonder if some metadata could be added to S3 files showing the path of the note that originally uploaded them.
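
The check described in 1) can be sketched as a pure function. A file should only become a deletion candidate when a removed link was its last reference anywhere in the vault. This is a hypothetical TypeScript sketch, not part of the plugin. It assumes uploaded media is linked with full bucket URLs that start with a known prefix, and it takes note contents as plain strings rather than going through any Obsidian API. The names `extractKeys` and `orphanedByEdit` are illustrative.

```typescript
// Hypothetical sketch (not part of s3-image-uploader): find media keys that a
// note edit removed and that no other note still references.
// Assumes every uploaded file is linked as `${urlPrefix}<key>`.

// Escape a string so it can be used literally inside a RegExp.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Collect the object keys linked from one note's Markdown text.
function extractKeys(text: string, urlPrefix: string): Set<string> {
  // The key runs until whitespace or a character that ends a Markdown/HTML link.
  const pattern = new RegExp(escapeRegExp(urlPrefix) + "([^\\s)\"'>\\]]+)", "g");
  const keys = new Set<string>();
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(text)) !== null) {
    keys.add(match[1]);
  }
  return keys;
}

// Keys linked in `before` but not in `after`, and not linked from any other
// note: only these would be safe candidates for deletion (or archiving).
function orphanedByEdit(
  before: string,
  after: string,
  otherNotes: string[],
  urlPrefix: string,
): string[] {
  const afterKeys = extractKeys(after, urlPrefix);
  const removed = Array.from(extractKeys(before, urlPrefix)).filter(
    (key) => !afterKeys.has(key),
  );
  const stillReferenced = new Set<string>();
  for (const note of otherNotes) {
    extractKeys(note, urlPrefix).forEach((key) => stillReferenced.add(key));
  }
  return removed.filter((key) => !stillReferenced.has(key));
}
```

Rescanning every other note on each edit would get slow in a large vault. That is one reason a persistent index of uploaded content with reference counts is the more practical design.
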

This is a lot, but I am interested in extending the functionality of this plugin in these ways in the future and wondered if anyone had thought about these features.

jvsteiner commented 3 months ago

Hi, glad you are getting good use from the plugin.

You have brought up a number of points here, so let's discuss how you think it should work, and then I'll see if it's something I think I can support.

  1. Deletion: You are right that the plugin just uploads. It doesn't take "ownership" of the content once it is uploaded, and it has no ability to delete it. A poor man's solution could be a bucket lifecycle rule that deletes objects automatically after some time, but I'm guessing this might not be desirable. You are right that supporting deletion properly would mean building an index of uploaded content and managing it via reference counting. Not everyone would want that behavior; I don't, for one. It would seem too easy to delete things accidentally. If deleting the link meant deleting the content, accidental deletions could happen, and that is a bigger risk than excess storage usage. Often I paste screenshots, and once they are gone, they are gone. There's no copy on my machine; it was only briefly in the clipboard. What would be your strategy to mitigate this?
  2. Resizing: Currently we don't touch the media, but the filenames are the MD5 hash of the content. This means that when the same content is uploaded multiple times, it isn't stored in the S3 bucket twice. If the media were processed, I would need a comprehensive set of settings to deal with it: which file types should be resized, what the maximum file size is, what the target quality should be, and so on. It's taking on a lot of new scope that was not originally intended. The original idea here is that S3 space is cheap, and I would like to offload media from the machine. Can you tell me more about your use case? Why do you resize?
  3. Not sure about this; it seems like the source location could change.

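
The deduplication in point 2 follows from content addressing. The object key is derived from the bytes, so identical uploads land on the same key and overwrite each other instead of piling up. Below is a minimal sketch using Node's built-in `crypto` module. Two things are assumptions: the `<md5>.<extension>` key layout, and the use of Node's `crypto` rather than whatever hashing the plugin actually bundles. `contentKey` is a hypothetical name.

```typescript
import { createHash } from "node:crypto";

// Hypothetical sketch: derive an S3 object key from the MD5 digest of the
// file's bytes, so uploading identical content twice yields the same key.
// The `<md5>.<extension>` layout is an assumption, not the plugin's documented format.
function contentKey(data: Uint8Array, extension: string): string {
  const digest = createHash("md5").update(data).digest("hex");
  return `${digest}.${extension}`;
}
```

This interacts with resizing. Any compression step before upload changes the bytes, so a compressed image gets a different key from its original. If both versions were ever uploaded, both would stay in the bucket.
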
Elijer commented 3 months ago

Thanks for engaging with these ideas so quickly!

  1. Deletion: Long story short, I think this is quite complicated to implement. For my use case, the simplest solution is to handle this kind of S3 “pruning” in my deploy script rather than trying to get it into the plugin. Your examples make a lot of sense, as does your point that accidental deletion is a bigger risk than storage overuse. For most users, I think your priorities are absolutely right. However, I do think an “archive” folder on S3 could mitigate some of these concerns, and a virtual directory structure could let the plugin handle archiving.
  2. I've found that the default size of many of my pictures is much larger than I need. A little compression makes them load a lot faster, with the bonus of taking up less space, at very little cost in quality. I have been manually compressing some of my images from 2MB to about 150KB. Sometimes I forget to do this up front, which leaves duplicate images that aren't used. That's why I filed this issue: not entirely out of necessity, but more a feeling that this might creep up and get me eventually. But maybe I'll just get better at remembering!
  3. You're definitely right that the source location of an image could change. An image could also be referenced from multiple locations, which makes a real directory structure more or less impossible. I think some sort of virtual directory structure, with the URL references stored in JSON or something similar, would be the best way to do this. That would allow the same file to be referenced by multiple notes.
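
The virtual directory in point 3 could be as small as a JSON file that maps each object key to every note linking to it. That handles one file appearing in several notes. The bucket keys missing from it are exactly the unused media that the archive-folder idea in point 1 would move. The sketch below is hypothetical. It assumes note paths and contents have already been read from the vault or the repo, and that links use a known bucket URL prefix. The names `buildIndex`, `indexToJSON`, and `unreferenced` are illustrative.

```typescript
// Hypothetical "virtual directory": S3 object key -> paths of notes linking to it.
type MediaIndex = Map<string, string[]>;

// Build the index from note contents (path -> Markdown text).
// Assumes media links start with a known bucket URL prefix.
function buildIndex(notes: Map<string, string>, urlPrefix: string): MediaIndex {
  const escaped = urlPrefix.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const index: MediaIndex = new Map();
  notes.forEach((content, path) => {
    // Capture the key up to whitespace or a character that closes a link.
    const pattern = new RegExp(escaped + "([^\\s)\"'>\\]]+)", "g");
    let match: RegExpExecArray | null;
    while ((match = pattern.exec(content)) !== null) {
      const refs = index.get(match[1]) ?? [];
      if (!refs.includes(path)) refs.push(path);
      index.set(match[1], refs);
    }
  });
  return index;
}

// Serialize the index as JSON, e.g. to store alongside the vault or in the bucket.
function indexToJSON(index: MediaIndex): string {
  const plain: Record<string, string[]> = {};
  index.forEach((paths, key) => {
    plain[key] = paths;
  });
  return JSON.stringify(plain, null, 2);
}

// Bucket keys that no note links to: candidates for an "archive/" prefix.
function unreferenced(bucketKeys: string[], index: MediaIndex): string[] {
  return bucketKeys.filter((key) => !index.has(key));
}
```

In standard S3 buckets, "moving" an object to an archive prefix is typically done in two steps: copy it to the new key, then delete the old one.
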

So that said, this is what I’m thinking in terms of action:

  1. I’m interested in scrappily putting together my own tool. It would scan my S3 bucket, compare it to my repo to find unused media, and possibly move that media to an archival S3 directory that is easier to manage. If that goes well, maybe I could fork your plugin and see if I can incorporate it into the upload process itself. My hunch, though, is that it might be better as a separate tool anyway, since it could be helpful outside of Obsidian.
  2. I do think some automatic compression settings could be helpful for the plugin, but if I’m the only one interested, they might not really be necessary. I can just batch-compress things before dropping them into Obsidian. With the tool above, if I do this sloppily, I can audit and clean up later. That said, what I envision is some optional settings in the plugin settings menu. They would let the user turn on compression per file type (JPGs, PNGs, etc.) and maybe specify a target file size. The JavaScript image compression libraries I’ve used let you target a specific file size.
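
Targeting a file size, as in point 2, is often implemented by searching over a lossy encoder's quality setting until the output fits. The sketch below is hypothetical. `encode` stands in for whatever image encoder a plugin or external tool would use; none is named in this thread. The search assumes output size shrinks as quality drops, which real JPEG encoders follow only approximately. `compressToTarget` and its default bounds are illustrative.

```typescript
// Hypothetical encoder signature: quality in [0, 1] -> encoded bytes.
type Encoder = (quality: number) => Uint8Array;

interface Compressed {
  quality: number;
  data: Uint8Array;
}

// Binary-search the highest quality whose output fits within targetBytes.
// Returns null if even minQuality produces output that is too large.
function compressToTarget(
  encode: Encoder,
  targetBytes: number,
  minQuality = 0.3,
  maxQuality = 0.95,
  steps = 7,
): Compressed | null {
  // If the maximum quality already fits, no search is needed.
  const top = encode(maxQuality);
  if (top.length <= targetBytes) return { quality: maxQuality, data: top };

  let lo = minQuality;
  let hi = maxQuality;
  let best: Compressed | null = null;
  for (let i = 0; i < steps; i++) {
    const mid = (lo + hi) / 2;
    const data = encode(mid);
    if (data.length <= targetBytes) {
      best = { quality: mid, data }; // fits: try a higher quality next
      lo = mid;
    } else {
      hi = mid; // too large: lower the quality
    }
  }
  if (best === null) {
    // The search never fit; fall back to checking the lowest allowed quality.
    const bottom = encode(minQuality);
    if (bottom.length <= targetBytes) best = { quality: minQuality, data: bottom };
  }
  return best;
}
```

PNG is lossless, so a quality search like this doesn't apply to it directly. Getting a PNG under a size target would mean resizing it or converting it to a lossy format.
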
jvsteiner commented 3 months ago

For compression, how would it work? Should every image be compressed, or just those over a certain size? How much should they be compressed? Which file types should be compressed? Should we convert file types, i.e. PNG to JPG? Should there be a way to disable or enable compression at the note level? (I have been pretty careful to enable frontmatter versions of all the settings, since people use their vaults for all kinds of different things, and one size doesn't necessarily fit all.)

Elijer commented 2 months ago

What do you mean by "the note level"? I really appreciate you being careful to enable frontmatter versions of all the settings. I might need your help understanding what this means, though, as I don't know the full range of how Obsidian is used. I'd imagine it's quite broad!

These are great questions. For "how would it work", I was thinking of a per-file-type setting, off by default, where the user specifies a file size threshold. To start with, it would cover just PNG and JPG. If an uploaded file is above that size, it is compressed before upload. Something like this:

JPEG compression: On
Compress JPEGs above this file size: 2MB

PNG compression: On
Compress PNGs above this file size: 2MB
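
The settings mock-up above maps onto a small typed settings object plus a check that runs before each upload. The sketch below is hypothetical. The field names, the use of MIME types as keys, and reading "2MB" as 2 MiB are all illustrative assumptions, not the plugin's actual settings.

```typescript
// Hypothetical per-file-type compression rule, off by default.
interface CompressionRule {
  enabled: boolean;
  thresholdBytes: number; // compress only files strictly larger than this
}

// Rules keyed by MIME type (an assumption; file extensions would also work).
type CompressionSettings = Record<string, CompressionRule>;

const MB = 1024 * 1024; // "2MB" read as 2 MiB here

const DEFAULT_COMPRESSION: CompressionSettings = {
  "image/jpeg": { enabled: false, thresholdBytes: 2 * MB },
  "image/png": { enabled: false, thresholdBytes: 2 * MB },
};

// Decide, before upload, whether a file should go through compression.
function shouldCompress(
  mimeType: string,
  sizeBytes: number,
  settings: CompressionSettings,
): boolean {
  // Only consult rules that were explicitly configured for this type.
  if (!Object.prototype.hasOwnProperty.call(settings, mimeType)) return false;
  const rule = settings[mimeType];
  return rule.enabled && sizeBytes > rule.thresholdBytes;
}
```

Keying rules by type means supporting another format later is one more entry rather than a new pair of hard-coded settings.
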
jvsteiner commented 2 months ago

By "the note level" I mean a setting that applies specifically to one particular note, where it differs from the settings that apply generally. For example, I have some notes where I upload the media to a different bucket, because they are for work. For almost every setting, there is often a use case where it needs to be set differently for a single note. So the settings need to be supported in the frontmatter, which each note has individually, and not only in the global settings, which are used for all notes by default.
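
Note-level overrides like this usually amount to merging a note's frontmatter over the global settings, key by key. The sketch below is hypothetical. The setting names are illustrative and are not the plugin's real frontmatter keys. The frontmatter is taken as an already-parsed object. Inside an Obsidian plugin, it would typically come from `app.metadataCache.getFileCache(file)?.frontmatter`.

```typescript
// Hypothetical upload settings; the field names are illustrative only.
interface UploadSettings {
  bucket: string;
  compressJpeg: boolean;
  jpegThresholdBytes: number;
}

// Overlay a note's frontmatter on the global settings. A frontmatter value is
// used only if its key is a known setting and its type matches the global one,
// so an unknown key or a string like "yes" for a boolean is ignored.
function resolveSettings(
  globalSettings: UploadSettings,
  frontmatter: Record<string, unknown> | undefined,
): UploadSettings {
  const resolved: UploadSettings = { ...globalSettings };
  if (frontmatter === undefined) return resolved;
  const writable = resolved as unknown as Record<string, unknown>;
  for (const key of Object.keys(globalSettings)) {
    const value = frontmatter[key];
    if (value !== undefined && typeof value === typeof writable[key]) {
      writable[key] = value;
    }
  }
  return resolved;
}
```

For the work-bucket example, a note whose frontmatter sets `bucket` would upload to that bucket. Every other note would keep using the global one.
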