matrix-org / synapse-s3-storage-provider

Synapse storage provider to fetch and store media in Amazon S3
Apache License 2.0

URL cache #21

Closed: rkfg closed this issue 5 years ago

rkfg commented 5 years ago

Not sure where to post this, but I guess the URL cache and thumbnails should not go to S3. They live only about an hour and are cleaned up after 3 days, so it's more rational to store them locally only. Also, the cleanup doesn't affect S3, so those images are effectively stored there forever. I don't see an option to exclude those files from S3. I'd like to put only local media in S3 and keep everything else on the HDD so I can clean it up from time to time.

I think it's an issue because after I analyzed my storage the breakdown was like this (including thumbnails):

And that's after just 3 days.

erikjohnston commented 5 years ago

Hmm, I thought Synapse didn't upload URL cache data. Can you open a Synapse issue for this? Either this should be a config option, or we should just not support it at all.

rkfg commented 5 years ago

Done, see matrix-org/synapse#5411

rkfg commented 5 years ago

In the meantime I made a makeshift patch that just refuses to upload anything except local content:


diff --git a/s3_storage_provider.py b/s3_storage_provider.py
index e0ab67e..1ef4133 100644
--- a/s3_storage_provider.py
+++ b/s3_storage_provider.py
@@ -69,6 +69,8 @@ class S3StorageProviderBackend(StorageProvider):
         """See StorageProvider.store_file"""

         def _store_file():
+            if not path.startswith("local_content"):
+                return
             session = boto3.session.Session()
             session.resource('s3', **self.api_kwargs).Bucket(self.bucket).upload_file(
                 Filename=os.path.join(self.cache_directory, path),
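
The check added by the patch above can be pulled out into a small standalone function, which makes it easier to see which paths would be skipped. This is only a sketch: the `should_upload` name and the `allowed_prefixes` parameter are hypothetical and not part of the provider, and the example paths are made up for illustration rather than taken from a real media store.

```python
from typing import Iterable


def should_upload(path: str, allowed_prefixes: Iterable[str] = ("local_content",)) -> bool:
    """Return True if a media path (relative to the media store) should go to S3.

    Hypothetical helper: mirrors the `path.startswith("local_content")`
    check from the patch, but with a configurable list of prefixes.
    """
    # str.startswith accepts a tuple of candidate prefixes.
    return path.startswith(tuple(allowed_prefixes))


if __name__ == "__main__":
    # Illustrative paths only (assumed layout, not from the thread).
    for example in (
        "local_content/ab/cd/efgh",
        "local_thumbnails/ab/cd/efgh/32-32-image-png-crop",
        "url_cache/abcdef",
        "remote_content/example.org/ab/cd/efgh",
    ):
        print(example, should_upload(example))
```

Note that a plain `local_content` prefix check also rejects anything stored under a differently named directory such as `local_thumbnails`, so with this filter thumbnails of local media would stay on local disk as well. If that is not wanted, the extra prefix could be passed in `allowed_prefixes`.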