thumbor-community / aws

Thumbor AWS extensions
MIT License
155 stars 70 forks source link

[RFC] Add some randomness in the key names #88

Closed Bladrak closed 6 years ago

Bladrak commented 7 years ago

Following https://www.youtube.com/watch?v=_FHRzq7eHQc we should add some randomness in our S3 Key names, in order to allow for better scalability and performance in high load environments.

Here's what I suggest: edit the https://github.com/thumbor-community/aws/blob/master/tc_aws/aws/bucket.py#L125 function by appending the first seven chars of the sha1 of the path and its extension in the end.

For instance, a key such as image/in/my/folder/foo.jpg would become image/in/my/folder/foo.jpg/baaa1c1.jpg.

What do you think?

aoqfonseca commented 7 years ago

I'm just concerned about performance. All time that you made a request for an image, the system will take a sha1 from URL and search for this equivalent image on the bucket.

Bladrak commented 7 years ago

It will indeed take a bit more time within thumbor, but should allow for faster response time from AWS. In the overall, I think we should gain some performance, given the most time is spent within AWS fetching.

aoqfonseca commented 7 years ago

good point

dhardy92 commented 7 years ago

problem is that randomness must be set at the prefix of the key. on suffix is it useless.

This make finding groups of images (like image from a defined site using http loader or so with define Id like type/id for example) difficult to find in our bucket.

I even wonder if using a prefix with TC_AWS_STORAGE_ROOT_PATH is not counterproductive.

juris commented 6 years ago

There used to be randomness in the key names long time ago: https://github.com/thumbor-community/aws/commit/af53c75630144ba0c6f621d97afd4dede5f1bd8c#diff-73180fe182db7c4424a4ec422d370b5b

def normalize_path(self, path):
    root_path = self.context.config.get('RESULT_STORAGE_AWS_STORAGE_ROOT_PATH', default='thumbor/result_storage/')
    path_segments = [path]
    if self.is_auto_webp:
        path_segments.append("webp")
    digest = hashlib.sha1(".".join(path_segments).encode('utf-8')).hexdigest()
    return os.path.join(root_path, digest)      
Bladrak commented 6 years ago

Indeed, missed that at the time, good catch!

Bladrak commented 6 years ago

If it's fine with everyone, I'd like to proceed with this feature, but by making it configurable, to avoid an impact for the people upgrading the solution (so they won't have to re-build the cache).

juris commented 6 years ago

Would be great! We are still using Thumbor with the old aws plugin and not upgrading to the current version because of the lack of randomness.

Bladrak commented 6 years ago

I've submitted a pre-PR, feel free to check whether that's a behavior that would seem suitable.

Bladrak commented 6 years ago

This has been released under the 6.1.0 version on Pypi