linkedin / ambry

Distributed object store
https://github.com/linkedin/ambry/wiki
Apache License 2.0
1.74k stars 275 forks

[vcr-2.0] Exploit metadata caching to speed up backups #2801

Closed snalli closed 2 months ago

snalli commented 2 months ago

The ReplicationEngine first identifies keys that are missing in Azure, which costs one metadata request per blob. For blobs that already exist, it then retrieves their metadata, which costs another request per blob. The expiry and deletion checks each cost a further two requests per blob. With caching, all of these can be reduced to a single metadata request per blob.

This patch introduces a thread-local cache that selectively stores metadata. The findMissingKeys function caches the metadata of blobs that already exist, and the findKeys and applyXXX methods then reuse the cached entries instead of issuing new requests.
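
The idea can be illustrated with a minimal, self-contained sketch. This is not the code from the patch: the class `MetadataCacheSketch`, the `remoteFetch` function standing in for the Azure metadata call, the `deleted` metadata field, and the `isDeleted` check standing in for one of the later lookups are all hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

/** Hypothetical sketch of a thread-local metadata cache; not Ambry's actual classes. */
public class MetadataCacheSketch {
  // One cache per replication thread, so entries need no locking.
  private static final ThreadLocal<Map<String, Map<String, String>>> CACHE =
      ThreadLocal.withInitial(HashMap::new);

  // Stand-in for the remote metadata request; returns empty when the blob is absent.
  private final Function<String, Optional<Map<String, String>>> remoteFetch;
  private int remoteRequests = 0;

  public MetadataCacheSketch(Function<String, Optional<Map<String, String>>> remoteFetch) {
    this.remoteFetch = remoteFetch;
  }

  /** Issues one request per key and caches metadata only for blobs that exist. */
  public List<String> findMissingKeys(List<String> keys) {
    List<String> missing = new ArrayList<>();
    for (String key : keys) {
      remoteRequests++;
      Optional<Map<String, String>> metadata = remoteFetch.apply(key);
      if (metadata.isPresent()) {
        CACHE.get().put(key, metadata.get());
      } else {
        missing.add(key);
      }
    }
    return missing;
  }

  /** Serves metadata from the cache, falling back to a remote request on a miss. */
  public Optional<Map<String, String>> getMetadata(String key) {
    Map<String, String> cached = CACHE.get().get(key);
    if (cached != null) {
      return Optional.of(cached);
    }
    remoteRequests++;
    Optional<Map<String, String>> fetched = remoteFetch.apply(key);
    fetched.ifPresent(m -> CACHE.get().put(key, m));
    return fetched;
  }

  /** Example of a later check that reuses the cached metadata. */
  public boolean isDeleted(String key) {
    return getMetadata(key).map(m -> "true".equals(m.get("deleted"))).orElse(false);
  }

  public int remoteRequests() {
    return remoteRequests;
  }

  /** Drops the calling thread's cache, for example between replication cycles. */
  public static void clear() {
    CACHE.remove();
  }
}
```

In this sketch, a check that follows `findMissingKeys` for an existing blob adds no remote requests, so the total stays at one per blob. When and how the real cache is invalidated is not described here, so the `clear` method is only an assumption about how stale entries might be dropped.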