Open igalshilman opened 1 week ago
I think we might have a better path to implement this, using Object Versioning. A quick search shows that this is supported across MinIO / Azure / GCP, here's the S3 documentation: https://docs.aws.amazon.com/AmazonS3/latest/userguide/Versioning.html.
Here's a sketch of the conditional PUT path:
```rust
async fn put(
    &self,
    key: ByteString,
    value: VersionedValue,
    precondition: Precondition,
) -> Result<(), WriteError> {
    let key = object_store::path::Path::from(key.to_string());
    match precondition {
        Precondition::MatchesVersion(version) => {
            // If we have a cached version, we can also add an if-modified-since condition to the GET.
            // Unfortunately we can't set the version explicitly, and a HEAD request doesn't return enough
            // information to determine our own metadata value version.
            let get_result = self.object_store.get(&key).await.map_err(|e| {
                WriteError::Internal(format!("Failed to check precondition: {}", e))
            })?;
            if extract_object_version(&get_result.payload) != version {
                return Err(WriteError::FailedPrecondition(
                    "Version mismatch".to_string(),
                ));
            }
            self.object_store
                .put_opts(
                    &key,
                    PutPayload::from_bytes(serialize_versioned_value(value)),
                    PutOptions::from(PutMode::Update(UpdateVersion::from(&get_result))),
                )
                .await
                .map_err(|e| WriteError::Internal(format!("Failed to update value: {}", e)))?;
            Ok(())
        }
        _ => todo!(),
    }
}
```
Unfortunately we don't control the versions; in S3 they are a monotonic numbered sequence but we don't get to set them directly - rather, S3 will set them for us. Our own API relies on explicit versions so I'm assuming we'll just serialize them along with the rest of the payload as the object body.
A normal GET request implicitly returns the latest version, no further tricks required. It's also possible to request a particular previous version, but that's not needed for our API. Might be useful for troubleshooting though! (And I'd seriously consider serializing the values as JSON for operations friendliness.)
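To make the explicit-version idea concrete, here is a minimal sketch of what serializing our version into the object body could look like. All names and the encoding are hypothetical — the thread only proposes storing the version alongside the payload, with JSON suggested for operations friendliness; a real implementation would use a proper JSON library:

```rust
/// Hypothetical encoding: store our own metadata version inside the object
/// body, since we cannot choose the store-assigned version IDs ourselves.
fn serialize_versioned_value(version: u64, value: &str) -> String {
    // `{:?}` on &str produces a quoted, escaped string; close enough to JSON
    // for ASCII payloads (use serde_json in real code).
    format!("{{\"version\":{},\"value\":{:?}}}", version, value)
}

/// Hypothetical counterpart to `extract_object_version` from the sketch
/// above: pull our version back out of a fetched body.
fn extract_object_version(body: &str) -> Option<u64> {
    let idx = body.find("\"version\":")? + "\"version\":".len();
    let digits: String = body[idx..]
        .chars()
        .take_while(|c| c.is_ascii_digit())
        .collect();
    digits.parse().ok()
}
```

With this shape, `extract_object_version(&serialize_versioned_value(7, "leader=node-2"))` round-trips to `Some(7)`, and the body stays readable when inspecting the bucket by hand.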
The only requirement for this all to work is that we must use a bucket with object versioning enabled, which is not the default but is trivial to set. S3 and other stores also support automatic cleanup of old versions using object lifecycle policies, so we don't need to implement that ourselves.
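For reference, noncurrent-version cleanup on S3 is configured with a bucket lifecycle rule along these lines (the rule ID and retention window here are illustrative, not a recommendation):

```json
{
  "Rules": [
    {
      "ID": "expire-old-metadata-versions",
      "Status": "Enabled",
      "Filter": {},
      "NoncurrentVersionExpiration": {
        "NoncurrentDays": 7
      }
    }
  ]
}
```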
Wouldn't `PutMode::Update` require `If-Match` header support from S3? I thought that it currently only supports `If-None-Match` natively, and for `PutMode::Update` one would have to use DynamoDB? Or is this different when using versioned buckets?
@tillrohrmann you are right, I was wrong; this should work on MinIO and Cloudflare R2, but not with S3. I got excited when I saw how elegant the object versioning API looked, but it requires `If-Match` support by the underlying object store, which is a no-go.
It seems support has been added: https://aws.amazon.com/about-aws/whats-new/2024/11/amazon-s3-functionality-conditional-writes/
Nice, this was quick 😄
Finally! 🎉😆
Object store backed Metadata store
This PR introduces a new type of metadata store that uses S3 (and similar) object stores that support conditional updates.

End result:
running the following:
starts a 3-node cluster that uses an S3 bucket for metadata storage:
Listing the bucket yields the possible keys:
Additional configuration keys
How does this work?
This is basically a client: each node runs an instance of it, and the clients use S3's optimistic concurrency control to move the different keys forward.
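The compare-and-swap loop the clients run can be illustrated with a toy in-memory model. Everything below is a simplified stand-in: the real code goes through the `object_store` crate and S3's etag/`If-Match` semantics rather than a `HashMap`, and real etags are opaque strings rather than counters:

```rust
use std::collections::HashMap;

/// Toy stand-in for a bucket with conditional writes: maps a key to
/// (etag, body). A u64 etag keeps the example readable.
struct ToyStore {
    objects: HashMap<String, (u64, String)>,
}

impl ToyStore {
    fn new() -> Self {
        Self { objects: HashMap::new() }
    }

    /// Create-if-absent, analogous to PutMode::Create (If-None-Match: *).
    fn put_if_absent(&mut self, key: &str, body: &str) -> Result<u64, ()> {
        if self.objects.contains_key(key) {
            return Err(());
        }
        self.objects.insert(key.to_string(), (1, body.to_string()));
        Ok(1)
    }

    /// Conditional update, analogous to PutMode::Update (If-Match: <etag>):
    /// succeeds only if the caller still holds the current etag.
    fn put_if_match(&mut self, key: &str, expected_etag: u64, body: &str) -> Result<u64, ()> {
        match self.objects.get_mut(key) {
            Some((etag, stored)) if *etag == expected_etag => {
                *etag += 1;
                *stored = body.to_string();
                Ok(*etag)
            }
            _ => Err(()),
        }
    }
}
```

Two clients that read the same etag will race: the first conditional PUT wins, the second fails its precondition and must re-read before retrying — that retry loop is how the nodes safely move each key forward without any coordinator.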
Currently the only supported credential type is AWS_* environment variables; a follow-up referenced here in the conversation would be to plug in additional credential providers (unifying the work done by @pcholakov).