noobaa / noobaa-core

High-performance S3 application gateway to any backend - file / s3-compatible / multi-clouds / caching / replication ...
https://www.noobaa.io
Apache License 2.0

NSFS | S3 | Versioning: VersionId corruption after version-enabled and version-suspended mode operations #8443

Open hseipp opened 2 days ago

hseipp commented 2 days ago

Environment info

Actual behavior

Ceph s3_tests test case test_versioning_obj_suspend_version() fails with a version id mismatch AssertionError while removing the object versions that previously got created:

...
        for idx in range(num_versions):
>           remove_obj_version(client, bucket_name, key, version_ids, contents, idx)
...
client = <botocore.client.S3 object at 0x7fbc6c068220>, bucket_name = 's3tests-2252vlq3tsfbhowoopaa9-1', key = 'testobj'
version_ids = ['mtime-d4qihcxvod8g-ino-1dzk', 'mtime-d4qihcxz73eo-ino-1dza', 'mtime-d4qihcy25on4-ino-1dzb', 'mtime-d4qihcy53shs-ino-1dzc', 'mtime-d4qihczsdyww-ino-1dzo', 'mtime-d4qihczur37k-ino-1dzp', ...], contents = ['content-1', 'content-2', 'content-3', 'content-4', 'content-0', 'content-1', ...]
[noobaa-20241008_2.log.gz](https://github.com/user-attachments/files/17295238/noobaa-20241008_2.log.gz)

    def check_obj_versions(client, bucket_name, key, version_ids, contents):
        # check to see if objects is pointing at correct version

        response = client.list_object_versions(Bucket=bucket_name)
        versions = []
        versions = response['Versions']
        # obj versions in versions come out created last to first not first to last like version_ids & contents
        versions.reverse()
        i = 0

        for version in versions:
>           assert version['VersionId'] == version_ids[i]
E           AssertionError: assert 'mtime-d4qihcyuioe8-ino-1dzl' == 'mtime-d4qihczsdyww-ino-1dzo'
E             
E             - mtime-d4qihczsdyww-ino-1dzo
E             ?             --- ^^        ^
E             + mtime-d4qihcyuioe8-ino-1dzl
E             ?              ^^^^^        ^

s3tests_boto3/functional/test_s3.py:8307: AssertionError

Expected behavior

The test should pass.

Steps to reproduce

Execute the Ceph s3-tests function test_versioning_obj_suspend_version().

More information - Screenshots / Logs / Other output

The NooBaa log captured with log level "all" is attached: noobaa-20241008_2.log.gz

shirady commented 1 day ago

Hi @hseipp, I looked at the test and saw that it changes the bucket versioning mode (Enabled -> Suspended -> Enabled). Did you try running the test with a delay of at least 1 minute, for example time.sleep(65), after each configuration change (as we commented in another issue - here)?

hseipp commented 1 day ago

When I add this 65-second delay after each versioning mode change like

$ git diff
diff --git a/s3tests_boto3/functional/test_s3.py b/s3tests_boto3/functional/test_s3.py
index 985de2a..ba8306c 100644
--- a/s3tests_boto3/functional/test_s3.py
+++ b/s3tests_boto3/functional/test_s3.py
@@ -8570,12 +8570,13 @@ def test_versioning_obj_suspended_copy():
     client = get_client()

     check_configure_versioning_retry(bucket_name, "Enabled", "Enabled")
-
+    time.sleep(65)
     key1 = 'testobj1'
     num_versions = 1
     (version_ids, contents) = create_multiple_versions(client, bucket_name, key1, num_versions)

     check_configure_versioning_retry(bucket_name, "Suspended", "Suspended")
+    time.sleep(65)

     content = 'null content'
     overwrite_suspended_versioning_obj(client, bucket_name, key1, version_ids, contents, content)

the test passes in 2 out of 2 attempts while it failed in all attempts before applying the change.
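
The workaround in the diff above could also be factored into a small helper so that every versioning change is followed by the same wait. This is only a sketch: `set_versioning_and_wait` and `CACHE_SETTLE_SECONDS` are hypothetical names that do not exist in s3-tests, and the 65-second default is simply the value suggested earlier in this thread, not a documented NooBaa setting. `put_bucket_versioning` is the regular boto3 client call.

```python
import time

# Assumption: 65 seconds is the delay suggested in this thread,
# not a documented NooBaa value.
CACHE_SETTLE_SECONDS = 65


def set_versioning_and_wait(client, bucket_name, status,
                            delay=CACHE_SETTLE_SECONDS, sleep=time.sleep):
    """Set bucket versioning to `status`, then wait `delay` seconds.

    Hypothetical helper: `client` is any object exposing boto3's
    put_bucket_versioning(); `sleep` is injectable so tests can skip the wait.
    """
    if status not in ("Enabled", "Suspended"):
        raise ValueError(f"unsupported versioning status: {status!r}")
    client.put_bucket_versioning(
        Bucket=bucket_name,
        VersioningConfiguration={"Status": status},
    )
    # Give the gateway time to pick up the new bucket configuration.
    sleep(delay)
```

A test would then call `set_versioning_and_wait(client, bucket_name, "Suspended")` in place of a configuration change followed by a bare `time.sleep(65)`.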

The other issue you are referring to was about bucket policies, whereas here we are changing the versioning mode. If I understand correctly, though, bucket data is cached in general, so the same root cause might apply.

shirady commented 1 day ago

@hseipp you're right, I referred to an issue about bucket policy, but the general idea was the cached bucket configuration (bucket policy in the previous example, bucket versioning in this issue).
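
To illustrate why a cached bucket configuration can produce this symptom, here is a toy time-based cache. It is not NooBaa's code: the `TtlCache` class, the fake clock, and the 60-second expiry are all assumptions chosen only to match the 65-second delay discussed above. It shows a reader still seeing "Enabled" shortly after the stored configuration changed to "Suspended".

```python
class TtlCache:
    """Toy cache: entries are reloaded only after `ttl_seconds` have passed."""

    def __init__(self, loader, ttl_seconds, clock):
        self._loader = loader      # function: key -> fresh value
        self._ttl = ttl_seconds
        self._clock = clock        # function returning the current time in seconds
        self._entries = {}         # key -> (value, time_loaded)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[1] < self._ttl:
            return entry[0]        # still within the TTL: may be stale
        value = self._loader(key)
        self._entries[key] = (value, now)
        return value


# Hypothetical stand-in for the persisted bucket configuration.
stored_config = {"bucket": "Enabled"}
fake_now = [0.0]  # mutable fake clock so the example needs no real waiting

# Assumption: a 60-second expiry, picked to match the delay in this thread.
cache = TtlCache(stored_config.__getitem__, 60, lambda: fake_now[0])

first = cache.get("bucket")           # loads "Enabled" at t=0
stored_config["bucket"] = "Suspended"  # versioning mode changes
fake_now[0] = 5.0
stale = cache.get("bucket")           # t=5: cached value is still served
fake_now[0] = 65.0
fresh = cache.get("bucket")           # t=65: entry expired, value reloaded
```

In this model a write issued at t=5 would still be handled as if versioning were enabled, which is the kind of mismatch the test detects.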

@nadavMiz @romayalon As you can see, with a delay after each configuration change the test passes on GPFS. I think we can close this issue and refer to issue #8391. WDYT?