Closed mat-gas closed 6 months ago
OK, it might be related to the python API not raising any error/exception actually
compose_object should handle it transparently? (in here https://github.com/minio/minio-py/blob/e10196f5b6dd5910722b52d184986cb2de6a89cd/minio/api.py#L1362 )
client.copy_object("temp", "test-upload2", CopySource("temp", "test-upload"))
or should a hard exception be raised instead of copying data and corrupting the copy?
https://github.com/minio/minio-py/blob/e10196f5b6dd5910722b52d184986cb2de6a89cd/minio/api.py#L1268
or is it a bug in compose_object,
where start_bytes is always == offset (here 0) at line 1620, whereas the remaining size is still updated at line 1621?
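For reference, a minimal sketch of how per-part copy ranges would be expected to advance when a large object is split. This is not the minio-py code: `copy_part_ranges` and `MAX_PART_SIZE` are hypothetical names, and the 5 GiB maximum part size is an assumption.

```python
MAX_PART_SIZE = 5 * 1024 ** 3  # assumed 5 GiB limit per copied part


def copy_part_ranges(object_size, offset=0, max_part_size=MAX_PART_SIZE):
    """Return inclusive (start, end) byte ranges covering object_size bytes."""
    ranges = []
    start = offset
    remaining = object_size
    while remaining > 0:
        length = min(remaining, max_part_size)
        end = start + length - 1      # the range end is inclusive
        ranges.append((start, end))
        start = end + 1               # the next part begins where this one stopped
        remaining -= length
    return ranges


if __name__ == "__main__":
    for start, end in copy_part_ranges(6442450982):
        print(f"bytes={start}-{end}")
```

The key point is that the start position moves forward together with the remaining size; if only the remaining size is updated, every part is read from the same offset.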
In a multipart upload, what matters is the final list of parts, since the server doesn't know the object as a whole.
Can you share a proper reproducer?
@mat-gas Enable Minio.trace_on()
and share the output.
dd if=/dev/zero of=/tmp/zero bs=1G count=6
echo "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa" > /tmp/test-upload3
cat /tmp/zero >> /tmp/test-upload3
md5sum /tmp/test-upload3
e3d297afb6078748ce597e88bcab5316 /tmp/test-upload3
ls -al /tmp/test-upload3
-rw-rw-r-- 1 mat mat 6442450982 avril 30 11:52 /tmp/test-upload3
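As a cross-check of the reported size, a small sketch of the arithmetic behind the reproducer. It assumes the `echo` line writes 37 `a` characters plus a newline (inferred from the size shown by `ls`) and that `bs=1G` means 1 GiB; the helper names are hypothetical.

```python
import hashlib

GIB = 1024 ** 3
MARKER = b"a" * 37 + b"\n"  # assumed output of the `echo` command


def expected_size(marker, zero_bytes):
    """Size of a file made of the marker followed by zero_bytes zero bytes."""
    return len(marker) + zero_bytes


def md5_of_marker_plus_zeros(marker, zero_bytes, chunk=1024 * 1024):
    """MD5 of marker + zeros, computed in chunks without building the file."""
    digest = hashlib.md5(marker)
    block = bytes(chunk)
    remaining = zero_bytes
    while remaining > 0:
        n = min(remaining, chunk)
        digest.update(block[:n])
        remaining -= n
    return digest.hexdigest()


if __name__ == "__main__":
    print(expected_size(MARKER, 6 * GIB))
```

The size matches the `ls` output above, so the source object is 38 bytes longer than an exact 6 GiB and needs more than one 5 GiB part when copied.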
xl.meta on disk:
{
"Versions": [
{
"Header": {
"Flags": 2,
"ModTime": "2024-04-30T15:27:46.484578361Z",
"Signature": "1b3500c5",
"Type": 1,
"VersionID": "00000000000000000000000000000000"
},
"Idx": 0,
"Metadata": {
"Type": 1,
"V2Obj": {
"CSumAlgo": 1,
"DDir": "/8hFEFpcQleRWuS3ACrRvg==",
"EcAlgo": 1,
"EcBSize": 1048576,
"EcDist": [
10,
11,
12,
13,
14,
15,
16,
1,
2,
3,
4,
5,
6,
7,
8,
9
],
"EcIndex": 7,
"EcM": 11,
"EcN": 5,
"ID": "AAAAAAAAAAAAAAAAAAAAAA==",
"MTime": 1714490866484578361,
"MetaSys": {
"x-minio-internal-erasure-upgraded": "NC0+NQ=="
},
"MetaUsr": {
"content-type": "application/octet-stream",
"etag": "e3d297afb6078748ce597e88bcab5316"
},
"PartASizes": [
6442450982
],
"PartETags": null,
"PartNums": [
1
],
"PartSizes": [
6442450982
],
"Size": 6442450982
},
"v": 1711791716
}
}
]
}
>>> s=client.stat_object("temp", "test-upload3")
>>> s.size
6442450982
- copy file : `client.copy_object("temp", "test-upload4", CopySource("temp", "test-upload3"))`
- stat on new file (note the 2 additional bytes):
`client.stat_object("temp", "test-upload4").size` returns 6442450984
- xl.meta on test-upload4 (copied file)
{ "Versions": [ { "Header": { "Flags": 2, "ModTime": "2024-04-30T15:39:03.075700792Z", "Signature": "5e1c9f8c", "Type": 1, "VersionID": "00000000000000000000000000000000" }, "Idx": 0, "Metadata": { "Type": 1, "V2Obj": { "CSumAlgo": 1, "DDir": "Z1VPwIFAT+6+u/SR+S5vkg==", "EcAlgo": 1, "EcBSize": 1048576, "EcDist": [ 13, 14, 15, 16, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 ], "EcIndex": 10, "EcM": 11, "EcN": 5, "ID": "AAAAAAAAAAAAAAAAAAAAAA==", "MTime": 1714491543075700792, "MetaSys": { "X-Minio-Internal-actual-size": "NjQ0MjQ1MDk4NA==", "x-minio-internal-erasure-upgraded": "NC0+NQ==" }, "MetaUsr": { "content-type": "application/octet-stream", "etag": "b70e57ea6e8dd07a73ebf717c8b09e3b-2" }, "PartASizes": [ 5368709121, 1073741863 ], "PartETags": null, "PartNums": [ 1, 2 ], "PartSizes": [ 5368709121, 1073741863 ], "Size": 6442450984 }, "v": 1711791716 } } ] }
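The numbers in this xl.meta can be checked with a short sketch. The values are copied from the two xl.meta dumps in this report; the only assumption is that the `MetaSys` values are plain base64-encoded ASCII.

```python
import base64

# Values copied from the xl.meta of the copied object (test-upload4).
meta_sys = {
    "X-Minio-Internal-actual-size": "NjQ0MjQ1MDk4NA==",
    "x-minio-internal-erasure-upgraded": "NC0+NQ==",
}
part_sizes = [5368709121, 1073741863]

# Size of the source object (test-upload3).
source_size = 6442450982

decoded = {key: base64.b64decode(value).decode("ascii")
           for key, value in meta_sys.items()}
actual_size = int(decoded["X-Minio-Internal-actual-size"])
extra_bytes = sum(part_sizes) - source_size

if __name__ == "__main__":
    print(decoded)
    print("sum of parts:", sum(part_sizes))
    print("extra bytes compared to source:", extra_bytes)
```

The two parts add up to the size stored in the metadata, which is 2 bytes more than the source: each part is one byte longer than a split of the source at 5 GiB would give.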
logs from `trace_on`:
The problem probably comes from the byte range of part 2 and later, which starts again at 0 instead of starting at 5 GiB:
**PUT /temp/test-upload4?partNumber=2..**
**X-Amz-Copy-Source-Range: bytes=0-1073741862**
PUT /temp/test-upload4?partNumber=2&uploadId=MjViYjlkZjktMDFjMi00MjBhLWIxZDUtOTdkZDFlZTNmOWExLmVlZmY4ZmQ2LThiZWMtNGI1ZS05MjVlLTk2NWE0Yzk5ZmEzMg HTTP/1.1 X-Amz-Copy-Source: /temp/test-upload3 X-Amz-Copy-Source-If-Match: e3d297afb6078748ce597e88bcab5316 X-Amz-Copy-Source-Range: bytes=0-1073741862
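A minimal sketch of a check that could be run over the `X-Amz-Copy-Source-Range` values taken from a trace. `parse_range` and `check_ranges` are hypothetical helpers, not part of minio-py; the check assumes the parts are meant to cover the source object contiguously from byte 0.

```python
import re

RANGE_RE = re.compile(r"bytes=(\d+)-(\d+)")


def parse_range(header_value):
    """Parse 'bytes=START-END' into an inclusive (start, end) tuple."""
    match = RANGE_RE.fullmatch(header_value.strip())
    if match is None:
        raise ValueError(f"unsupported range: {header_value!r}")
    return int(match.group(1)), int(match.group(2))


def check_ranges(header_values, object_size):
    """Return a list of problems; an empty list means the ranges are contiguous."""
    problems = []
    expected_start = 0
    for number, value in enumerate(header_values, start=1):
        start, end = parse_range(value)
        if start != expected_start:
            problems.append(
                f"part {number}: starts at {start}, expected {expected_start}")
        if end >= object_size:
            problems.append(f"part {number}: end {end} is past the object")
        expected_start = end + 1
    if expected_start != object_size:
        problems.append(
            f"ranges stop at {expected_start - 1}, expected {object_size - 1}")
    return problems


if __name__ == "__main__":
    observed = ["bytes=0-5368709120", "bytes=0-1073741862"]
    for problem in check_ranges(observed, 6442450982):
        print(problem)
```

Run against the two ranges from this trace, the check flags part 2 for restarting at 0 and reports that the ranges do not end at the last byte of the source.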
---------START-HTTP--------- HEAD /temp/test-upload3 HTTP/1.1 Host: xxxxxxx:9000 User-Agent: MinIO (Linux; x86_64) minio-py/7.2.5 X-Amz-Content-Sha256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 X-Amz-Date: 20240430T153839Z Authorization: AWS4-HMAC-SHA256 Credential=REDACTED/20240430/us-east-1/s3/aws4_request, SignedHeaders=host;x-amz-content-sha256;x-amz-date, Signature=REDACTED
HTTP/1.1 200 Accept-Ranges: bytes Content-Length: 6442450982 Content-Type: application/octet-stream ETag: "e3d297afb6078748ce597e88bcab5316" Last-Modified: Tue, 30 Apr 2024 15:27:46 GMT Server: MinIO Strict-Transport-Security: max-age=31536000; includeSubDomains Vary: Origin Vary: Accept-Encoding X-Amz-Id-2: 6030d01c843caef4c6622bc183cd7868c4457173df9b0e60c3980e7c86b7b0b4 X-Amz-Request-Id: 17CB18F3FFD63302 X-Content-Type-Options: nosniff X-Xss-Protection: 1; mode=block Date: Tue, 30 Apr 2024 15:38:39 GMT
----------END-HTTP---------- ---------START-HTTP--------- HEAD /temp/test-upload3 HTTP/1.1 Host: xxxxxxx:9000 User-Agent: MinIO (Linux; x86_64) minio-py/7.2.5 X-Amz-Content-Sha256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 X-Amz-Date: 20240430T153839Z Authorization: AWS4-HMAC-SHA256 Credential=REDACTED/20240430/us-east-1/s3/aws4_request, SignedHeaders=host;x-amz-content-sha256;x-amz-date, Signature=REDACTED
HTTP/1.1 200 Accept-Ranges: bytes Content-Length: 6442450982 Content-Type: application/octet-stream ETag: "e3d297afb6078748ce597e88bcab5316" Last-Modified: Tue, 30 Apr 2024 15:27:46 GMT Server: MinIO Strict-Transport-Security: max-age=31536000; includeSubDomains Vary: Origin Vary: Accept-Encoding X-Amz-Id-2: 6030d01c843caef4c6622bc183cd7868c4457173df9b0e60c3980e7c86b7b0b4 X-Amz-Request-Id: 17CB18F4003531B0 X-Content-Type-Options: nosniff X-Xss-Protection: 1; mode=block Date: Tue, 30 Apr 2024 15:38:39 GMT
----------END-HTTP---------- ---------START-HTTP--------- POST /temp/test-upload4?uploads= HTTP/1.1 Content-Type: application/octet-stream Host: xxxxxxx:9000 User-Agent: MinIO (Linux; x86_64) minio-py/7.2.5 X-Amz-Content-Sha256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 X-Amz-Date: 20240430T153839Z Authorization: AWS4-HMAC-SHA256 Credential=REDACTED/20240430/us-east-1/s3/aws4_request, SignedHeaders=content-type;host;x-amz-content-sha256;x-amz-date, Signature=REDACTED
HTTP/1.1 200 Accept-Ranges: bytes Content-Length: 332 Content-Type: application/xml Server: MinIO Strict-Transport-Security: max-age=31536000; includeSubDomains Vary: Origin Vary: Accept-Encoding X-Amz-Id-2: 6030d01c843caef4c6622bc183cd7868c4457173df9b0e60c3980e7c86b7b0b4 X-Amz-Request-Id: 17CB18F400D1180F X-Content-Type-Options: nosniff X-Xss-Protection: 1; mode=block Date: Tue, 30 Apr 2024 15:38:39 GMT
<?xml version="1.0" encoding="UTF-8"?>
----------END-HTTP---------- ---------START-HTTP--------- PUT /temp/test-upload4?partNumber=1&uploadId=MjViYjlkZjktMDFjMi00MjBhLWIxZDUtOTdkZDFlZTNmOWExLmVlZmY4ZmQ2LThiZWMtNGI1ZS05MjVlLTk2NWE0Yzk5ZmEzMg HTTP/1.1 X-Amz-Copy-Source: /temp/test-upload3 X-Amz-Copy-Source-If-Match: e3d297afb6078748ce597e88bcab5316 X-Amz-Copy-Source-Range: bytes=0-5368709120 Host: xxxxxxx:9000 User-Agent: MinIO (Linux; x86_64) minio-py/7.2.5 X-Amz-Content-Sha256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 X-Amz-Date: 20240430T153839Z Authorization: AWS4-HMAC-SHA256 Credential=REDACTED/20240430/us-east-1/s3/aws4_request, SignedHeaders=host;x-amz-content-sha256;x-amz-copy-source;x-amz-copy-source-if-match;x-amz-copy-source-range;x-amz-date, Signature=REDACTED
HTTP/1.1 200 Accept-Ranges: bytes Content-Length: 228 Content-Type: application/xml Server: MinIO Strict-Transport-Security: max-age=31536000; includeSubDomains Vary: Origin Vary: Accept-Encoding X-Amz-Id-2: 6030d01c843caef4c6622bc183cd7868c4457173df9b0e60c3980e7c86b7b0b4 X-Amz-Request-Id: 17CB18F401288A60 X-Content-Type-Options: nosniff X-Xss-Protection: 1; mode=block Date: Tue, 30 Apr 2024 15:38:59 GMT
<?xml version="1.0" encoding="UTF-8"?>
----------END-HTTP---------- ---------START-HTTP--------- PUT /temp/test-upload4?partNumber=2&uploadId=MjViYjlkZjktMDFjMi00MjBhLWIxZDUtOTdkZDFlZTNmOWExLmVlZmY4ZmQ2LThiZWMtNGI1ZS05MjVlLTk2NWE0Yzk5ZmEzMg HTTP/1.1 X-Amz-Copy-Source: /temp/test-upload3 X-Amz-Copy-Source-If-Match: e3d297afb6078748ce597e88bcab5316 X-Amz-Copy-Source-Range: bytes=0-1073741862 Host: xxxxxxx:9000 User-Agent: MinIO (Linux; x86_64) minio-py/7.2.5 X-Amz-Content-Sha256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 X-Amz-Date: 20240430T153859Z Authorization: AWS4-HMAC-SHA256 Credential=REDACTED/20240430/us-east-1/s3/aws4_request, SignedHeaders=host;x-amz-content-sha256;x-amz-copy-source;x-amz-copy-source-if-match;x-amz-copy-source-range;x-amz-date, Signature=REDACTED
HTTP/1.1 200 Accept-Ranges: bytes Content-Length: 228 Content-Type: application/xml Server: MinIO Strict-Transport-Security: max-age=31536000; includeSubDomains Vary: Origin Vary: Accept-Encoding X-Amz-Id-2: 6030d01c843caef4c6622bc183cd7868c4457173df9b0e60c3980e7c86b7b0b4 X-Amz-Request-Id: 17CB18F89F391B01 X-Content-Type-Options: nosniff X-Xss-Protection: 1; mode=block Date: Tue, 30 Apr 2024 15:39:03 GMT
<?xml version="1.0" encoding="UTF-8"?>
----------END-HTTP---------- ---------START-HTTP--------- POST /temp/test-upload4?uploadId=MjViYjlkZjktMDFjMi00MjBhLWIxZDUtOTdkZDFlZTNmOWExLmVlZmY4ZmQ2LThiZWMtNGI1ZS05MjVlLTk2NWE0Yzk5ZmEzMg HTTP/1.1 Content-Type: application/xml Content-Md5: 88HRuTmwrl0BTBvrjeUSag== Host: xxxxxxx:9000 User-Agent: MinIO (Linux; x86_64) minio-py/7.2.5 Content-Length: 271 X-Amz-Content-Sha256: ba0b1583e910db231e8a87b1d2e658900ccce1ecca1faeccfbabaf83e82d9667 X-Amz-Date: 20240430T153903Z Authorization: AWS4-HMAC-SHA256 Credential=REDACTED/20240430/us-east-1/s3/aws4_request, SignedHeaders=content-length;content-md5;content-type;host;x-amz-content-sha256;x-amz-date, Signature=REDACTED
HTTP/1.1 200 Accept-Ranges: bytes Content-Length: 351 Content-Type: application/xml ETag: "b70e57ea6e8dd07a73ebf717c8b09e3b-2" Server: MinIO Strict-Transport-Security: max-age=31536000; includeSubDomains Vary: Origin Vary: Accept-Encoding X-Amz-Id-2: 6030d01c843caef4c6622bc183cd7868c4457173df9b0e60c3980e7c86b7b0b4 X-Amz-Request-Id: 17CB18F9834B0910 X-Content-Type-Options: nosniff X-Xss-Protection: 1; mode=block Date: Tue, 30 Apr 2024 15:39:03 GMT
<?xml version="1.0" encoding="UTF-8"?>
----------END-HTTP----------
@mat-gas Please check whether PR https://github.com/minio/minio-py/pull/1416 fixes the issue.
@balamurugana Confirmed, the fix solves the issue.
could you release a new version of the package on pypi with this fix please? we've had serious data corruption due to this and would need the fix to continue working with our python scripts
anyway, thanks for the quick fix!
v7.2.7 is released.
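For scripts that cannot be upgraded right away, a size comparison after each copy would at least catch this particular corruption. A minimal sketch, relying only on `stat_object(...).size` as used elsewhere in this thread; `assert_same_size` is a hypothetical helper, and a matching size does not prove the content is intact.

```python
def assert_same_size(client, src_bucket, src_name, dst_bucket, dst_name):
    """Raise ValueError if the copied object's size differs from the source."""
    src_size = client.stat_object(src_bucket, src_name).size
    dst_size = client.stat_object(dst_bucket, dst_name).size
    if src_size != dst_size:
        raise ValueError(
            f"size mismatch after copy: {src_bucket}/{src_name} is {src_size} "
            f"bytes, {dst_bucket}/{dst_name} is {dst_size} bytes")
    return src_size
```

It would be called right after `client.copy_object(...)`, with `client` being the same `Minio` instance.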
We have ISO files stored on our minio server that were uploaded years ago
Those are heavily chunked (8 MiB parts)
When copying one of those ISOs to another bucket, the new file is corrupted (the size differs, and the 2nd chunk is a copy of the first one, except for its size)
example with
Win10_22H2_English_x64.iso (size 6115186688 , md5 68c70d7ade5e9ab8510876c1f4bee58a)
copy with python minio API
download again, file is corrupted:
original file at offset 0x8000, we can see
xCD001
...and in the corrupted file, in the second chunk (which starts at offset 0x140000001), we see the same data again 0x8000 bytes after the start of that chunk
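This observation can be turned into a quick test on a downloaded copy. The sketch below is a scaled-down illustration on a synthetic file; `same_bytes_at` is a hypothetical helper, and the offsets for a real ISO are assumptions (a first part of 5368709121 bytes, as in the xl.meta of the reproducer).

```python
import os
import tempfile


def same_bytes_at(path, first_offset, second_offset, length=6):
    """Return True if the same `length` bytes appear at both offsets."""
    with open(path, "rb") as handle:
        handle.seek(first_offset)
        first = handle.read(length)
        handle.seek(second_offset)
        second = handle.read(length)
    return len(first) == length and first == second


# Synthetic demo: the "second part" wrongly restarts from source offset 0.
source = bytes(16) + b"\x01CD001" + bytes(10)   # signature at offset 16
part1_len = len(source)
corrupted = source + source[:24]                # part 2 repeats the start

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(corrupted)

try:
    repeated = same_bytes_at(tmp.name, 16, part1_len + 16)
finally:
    os.remove(tmp.name)

if __name__ == "__main__":
    print("signature repeated in second part:", repeated)
```

For the ISO in this report, the equivalent call would be `same_bytes_at(path, 0x8000, 5368709121 + 0x8000)`, under the assumption stated above.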
When using `mc cp`, the problem does not appear (file is still OK):
mc cp s3/vms/iso/Win10_22H2_English_x64.iso s3/temp/copy-win10-2
When looking at the xl.meta file, we can see that it's chunked differently (lots of 500 MiB chunks instead of 2 chunks (1 very big and 1 small))
xl.meta of original file (Win10_22H2_English_x64.iso)
xl.meta of copy file through python package
xl.meta from "mc cp" (size is good, file is OK)
Your Environment
minio RELEASE.2024-03-30T09-41-56Z
python minio package 7.2.5