treeverse / lakeFS

lakeFS - Data version control for your data lake | Git for data
https://docs.lakefs.io
Apache License 2.0

Failure to Complete Multipart Upload for Large Files via S3 GW with GCS #7822

Open guy-har opened 1 month ago

guy-har commented 1 month ago

When using lakeFS with GCS as the block adapter, uploading ~8GB files via the AWS CLI through the S3 GW (with the default multipart upload chunk size of 8MB) fails on CompleteMultipartUpload due to an unexpected number of parts.

Addition by Nadav: OK, I believe that I reproduced the problem (with Elad's help).

Command line:
nadavsteindler@Nadavs-MacBook-Pro Downloads % aws --endpoint-url http://localhost:8000 s3 cp ~/Desktop/bigfile.txt s3://gcptest/main/
upload failed: ../Desktop/bigfile.txt to s3://gcptest/main/bigfile.txt
An error occurred (InternalError) when calling the CompleteMultipartUpload operation (reached max retries: 2): We encountered an internal error, please try again.

lakeFS log:
ERROR [2024-06-30T19:01:47+03:00]pkg/gateway/operations/postobject.go:122 pkg/gateway/operations.(*PostObject).HandleCompleteMultipartUpload could not complete multipart upload error="part list mismatch - expected 76 parts, got 954: multipart part list mismatch" host="localhost:8000" matched_host=false method=POST operation_id=post_object path=bigfile.txt physical_address=data/gf6or3eiuvokunnmsks0/cq0nt2miuvokunnmsli0 ref=main repository=gcptest request_id=737df8cf-2535-4a4b-8e8a-5a784ce9433a service_name=s3_gateway upload_id=data/gf6or3eiuvokunnmsks0/cq0nt2miuvokunnmsli0 user=admin

Ah, but this is after the CLI does a retry. The error from the first try is:
ERROR [2024-07-03T16:13:07+03:00]pkg/block/gs/adapter.go:525 pkg/block/gs.(*Adapter).CompleteMultiPartUpload CompleteMultipartUpload failed error="context canceled" host="localhost:8000" key=data/gf4rde6iuvolaihdbt40/cq2kot6iuvolaihdbt4g matched_host=false method=POST operation_id=post_object path=bigfile.txt physical_address=data/gf4rde6iuvolaihdbt40/cq2kot6iuvolaihdbt4g qualified_key=data/gf4rde6iuvolaihdbt40/cq2kot6iuvolaihdbt4g qualified_ns=nadav_bucket_7822 ref=main repository=gcptest request_id=486dfb58-1cec-46e1-aca6-0e534bef4bfa service_name=s3_gateway upload_id=data/gf4rde6iuvolaihdbt40/cq2kot6iuvolaihdbt4g user=admin
ERROR [2024-07-03T16:13:07+03:00]pkg/gateway/operations/postobject.go:122 pkg/gateway/operations.(*PostObject).HandleCompleteMultipartUpload could not complete multipart upload error="context canceled" host="localhost:8000" matched_host=false method=POST operation_id=post_object path=bigfile.txt physical_address=data/gf4rde6iuvolaihdbt40/cq2kot6iuvolaihdbt4g ref=main repository=gcptest request_id=486dfb58-1cec-46e1-aca6-0e534bef4bfa service_name=s3_gateway upload_id=data/gf4rde6iuvolaihdbt40/cq2kot6iuvolaihdbt4g user=admin

This seems to indicate a timeout from the AWS CLI.

| Filesize | Partsize | Result |
| --- | --- | --- |
| 1GB | 1MB | SUCCEED |
| 4GB | 1MB | SUCCEED |
| 6GB | 8MB | SUCCEED |
| 8GB | 5MB | part list mismatch - expected 627 parts, got 1526 |
| 8GB | 8MB | part list mismatch - expected 76 parts, got 954; expected 30 parts, got 954; expected 117 parts, got 954; etc. |
| 8GB | 16MB | SUCCEED |
| 10GB | 8MB | part list mismatch - expected 46 parts, got 1193 |

BUT if I increase the AWS CLI read timeout, it succeeds:
aws --cli-read-timeout 300 --endpoint-url http://localhost:8000 s3 cp ~/Desktop/bigfile.txt s3://gcptest/main/

Root Cause: When we upload large files with many parts to merge, the AWS CLI hits its read timeout on the CompleteMultipartUpload call. It then retries the call, and the retry fails because the number of parts no longer matches: we have already concatenated some of the parts.

Note on gs.adapter.go:composeMultipartUploadParts

  1. On the one hand, we seem to want to support retrying after a failure mid-merge of the parts, i.e. we list the current part files and merge those specific files (so that if we already merged some, the retry continues where we left off).
  2. On the other hand, the validation checks that the file names returned by the list operation match the uploaded part list from the S3 adapter call exactly, so this validation will fail on retry, since we have already concatenated some of the part objects.

Proposed fix: We can support retry after a timeout by loosening the validation criteria. Instead of insisting on an exact match, we can check (a sketch of the loosened check follows the list):

  1. the number of listed parts is <= the number of expected parts
  2. every listed part name and ETag appears in the expected part list
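
Roughly, the loosened check could look like this minimal sketch; the map of expected parts, the `listedPart` type, and the function name are illustrative stand-ins, not the actual adapter code:

```go
package gs

import "fmt"

// listedPart is a stand-in for the (name, ETag) pairs returned by the bucket
// listing; the real adapter works with *storage.ObjectAttrs.
type listedPart struct {
	Name string
	ETag string
}

// validateListedParts accepts any listing that is a subset of the expected
// parts instead of requiring an exact match, so a retried
// CompleteMultipartUpload does not fail just because some parts were
// already composed away.
func validateListedParts(expected map[string]string, listed []listedPart) error {
	if len(listed) > len(expected) {
		return fmt.Errorf("part list mismatch: listed %d parts, expected at most %d", len(listed), len(expected))
	}
	for _, p := range listed {
		etag, ok := expected[p.Name]
		if !ok {
			return fmt.Errorf("unexpected part %q in listing", p.Name)
		}
		if etag != p.ETag {
			return fmt.Errorf("etag mismatch for part %q: got %q, expected %q", p.Name, p.ETag, etag)
		}
	}
	return nil
}
```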

Fix option 2: Don't support retry at all. On failure, clear all the uploaded part files and return the timeout error; the client has to start over. If the client retries anyway, it should immediately get a file-not-found error. This simplifies the code: there is no need to list objects, we just use the original filenames, and if one is not found because we already composed it with others, we return an error. A rough sketch of the cleanup step follows.
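
For illustration, a rough sketch of option 2's cleanup step, assuming we still have the original part names; the helper name and signature are hypothetical:

```go
package gs

import (
	"context"
	"errors"
	"log"

	"cloud.google.com/go/storage"
)

// abortPartsOnFailure deletes whatever part objects still exist after a failed
// compose, so a client retry fails fast with "not found" rather than a
// part-count mismatch. Parts that were already folded into a composed object
// are expected to be gone, so their not-found errors are ignored.
func abortPartsOnFailure(ctx context.Context, bucket *storage.BucketHandle, partNames []string) {
	for _, name := range partNames {
		err := bucket.Object(name).Delete(ctx)
		if err != nil && !errors.Is(err, storage.ErrObjectNotExist) {
			// We are already on a failure path, so just log and keep going.
			log.Printf("delete part %s: %v", name, err)
		}
	}
}
```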

nadavsteindler commented 3 days ago

Root Cause Analysis

The number of parts returned by the list operation is clearly far lower than the number copied, and we can see hundreds of copied parts in the GCS console (Screenshot 2024-07-01 at 13 42 03).

Possible Root Causes:

  1. Eventual consistency with list. But the number of returned parts seems far too small for that; we didn't just miss the last few.
  2. The list API says it may return truncated results (https://cloud.google.com/storage/docs/xml-api/get-object-multipart), and I don't see that we account for this in the code.

Possible Solution: Comparing with the S3 adapter, there we don't build the list of parts via a list operation; instead we use the list of parts we have been building up with each upload. Maybe that is the way to go here.

nadavsteindler commented 2 days ago

I suspect that using the StartOffset query field will help me page the list operation (a rough sketch follows): https://pkg.go.dev/cloud.google.com/go/storage#section-readme
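
For reference, a sketch of listing with the Go client's Query.StartOffset field; the helper name, prefix handling, and return type are assumptions for illustration (and, as the update further down notes, this did not actually fix the missing entries):

```go
package gs

import (
	"context"

	"cloud.google.com/go/storage"
	"google.golang.org/api/iterator"
)

// listPartsFrom lists the part objects under prefix whose names are
// lexicographically >= startOffset. A caller that suspects the listing was
// cut short could call it again, passing the last returned name.
func listPartsFrom(ctx context.Context, bucket *storage.BucketHandle, prefix, startOffset string) ([]string, error) {
	it := bucket.Objects(ctx, &storage.Query{Prefix: prefix, StartOffset: startOffset})
	var names []string
	for {
		attrs, err := it.Next()
		if err == iterator.Done {
			return names, nil
		}
		if err != nil {
			return nil, err
		}
		names = append(names, attrs.Name)
	}
}
```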

nadavsteindler commented 2 days ago

listRes.txt — this is an example where it said "part list mismatch - expected 87 parts, got 954".

Restricting the query's attribute selection seems to help: err := query.SetAttrSelection([]string{"Name", "Etag"}). Sometimes it passes; sometimes the list gets most but not all of the objects (usually the later ones).

Maybe the way to go is to get rid of the list operation; we know the names anyway...

nadavsteindler commented 1 day ago

Update:

  1. Paginating with StartOffset doesn't solve the problem at all. See the example of files returned by list: the earlier files are missing while the later ones are present, so it is not simply returning the first 100 or the last 100.
  2. Telling the API exactly which attributes to return helps somewhat: it succeeds some of the time, and when it fails more files are returned (hundreds instead of tens), but it is still not a reliable solution.
  3. What does work reliably is to forgo the list operation and just generate the list of part names based on the naming convention; this might be the optimal solution. Testing more... Note that if we try to compose a nonexistent part file, GCS catches it: googleapi: Error 500: Error running compose. (A sketch of this approach follows the list.)
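
A rough sketch of item 3, assuming a part naming convention like `<key>.part_NNNNN` (matching the names seen in the logs below) and folding parts into the target in batches, because a single GCS compose call accepts at most 32 source objects; the helper and its signature are illustrative, not the actual adapter code:

```go
package gs

import (
	"context"
	"fmt"

	"cloud.google.com/go/storage"
)

// maxComposeSources is the GCS limit on source objects per compose call.
const maxComposeSources = 32

// composeFromNamedParts builds the final object from part names generated by
// the (assumed) "<key>.part_NNNNN" convention, without listing the bucket at
// all. A missing part surfaces as an error from the compose call itself.
func composeFromNamedParts(ctx context.Context, bucket *storage.BucketHandle, key string, numParts int) error {
	names := make([]string, numParts)
	for i := range names {
		names[i] = fmt.Sprintf("%s.part_%05d", key, i+1)
	}

	target := bucket.Object(key)
	for start := 0; start < len(names); {
		var sources []*storage.ObjectHandle
		if start > 0 {
			// After the first batch, the partially composed target is itself
			// one of the sources for the next compose call.
			sources = append(sources, target)
		}
		for ; start < len(names) && len(sources) < maxComposeSources; start++ {
			sources = append(sources, bucket.Object(names[start]))
		}
		if _, err := target.ComposerFrom(sources...).Run(ctx); err != nil {
			return fmt.Errorf("compose %s: %w", key, err)
		}
	}
	return nil
}
```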
nadavsteindler commented 1 day ago

Test results:

8GB, 5MB:
ERROR [2024-07-03T12:20:48+03:00]pkg/block/gs/adapter.go:514 pkg/block/gs.(*Adapter).CompleteMultiPartUpload CompleteMultipartUpload failed error="googleapi: Error 404: No such object: nadav_bucket_7822/data/gf4v5neiuvoinvg6qugg/cq2hbreiuvoinvg6qui0.part_00001, notFound" host="localhost:8000" key=data/gf4v5neiuvoinvg6qugg/cq2hbreiuvoinvg6qui0 matched_host=false method=POST operation_id=post_object path=bigfile.txt physical_address=data/gf4v5neiuvoinvg6qugg/cq2hbreiuvoinvg6qui0 qualified_key=data/gf4v5neiuvoinvg6qugg/cq2hbreiuvoinvg6qui0 qualified_ns=nadav_bucket_7822 ref=main repository=gcptest request_id=c9042832-c456-48f4-94fe-e006f744d5f2 service_name=s3_gateway upload_id=data/gf4v5neiuvoinvg6qugg/cq2hbreiuvoinvg6qui0 user=admin
The failure is very strange, since the file is clearly visible in the GCS console (Screenshot 2024-07-03 at 12 19 55).

8GB, 6MB: FAIL - same error
8GB, 7MB: SUCCEED
8GB, 8MB: SUCCEED
8GB, 16MB: SUCCEED

10GB, 8MB:
ERROR [2024-07-03T14:54:36+03:00]pkg/block/gs/adapter.go:514 pkg/block/gs.(*Adapter).CompleteMultiPartUpload CompleteMultipartUpload failed error="context canceled" host="localhost:8000" key=data/gf4siomiuvoknlkh9bcg/cq2jjg6iuvoknlkh9bd0 matched_host=false method=POST operation_id=post_object path=bigfile10GB.txt physical_address=data/gf4siomiuvoknlkh9bcg/cq2jjg6iuvoknlkh9bd0 qualified_key=data/gf4siomiuvoknlkh9bcg/cq2jjg6iuvoknlkh9bd0 qualified_ns=nadav_bucket_7822 ref=main repository=gcptest request_id=44b14c25-0336-44dd-8432-4071438df456 service_name=s3_gateway upload_id=data/gf4siomiuvoknlkh9bcg/cq2jjg6iuvoknlkh9bd0 user=admin
ERROR [2024-07-03T14:54:36+03:00]pkg/gateway/operations/postobject.go:122 pkg/gateway/operations.(*PostObject).HandleCompleteMultipartUpload could not complete multipart upload error="context canceled" host="localhost:8000" matched_host=false method=POST operation_id=post_object path=bigfile10GB.txt physical_address=data/gf4siomiuvoknlkh9bcg/cq2jjg6iuvoknlkh9bd0 ref=main repository=gcptest request_id=44b14c25-0336-44dd-8432-4071438df456 service_name=s3_gateway upload_id=data/gf4siomiuvoknlkh9bcg/cq2jjg6iuvoknlkh9bd0 user=admin

nadavsteindler commented 1 day ago

With the fix to the validation code, both the 8GB and 10GB file cases now succeed, without touching the client timeout. The AWS CLI times out on the last call after 60 seconds, retries, and completes the upload! (Screenshot 2024-07-03 at 21 56 39)

I wonder if ComposeAll can be written concurrently so that it doesn't time out...
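
One way such concurrent composition could be sketched, using errgroup: compose disjoint batches of up to 32 parts into intermediate objects in parallel, then compose the intermediates into the final object. Everything here (names, the two-level structure, intermediate cleanup being omitted) is an assumption for illustration, not existing lakeFS code:

```go
package gs

import (
	"context"
	"fmt"

	"cloud.google.com/go/storage"
	"golang.org/x/sync/errgroup"
)

// composeConcurrently composes disjoint batches of up to 32 parts into
// intermediate objects in parallel, then composes the intermediates into the
// final object. It assumes len(parts) <= 32*32 so one extra level is enough;
// deleting the intermediate objects afterwards is omitted for brevity.
func composeConcurrently(ctx context.Context, bucket *storage.BucketHandle, key string, parts []*storage.ObjectHandle) error {
	const batch = 32
	g, ctx := errgroup.WithContext(ctx)

	var intermediates []*storage.ObjectHandle
	for i := 0; i*batch < len(parts); i++ {
		lo := i * batch
		hi := lo + batch
		if hi > len(parts) {
			hi = len(parts)
		}
		inter := bucket.Object(fmt.Sprintf("%s.compose_%05d", key, i))
		intermediates = append(intermediates, inter)
		g.Go(func() error {
			// Each batch composes independently, so these calls can overlap.
			_, err := inter.ComposerFrom(parts[lo:hi]...).Run(ctx)
			return err
		})
	}
	if err := g.Wait(); err != nil {
		return err
	}
	// Final compose of the intermediate objects into the target object.
	_, err := bucket.Object(key).ComposerFrom(intermediates...).Run(ctx)
	return err
}
```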