skyplane-project / skyplane

🔥 Blazing fast bulk data transfers between any cloud 🔥
https://skyplane.org
Apache License 2.0
1.09k stars 62 forks

[bug] content-type is not kept on skyplane sync #915

Closed: jalamprea closed this issue 1 year ago

jalamprea commented 1 year ago

Describe the bug: After migrating data from GCP to AWS using skyplane sync, all data was transferred, but the content-type of every file is now application/octet-stream.

To Reproduce: Just run the sync command:

  1. skyplane sync -f gs://my-gcp-bucket s3://my-aws-bucket

Expected behavior: Keep each file's original content-type instead of replacing it.
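One way to check whether the stored metadata matches expectations is to compare the Content-Type actually stored on each object against what its filename implies. A minimal sketch using only Python's stdlib `mimetypes` (the `gsutil`/`aws` commands in the comments are the standard CLI tools for inspecting the real stored metadata; bucket names are from the reproduce command above):

```python
import mimetypes

def expected_content_type(key: str) -> str:
    """Guess the Content-Type an object should carry from its key/filename."""
    guessed, _ = mimetypes.guess_type(key)
    # application/octet-stream is the generic fallback S3 applies when
    # no Content-Type is supplied at upload time.
    return guessed or "application/octet-stream"

# Compare against the actual stored metadata, e.g.:
#   gsutil stat gs://my-gcp-bucket/index.html
#   aws s3api head-object --bucket my-aws-bucket --key index.html
print(expected_content_type("index.html"))   # text/html
print(expected_content_type("photo.jpg"))    # image/jpeg
```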

Transfer client log (debug log from the CLI):

11:21:06 [DEBUG] Using pipeline: <skyplane.api.pipeline.Pipeline object at 0x167b68670>
11:21:08 [DEBUG] [SkyplaneClient] Queued sync job SyncJob()
11:21:10 [WARN]  Falling back to instance class `n2-standard-16` at gcp:us-central1-a due to cloud vCPU limit of 24. You can visit https://skyplane.org/en/latest/increase_vcpus.html to learn more about how to increase your cloud vCPU limits for any cloud provider.
11:21:10 [WARN]  Falling back to instance class `m5.xlarge` at aws:us-east-1 due to cloud vCPU limit of 5. You can visit https://skyplane.org/en/latest/increase_vcpus.html to learn more about how to increase your cloud vCPU limits for any cloud provider.
11:21:10 [DEBUG] Querying objects in vto.buckets.io
11:21:14 [DEBUG] Querying objects in gs://vto-webar


jalamprea commented 1 year ago

Extra info to replicate the error: the files were originally uploaded to GCP with gsutil cp -Z, i.e. gzip-compressed. It looks like those files are being transferred as opaque binaries, even though they are simple files like HTML, JSON, or images. A possible fix could be to detect and decompress the gzipped files in the source bucket, or to detect that a source file is compressed and add the header ContentEncoding: gzip to the destination file.
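The second workaround suggested above can be sketched without any cloud SDK: gzip streams always begin with the magic bytes 0x1f 0x8b, so a copier could peek at the first bytes of each source object and decide which headers the destination object should get. This is only an illustration of the detection logic, not Skyplane's actual code path:

```python
import gzip
import mimetypes

GZIP_MAGIC = b"\x1f\x8b"

def is_gzipped(first_bytes: bytes) -> bool:
    """gzip streams start with the two magic bytes 0x1f 0x8b."""
    return first_bytes[:2] == GZIP_MAGIC

def destination_headers(key: str, first_bytes: bytes) -> dict:
    """Headers a copier could set on the destination object (sketch).

    If the source object was stored with `gsutil cp -Z`, its bytes are
    gzip-compressed, so the destination should carry
    Content-Encoding: gzip alongside the real Content-Type.
    """
    content_type, _ = mimetypes.guess_type(key)
    headers = {"Content-Type": content_type or "application/octet-stream"}
    if is_gzipped(first_bytes):
        headers["Content-Encoding"] = "gzip"
    return headers

# Example: a gzip-compressed HTML file keeps text/html plus the encoding hint.
compressed = gzip.compress(b"<html></html>")
print(destination_headers("index.html", compressed[:2]))
# {'Content-Type': 'text/html', 'Content-Encoding': 'gzip'}
```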

sarahwooders commented 1 year ago

Hi @jalamprea, thanks for reporting this. So is the content type in GCP also application/octet-stream? Also, is your goal just to transfer data from a VM to S3?

jalamprea commented 1 year ago

The content-type in GCP is the correct type for each file (text/html, application/javascript, image/jpg, etc.). As for my goal: yes, I just need to migrate data from my buckets in GCP to new buckets in AWS S3.
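Since the data has already landed in S3, one pragmatic repair is to re-set the content types after the fact rather than re-run the transfer. A hedged sketch: the function below only builds a key-to-content-type plan from filenames (keys whose type cannot be guessed are skipped rather than stamped with application/octet-stream again); applying it would use an in-place S3 copy with replaced metadata, shown in the comment with a hypothetical bucket name:

```python
import mimetypes

def repair_plan(keys):
    """Map each S3 key to the Content-Type it should be given (sketch)."""
    plan = {}
    for key in keys:
        content_type, _ = mimetypes.guess_type(key)
        if content_type:
            plan[key] = content_type
    return plan

# Applying the plan would use an in-place copy with replaced metadata,
# e.g. with boto3 (bucket name is a placeholder):
#   s3.copy_object(
#       Bucket="my-aws-bucket",
#       Key=key,
#       CopySource={"Bucket": "my-aws-bucket", "Key": key},
#       ContentType=content_type,
#       MetadataDirective="REPLACE",
#   )
print(repair_plan(["index.html", "photo.jpg", "blob"]))
# {'index.html': 'text/html', 'photo.jpg': 'image/jpeg'}
```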