kpfaulkner / azurecopy

copy blobs between azure, s3 and local storage
Apache License 2.0

Any way to speed up the copy process from AWS to Azure? #35

Open rfreas opened 6 years ago

rfreas commented 6 years ago

I am copying all files from an AWS S3 bucket to an Azure storage container. All files are blobs and there are no directories. I am using the binaries hosted here (azurecopy-1.5.1.7z) and running a command similar to the following:

azurecopy -i https://mybucket.s3-us-east-2.amazonaws.com -o https://myaccount.blob.core.windows.net/mycontainer -blobcopy -destblobtype block

I have set AzureAccountKey, AWSAccessKeyID, AWSSecretAccessKeyID, and AWSRegion values as appropriate in the app.config with everything else left blank.
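For reference, the relevant part of my app.config looks roughly like this (standard appSettings key/value entries; actual values redacted):

<appSettings>
  <add key="AzureAccountKey" value="..." />
  <add key="AWSAccessKeyID" value="..." />
  <add key="AWSSecretAccessKeyID" value="..." />
  <add key="AWSRegion" value="us-east-2" />
  <!-- everything else left blank -->
</appSettings>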

On the DEV s3 bucket, I have over 60GB and 42K objects to copy. In PROD s3, the numbers are higher -- something like 150GB and nearly 100K objects.

When running this command line from my local machine, it seems like it will take more than 15 hours to complete, but that's just a guess. I am going to let it run all night and see.

Any thoughts on making this run faster?

Thanks!

kpfaulkner commented 6 years ago

Hi

One thing you could try is to play around with the BlobCopyBatchSize setting. Currently this needs to be set in the config file (as opposed to a CLI argument). The default setting is 100, which means Azure will attempt to copy 100 blobs at a time. I'm not sure what the maximum is before Azure complains, but it's the only thing I can suggest right now.

So the line in the config file should be something like the snippet below (it uses the same appSettings key/value format as the other settings; pick whatever batch size you want to try):
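<add key="BlobCopyBatchSize" value="500" /> <!-- 500 is only an example; the default is 100 -->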

Please let me know how you go with that.

Thanks

Ken


rfreas commented 6 years ago

Thanks for the reply @kpfaulkner -- ahh, I didn't see that option in the command line help when you run azurecopy.exe without any parameters -- good to know, thanks.

So I gave it a quick go just now to see if I could tell any difference -- and it still seems slow, even though I have the -skip parameter set since I don't want to copy anything previously copied. I say it still seems slow because it prints out each URL as it goes and it takes nearly a second per URL. If it's doing batches of 500-1000, I would expect the output to reflect that. Am I misunderstanding, or is it perhaps not using -blobcopy correctly and therefore not running the copy at the server?
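For reference, the command I'm running now is essentially the same as before, just with -skip added:

azurecopy -i https://mybucket.s3-us-east-2.amazonaws.com -o https://myaccount.blob.core.windows.net/mycontainer -blobcopy -destblobtype block -skip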

Thanks,

Rob

rfreas commented 6 years ago

I tried it out with a batch size setting of 1000 and it took about 7 hours to copy 65GB/42K objects from AWS to Azure. I'll try running it again with a batch size of 5000 and see if there's any change or if it's even tolerated.

Let me know if you (@kpfaulkner) have any additional thoughts, or what I should expect to see in the log output with various batch sizes, etc.

Thanks,

Rob

rfreas commented 6 years ago

As a follow-up, I tried batch sizes of 1000, 5000, and 5000 with -skip specified (I deleted one of the docs in the destination just to test skip), and they all performed about the same, with the -skip run taking the longest by about 40 minutes or so.

I am not sure why it takes as long as it does or if there's any way to speed this up, so if anyone has thoughts, let me know. Thanks again all.

Rob

kpfaulkner commented 6 years ago

Hi

I'll be looking at it over the weekend, but chances are it might just be a limitation on how many blobs can be written to a single storage account at once. But I'll investigate and confirm.

Cheers

Ken
