kpfaulkner / azurecopy

copy blobs between azure, s3 and local storage
Apache License 2.0

Folders not copied from AWS to Azure #26

Open falabs opened 7 years ago

falabs commented 7 years ago

Awesome tool, I must say; it saved a lot of work for me and my team. After a couple of tweaks we were able to use it. One observation: while copying from AWS to Azure, the contents of the folders in AWS are all copied into a single container. That is, all the objects from an S3 bucket were dumped into one container without being grouped into folders. I tried copying locally and experienced the same issue. Any suggestions on how to go about this?

kpfaulkner commented 7 years ago

The way azurecopy has been designed, it will reuse the "fake" directories that people often put in their blob names. For example, if in S3 you have the blobs called:

vdir1/vdir2/file1
vdir1/vdir3/file2

Then although it looks like you have directories (and often UI tools will show them as directories) you don't really. They're just long blob names. (sorry if you already know this, but I get this asked a bit).

What Azurecopy does is copy from an S3 bucket to an Azure Container, keeping the fake directory look for the blobs. So yes, all blobs will ALWAYS go into a single collection, but that is completely by design. Again, I'm treating buckets and containers as equivalents here.
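To make the "fake" directory idea concrete, here is a minimal Python sketch. It is not azurecopy code, and the helper name `list_virtual_dir` is hypothetical. It only shows how a tool could present a flat set of blob names as folders by splitting on the "/" delimiter:

```python
def list_virtual_dir(blob_names, prefix="", delimiter="/"):
    """Return (subdirs, files) that a UI might show directly under `prefix`.

    Nothing here is a real directory: both lists are derived purely
    from the flat blob names.
    """
    subdirs, files = set(), []
    for name in blob_names:
        if not name.startswith(prefix):
            continue
        # Look only at the part of the name after the prefix.
        rest = name[len(prefix):]
        head, sep, _ = rest.partition(delimiter)
        if sep:
            # A delimiter remains, so this blob sits in a "subdirectory".
            subdirs.add(prefix + head + delimiter)
        else:
            files.append(name)
    return sorted(subdirs), sorted(files)


blobs = ["vdir1/vdir2/file1", "vdir1/vdir3/file2"]
print(list_virtual_dir(blobs))                  # (['vdir1/'], [])
print(list_virtual_dir(blobs, "vdir1/"))        # (['vdir1/vdir2/', 'vdir1/vdir3/'], [])
print(list_virtual_dir(blobs, "vdir1/vdir2/"))  # ([], ['vdir1/vdir2/file1'])
```

As long as a copy keeps the full blob name, including the slashes, the same "folders" appear on the destination side even though everything lives in one container.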

Are you saying that you're NOT seeing the fake directories in the blob names after you copy? If so, definite bug there. Can you give me an example of your directory structure and the command you're using?

Thanks

Ken

falabs commented 7 years ago

It was my mistake; I was adding -blobcopy at the end of the command: azurecopy.exe -i aws s3url -o azure storage url -blobcopy.

Whenever I try using it without -blobcopy I get the error below. Please note that I am copying from S3 to Azure, and the bucket has a lot of folders and subfolders containing compressed .tar.gz files. Using -blobcopy works fine (it takes 18 hours to complete):

Unknown error generated. Please report to Github page https://github.com/kpfaulkner/azurecopy/issues . Can view underlying stacktrace by adding -db flag.
azurecopy.Exceptions.CloudWriteException: AzureHandler::WriteBlob cannot write ---> System.DivideByZeroException: Attempted to divide by zero.
   at azurecopy.AzureHandler.ParallelWriteBlockBlob(Stream stream, CloudBlockBlob blob, Int32 parallelFactor, Int32 chunkSizeInMB)
   at azurecopy.AzureHandler.WriteBlob(String containerName, String blobName, Blob blob, Int32 parallelUploadFactor, Int32 chunkSizeInMB)
   --- End of inner exception stack trace ---
   at azurecopy.AzureHandler.WriteBlob(String containerName, String blobName, Blob blob, Int32 parallelUploadFactor, Int32 chunkSizeInMB)
   at azurecopycommand.Program.DoNormalCopy(Boolean debugMode)
   at azurecopycommand.Program.Main(String[] args)

kpfaulkner commented 7 years ago

Interesting... have never seen that before. Have you set the ChunkSizeInMB setting in the config file? I can't quite tell but it looks like that is being set to 0 somehow.
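As a purely illustrative sketch of how a chunk size of 0 could produce this kind of error (this is not the actual azurecopy code; the function names and the arithmetic are assumptions), consider a chunk-count calculation with and without a guard:

```python
MB = 1024 * 1024


def unguarded_chunk_count(total_bytes, chunk_size_mb):
    """Hypothetical calculation that divides by the chunk size unchecked."""
    return total_bytes // (chunk_size_mb * MB)


def chunk_count(total_bytes, chunk_size_mb):
    """Same idea, but rejects a non-positive chunk size up front."""
    chunk_bytes = chunk_size_mb * MB
    if chunk_bytes <= 0:
        raise ValueError("chunk size must be a positive number of MB")
    # Ceiling division so a final partial chunk is counted too.
    return (total_bytes + chunk_bytes - 1) // chunk_bytes


print(chunk_count(10 * MB, 4))  # 3

try:
    unguarded_chunk_count(10 * MB, 0)
except ZeroDivisionError as exc:
    print("unguarded:", exc)

try:
    chunk_count(10 * MB, 0)
except ValueError as exc:
    print("guarded:", exc)
```

If something like this is happening, explicitly setting ChunkSizeInMB to a positive value in the config file would be a reasonable thing to try.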

Also, -blobcopy should be the much faster option (at least, I've never heard of it taking longer). If -blobcopy worked, I'd stick with that. Out of interest, how large was this copy? 18 hours is the longest I've seen; usually it takes minutes.

falabs commented 7 years ago

ChunkSizeInMB was not set in the config file; we used a Windows VM in Azure to run the script. Using -blobcopy does not place the files within their folders; all objects end up directly in the container (we have over 50 folders).

This copy is more than 5 GB (but less than 30 GB). We recently had to move all our resources and our entire application from AWS to Azure. We may share our experience soon on migrating a SaaS from AWS to Azure with minimal downtime.

kpfaulkner commented 7 years ago

Hi

By default it will recreate the structure. Are you not seeing that?

Thanks

Ken

On Fri, Jul 7, 2017 at 2:27 AM, kleon-hhog notifications@github.com wrote:

Is it possible to have the transfer create the "fake" directories and thus preserve the file structure?

kleon-hhog commented 7 years ago

Hello!

I was using the -blobcopy option, which was simply putting all of the files into the root of the container. From what you posted, was that as intended? Without the -blobcopy option it recreates the folders perfectly.

Thanks!

Keith Leon

kpfaulkner commented 7 years ago

Hi

Ahh ok, that clears it up. Let me check the -blobcopy scenario and the directory structure. The plan is that it should keep the structure, but let me investigate.

Thanks

Ken

kpfaulkner commented 7 years ago

Please try version 1.5.0 and let me know how it goes.

https://github.com/kpfaulkner/azurecopy/releases/tag/1.5.0

Thanks

kleon-hhog commented 7 years ago

Thanks!
