kpfaulkner / azurecopy

copy blobs between azure, s3 and local storage
Apache License 2.0

Give option to skip existing blobs #22

Open kpfaulkner opened 7 years ago

kpfaulkner commented 7 years ago

If a blob exists in the destination, allow an option to skip it.

kpfaulkner commented 7 years ago

Fixed for Azure destinations in release 1.4.0: https://github.com/kpfaulkner/azurecopy/releases/tag/1.4.0. Will expand to other cloud destinations soon.
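Conceptually the skip is just an exists-check against the destination before copying. A minimal sketch using the classic WindowsAzure.Storage SDK (illustrative only, not the exact azurecopy code):

```csharp
// Illustrative sketch: skip the copy when the destination blob already exists.
// Not the actual azurecopy implementation.
using Microsoft.WindowsAzure.Storage.Blob;

bool ShouldSkip(CloudBlobContainer destinationContainer, string blobName)
{
    // Exists() issues a lightweight HEAD request against the destination blob.
    CloudBlockBlob destinationBlob = destinationContainer.GetBlockBlobReference(blobName);
    return destinationBlob.Exists();
}
```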

kevinneumann commented 6 years ago

First off, thank you very much for this tool. Our previous solution (cherrysafe.com) is shutting down and we needed something to copy our Azure blobs to S3 to protect us from Azure security breaches or programming errors that might inadvertently delete blobs.

I'm pretty new to GitHub (at least from a contribution side of things). However, I made some changes to the S3Handler to allow skipping existing Amazon S3 objects. Based on my findings, there isn't a built-in "exists" check in the S3 SDK, so I worked around it by attempting to "get" the object and capturing the error if it doesn't exist. I also moved GenerateS3Client into a separate function so that I create the client once and reuse it, rather than creating a new client for each check. It seems to work for my case. (Code below.)

Also, I am implementing a change to pass in a -u parameter (updated since; e.g. -u -7 for blobs modified in the last 7 days). For my scenario of copying Azure blobs to S3, I can check the Azure blob properties for the LastModified date, which seems to speed up processing. Right now I only have this working for our Azure --> S3 scenario.

```csharp
static IAmazonS3 _client;

// Create the S3 client once and reuse it for every exists-check, rather than
// creating a new client per blob.
public IAmazonS3 Client(string targetAWSAccessKeyId, string targetAWSSecretAccessKeyId, string containerName)
{
    if (_client == null)
    {
        Console.WriteLine("Create client.");
        _client = S3Helper.GenerateS3Client(ConfigHelper.TargetAWSAccessKeyID, ConfigHelper.TargetAWSSecretAccessKeyID, containerName);
    }
    return _client;
}

/// <summary>
/// Does the blob exist in the container?
/// </summary>
/// <param name="containerName"></param>
/// <param name="blobName"></param>
/// <returns></returns>
public bool DoesBlobExists(string containerName, string blobName)
{
    var exists = false;

    try
    {
        // The SDK has no direct "exists" check here, so try to get the object
        // and treat any failure as "does not exist".
        Client(ConfigHelper.TargetAWSAccessKeyID, ConfigHelper.TargetAWSSecretAccessKeyID, containerName).GetObject(containerName, blobName);
        exists = true;
    }
    catch
    {
        // Object not found (or the request failed): leave exists as false.
    }

    Console.WriteLine("Blob Exists? " + exists.ToString());
    return exists;
}
```
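For the -u / modified-in-the-last-N-days idea above, the check itself is just a comparison against the Azure blob's LastModified property. A rough sketch (names are mine, not the real azurecopy code):

```csharp
// Rough sketch of a "-u <days>" filter: only copy blobs modified within the
// last N days, based on the Azure blob's LastModified property.
// Illustrative only; method and variable names are hypothetical.
using System;
using Microsoft.WindowsAzure.Storage.Blob;

bool ModifiedWithinDays(CloudBlockBlob sourceBlob, int days)
{
    // FetchAttributes populates Properties.LastModified without downloading the blob body.
    sourceBlob.FetchAttributes();
    DateTimeOffset? lastModified = sourceBlob.Properties.LastModified;
    if (!lastModified.HasValue)
    {
        return true; // no timestamp available: err on the side of copying
    }
    return lastModified.Value >= DateTimeOffset.UtcNow.AddDays(-Math.Abs(days));
}
```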

kpfaulkner commented 6 years ago

Hi

Thanks, I'll add the DoesBlobExist implementation into the next version. As for the static S3 client, I'd need to check what would happen for an S3 -> S3 copy. I know some people have used the tool for that scenario, and I want to make sure source and target can have different creds, but I'll look into it.
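Rough idea, untested: instead of a single static client, cache one client per credential set so source and target S3 accounts can differ. Something like (names are illustrative):

```csharp
// Untested sketch: cache one IAmazonS3 client per access key so an S3 -> S3 copy
// can use different credentials for source and target.
using System.Collections.Generic;
using Amazon.S3;

static readonly Dictionary<string, IAmazonS3> _clients = new Dictionary<string, IAmazonS3>();

static IAmazonS3 GetClient(string accessKeyId, string secretAccessKey, string containerName)
{
    IAmazonS3 client;
    if (!_clients.TryGetValue(accessKeyId, out client))
    {
        client = S3Helper.GenerateS3Client(accessKeyId, secretAccessKey, containerName);
        _clients[accessKeyId] = client;
    }
    return client;
}
```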

Glad the tool is useful!

Cheers

Ken

kpfaulkner commented 6 years ago

@kevinneumann Just one thought... would copying between one Azure region and another meet your requirements? Copying Azure to Azure is REALLY quick (in comparison) because it uses the blob-copy API, which means the data is copied from source to target without having to go through your local network.
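Under the hood that is the storage service's server-side copy-blob operation. A rough sketch with the classic WindowsAzure.Storage SDK (not azurecopy's actual code):

```csharp
// Rough sketch: server-side blob copy with the classic WindowsAzure.Storage SDK.
// The storage service pulls the data directly from the source account, so nothing
// transits the machine running the tool. Illustrative only.
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;

void ServerSideCopy(string sourceConnection, string targetConnection, string containerName, string blobName)
{
    CloudBlockBlob source = CloudStorageAccount.Parse(sourceConnection)
        .CreateCloudBlobClient().GetContainerReference(containerName)
        .GetBlockBlobReference(blobName);

    CloudBlockBlob target = CloudStorageAccount.Parse(targetConnection)
        .CreateCloudBlobClient().GetContainerReference(containerName)
        .GetBlockBlobReference(blobName);

    // StartCopy queues an asynchronous copy inside the storage service; for
    // cross-account copies the source must be readable (e.g. public or via SAS).
    target.StartCopy(source);
}
```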

Just a thought.

Cheers

Ken

kevinneumann commented 6 years ago

We did consider that, but management wanted things completely separate to protect against an account breach on Azure as well as malicious actions by another employee. Worst-case scenarios. Our S3 backup account has very limited user access (which I suppose we could probably replicate with a completely separate Azure account or different Azure security). Anyway, with the changes I made, our daily process runs in less than 20 minutes for over 500,000 files. We are running the process in an Azure VM, so the bandwidth is super speedy.