softlayer / softlayer-object-storage-backup

Large file support #6

Open fearlsgroove opened 12 years ago

fearlsgroove commented 12 years ago

I realize you noted it's an open issue in the readme -- but I'd love to see large file support in this script. I'd very much like to switch to using object storage (vs NAS) as a backup target, but I'm stuck without large file support. If I were a python hacker I'd leave ya a patch :)

CrackerJackMack commented 11 years ago

Haven't forgotten this. Other issues and cleanups are necessary to accommodate this request and once those are moved out of the way I can revisit this.

benmccann commented 11 years ago

Is this the way that you'd plan to support large objects? http://docs.openstack.org/trunk/openstack-object-storage/developer/content/large-object-creation.html

Do we need to add that support to the client first? https://github.com/softlayer/softlayer-object-storage-python

benmccann commented 11 years ago

Maybe we could use libcloud which already has large file support? It would be nice to not re-invent the wheel and to help support that library since it probably has more users.

https://github.com/apache/libcloud/blob/trunk/libcloud/storage/drivers/cloudfiles.py#L463

benmccann commented 11 years ago

Or actually swiftclient might be better

https://github.com/openstack/python-swiftclient

CrackerJackMack commented 11 years ago

I won't consider using python-swiftclient until https://review.openstack.org/28862 is merged in. Having eventlet forcefully loaded when the "main" process isn't using it causes some really really "fun" stack issues inside of python.

But yes, since the backup client doesn't use any of the SoftLayer-specific extensions (specifically search), it would be nice to use the official client instead of softlayer-object-storage-python in this case.

With regard to large files, the problem isn't uploading them but checking the file size and modification date against the manifest file. This really breaks when using checksumming as well, since the MD5 of the manifest file is an MD5 of the MD5s of the objects matching the prefix. Obviously this MD5 won't match the local MD5 of the file.
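To illustrate the mismatch, here is a minimal sketch. It assumes the manifest's ETag is the MD5 of the concatenated hex ETags of its segments, which is how Swift documents dynamic large objects; `segment_etags` and `manifest_etag` are hypothetical helpers, not part of the backup script.

```python
import hashlib


def segment_etags(data, segment_size):
    """Return the hex MD5 of each fixed-size segment of data, in order."""
    return [
        hashlib.md5(data[offset:offset + segment_size]).hexdigest()
        for offset in range(0, len(data), segment_size)
    ]


def manifest_etag(etags):
    """MD5 over the concatenated hex ETags (assumed manifest behaviour)."""
    return hashlib.md5("".join(etags).encode("ascii")).hexdigest()


if __name__ == "__main__":
    payload = b"a" * 10
    etags = segment_etags(payload, 4)
    # The two values differ even though the stored bytes are identical.
    print("manifest:", manifest_etag(etags))
    print("local:   ", hashlib.md5(payload).hexdigest())
```

Because the manifest value hashes ETag strings rather than file bytes, a plain MD5 of the local file can never be compared against it directly.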

Thinking aloud: the HEAD of the manifest should tell me the location of the segments, and I can check the dates of all of them to find the "newest" for date comparison. The size on the manifest should equal the size of the local file. For checksum verification I will have to be much more careful to read exactly as many bytes from the file as the size of the segment I'm comparing the checksum against, because currently I'm reading using a fixed chunk size.
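The checksum part of that idea might look roughly like the sketch below. It assumes the segment sizes and ETags have already been fetched and are given in order as `(size, etag)` tuples; `md5_of_next_bytes` and `file_matches_segments` are hypothetical names, and the sketch is not taken from the backup script.

```python
import hashlib
import io


def md5_of_next_bytes(fileobj, nbytes, chunk_size=64 * 1024):
    """Hash at most nbytes from fileobj; return (hex digest, bytes read)."""
    digest = hashlib.md5()
    remaining = nbytes
    while remaining > 0:
        # Never read past the segment boundary, whatever the chunk size is.
        chunk = fileobj.read(min(chunk_size, remaining))
        if not chunk:
            break  # local file is shorter than the segments claim
        digest.update(chunk)
        remaining -= len(chunk)
    return digest.hexdigest(), nbytes - remaining


def file_matches_segments(fileobj, segments):
    """Compare a local file against ordered (size, etag) segment tuples."""
    for size, etag in segments:
        local_etag, read = md5_of_next_bytes(fileobj, size)
        if read != size or local_etag != etag:
            return False
    # Any leftover bytes mean the local file grew since the upload.
    return fileobj.read(1) == b""


if __name__ == "__main__":
    data = b"x" * 100 + b"y" * 50
    segments = [
        (100, hashlib.md5(b"x" * 100).hexdigest()),
        (50, hashlib.md5(b"y" * 50).hexdigest()),
    ]
    print(file_matches_segments(io.BytesIO(data), segments))
```

Capping each read at the bytes remaining in the current segment keeps a fixed chunk size usable without ever hashing across a segment boundary.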

Also to note, I'll need a different file list to iterate over, because I need to exclude segments from the file list and manifests from the directory list.
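A rough sketch of that filtering follows. Everything about the listing shape is an assumption for illustration: entries are dicts with a `name`, an optional `content_type`, and an optional `manifest` key; segments are assumed to live under a single known prefix; and directories are assumed to be marked with the `application/directory` content type. `split_listing` is a hypothetical helper.

```python
DIRECTORY_TYPE = "application/directory"  # assumed directory marker


def split_listing(entries, segment_prefix):
    """Split a container listing into (files, directories).

    Segments are dropped entirely, and manifests are always treated as
    files so they never end up in the directory list.
    """
    files, directories = [], []
    for entry in entries:
        name = entry["name"]
        if name.startswith(segment_prefix):
            continue  # a segment: represented by its manifest instead
        is_manifest = bool(entry.get("manifest"))
        if entry.get("content_type") == DIRECTORY_TYPE and not is_manifest:
            directories.append(name)
        else:
            files.append(name)
    return files, directories


if __name__ == "__main__":
    listing = [
        {"name": "backups", "content_type": DIRECTORY_TYPE},
        {"name": "backups/db.tar", "manifest": "backups_segments/db.tar/"},
        {"name": "backups_segments/db.tar/00000001"},
        {"name": "backups/notes.txt", "content_type": "text/plain"},
    ]
    print(split_listing(listing, "backups_segments/"))
```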

This is going to take a bit of re-organization...

benmccann commented 11 years ago

Ah, I see where the extra trickiness comes in now.

As a heads up, the fix for that bug has been merged in (https://github.com/openstack/python-swiftclient/commit/3196daf9929eef25d69d47592beef4cd31573b80) and is in the latest release.

Also, I had been rather confused as to whether libcloud supported OpenStack Swift or not. It turns out it does, but the driver is rather badly named (https://github.com/apache/libcloud/blob/trunk/libcloud/storage/drivers/cloudfiles.py). I see that they do the checksumming for large uploads, but of course that's only to verify that the upload was completed successfully and not to check whether an upload is necessary in the first place.

bkw commented 8 years ago

Is this project still maintained or is there a recommended replacement - preferably with large file support?