google / skicka

Command-line utility for working with Google Drive. Join the mailing list at https://groups.google.com/forum/#!forum/skicka-users.
Apache License 2.0

Would/does the drive API support delta syncing for large files? #60

Closed bdklahn closed 9 years ago

bdklahn commented 9 years ago

May as well apply the "question" label to this. It doesn't pay to ask for a delta feature if Google's API and/or backing store doesn't make it easy, or possible. Transfer requests can be resumable and, I believe, transfers can be chunked. Is there a way to calculate hashes on the server-side chunks and compare them to local chunks, to do something like rsync's delta algorithm? This would add some server-side processing overhead, but might reduce Google's transfer load when dealing with large files.
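To make the idea concrete, here's a rough Go sketch of just the client-side bookkeeping, assuming (purely hypothetically) that Drive could hand back per-chunk MD5s for the remote copy. The file name, chunk size, and the notion of a remote chunk-hash list are all made up for illustration; nothing like this exists in the Drive API today.

```go
// Sketch of the local half of a chunk-level delta: hash fixed-size chunks of a
// file and compare them against a (hypothetical) list of chunk hashes for the
// copy on the server. Only the mismatched chunks would need to be re-uploaded.
package main

import (
	"crypto/md5"
	"fmt"
	"io"
	"os"
)

const chunkSize = 8 << 20 // 8 MiB chunks, chosen arbitrarily for illustration

// chunkHashes returns the MD5 of each fixed-size chunk of the file at path.
func chunkHashes(path string) ([][md5.Size]byte, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()

	var hashes [][md5.Size]byte
	buf := make([]byte, chunkSize)
	for {
		n, err := io.ReadFull(f, buf)
		if n > 0 {
			hashes = append(hashes, md5.Sum(buf[:n]))
		}
		if err == io.EOF || err == io.ErrUnexpectedEOF {
			break
		}
		if err != nil {
			return nil, err
		}
	}
	return hashes, nil
}

// staleChunks reports which chunk indices differ between local and remote.
func staleChunks(local, remote [][md5.Size]byte) []int {
	var stale []int
	for i, h := range local {
		if i >= len(remote) || h != remote[i] {
			stale = append(stale, i)
		}
	}
	return stale
}

func main() {
	local, err := chunkHashes("bigfile.dat") // hypothetical local file
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	var remote [][md5.Size]byte // would have to come from the server, if the API allowed it
	fmt.Println("chunks to re-upload:", staleChunks(local, remote))
}
```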

BTW, thanks for this nice utility. It's my first Go-compiled program, which has been interesting. I compiled it on my PC-BSD machine at work, and was pleasantly surprised that it just worked when I scp'ed the ELF executable to my home PC-BSD machine, even though the "file" utility reported that it was dynamically linked (I hadn't installed the golang port at home).

Also, I like the thoughtful encryption implementation.

mmp commented 9 years ago

This is a great suggestion, but unfortunately it doesn't seem possible with the Google Drive API. In particular, it doesn't seem to be possible to update part of the contents of an existing file via the chunked/resumable upload API; the upload has to start from the beginning of the file.
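To be concrete, here's a simplified Go sketch of what the resumable upload protocol looks like at the HTTP level (it leaves out OAuth, session initiation, and retry handling, and the session URI and file name are placeholders). The point is that every chunk's Content-Range offset is relative to byte 0 of the full new content, so there's no way to say "the first N bytes are unchanged, start at N+1":

```go
// uploadChunks PUTs the whole file, chunk by chunk, to an already-created
// resumable-session URI (obtained from the Location header of the initial
// ?uploadType=resumable request).
package main

import (
	"bytes"
	"fmt"
	"net/http"
	"os"
)

const chunkSize = 256 * 1024 * 32 // 8 MiB; Drive wants chunk sizes in multiples of 256 KiB

func uploadChunks(sessionURI string, data []byte) error {
	total := len(data)
	for off := 0; off < total; off += chunkSize {
		end := off + chunkSize
		if end > total {
			end = total
		}
		req, err := http.NewRequest("PUT", sessionURI, bytes.NewReader(data[off:end]))
		if err != nil {
			return err
		}
		// Offsets always count from the start of the full content.
		req.Header.Set("Content-Range",
			fmt.Sprintf("bytes %d-%d/%d", off, end-1, total))
		resp, err := http.DefaultClient.Do(req)
		if err != nil {
			return err
		}
		resp.Body.Close()
		// 308 means "resume incomplete" (more chunks expected); 200/201 means done.
		if resp.StatusCode != 308 && resp.StatusCode != 200 && resp.StatusCode != 201 {
			return fmt.Errorf("chunk %d-%d: unexpected status %s", off, end-1, resp.Status)
		}
	}
	return nil
}

func main() {
	data, err := os.ReadFile("bigfile.dat") // hypothetical file
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// The session URI below is a placeholder; the real one comes back in the
	// Location header of the initial resumable-upload request.
	if err := uploadChunks("https://www.googleapis.com/upload/drive/...", data); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```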

The only other option I could think of would be to implement a new filesystem with those sorts of update semantics on top of Google Drive: splitting large files into chunks and storing those as separate files in Drive, so that delta updates could be done. But this adds a fair bit of complexity and would lose the niceness of being able to look at a file hierarchy on Drive and see what's going on with it.
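Just to illustrate the kind of layout that idea implies (skicka doesn't do any of this, and the names here are invented), a big file would be described by a small manifest plus a set of content-addressed chunk files, so only the chunks whose hashes changed would need re-uploading:

```go
// Tiny sketch of the "filesystem on top of Drive" idea: split a local file
// into fixed-size chunks, name each chunk file after its content hash, and
// keep a manifest listing the chunk order. Unchanged chunks keep the same
// name and never need to be uploaded again.
package main

import (
	"crypto/sha256"
	"fmt"
)

const chunkSize = 8 << 20 // 8 MiB

// Manifest describes one logical file as an ordered list of chunk-file names.
type Manifest struct {
	Name   string
	Size   int64
	Chunks []string // e.g. "chunks/3a7bd3e2...", "chunks/9f86d081..."
}

// chunkName derives a Drive filename for one chunk from its contents,
// so identical chunks map to the same object.
func chunkName(data []byte) string {
	return fmt.Sprintf("chunks/%x", sha256.Sum256(data))
}

func buildManifest(name string, data []byte) Manifest {
	m := Manifest{Name: name, Size: int64(len(data))}
	for off := 0; off < len(data); off += chunkSize {
		end := off + chunkSize
		if end > len(data) {
			end = len(data)
		}
		m.Chunks = append(m.Chunks, chunkName(data[off:end]))
	}
	return m
}

func main() {
	m := buildManifest("bigfile.dat", make([]byte, 20<<20)) // 20 MiB of zeros
	fmt.Printf("%s: %d bytes in %d chunks\n", m.Name, m.Size, len(m.Chunks))
}
```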

One other option might be to use a tool like https://github.com/bup/bup locally and then upload its output to Drive with skicka.

Glad you've found the program useful!

bdklahn commented 9 years ago

zbackup (zbackup.org, https://github.com/zbackup/zbackup), which cites bup as its primary inspiration, looks like it would also accomplish what you're getting at: only certain small archive files get updated when changes are sent to a local (deduplicated) archive. All of these, I think, are a similar idea to Apple's sparse bundle bands, which are/were used in its AirPort backup system. The zbackup site shows an example using gsutil to send to Google Cloud Storage (GCS), but this ought to work with skicka and Google Drive, too.

Last week I checked out a Google video that addressed the differences between GCS and Drive, something I had been wondering about: why have two different things that appear to have pretty much the same function? It looks like Drive (and its API) is mainly built for (consumer) collaboration (apps) and sharing things you want to work on or see now, while GCS (besides being something only computer-type people might look into) works best for archiving and/or analyzing large amounts of file-type data. But they apparently do utilize the same backing store, etc., which, apparently, is not quite as sophisticated as, say, Dropbox, in terms of being able to sync file deltas. :-)

It is nice to have the free 15 GB Google Drive provides (+100 GB, for two years, with my HTC phone). But when you need more, you pay for more storage capacity, regardless of whether you've filled it or not. In that sense GCS, using the "nearline" option (right now the same cost as Drive storage upgrades), might be better/more cost effective for archive/backup purposes, simply because you only pay for what you use.

Anyway, skicka (and golang) are interesting. It's useful to me partly because I'm using FreeBSD (PC-BSD) at work and at home, and there is no native Google Drive client for *BSD, not even if I use FreeBSD's Linux compatibility mode.