nextcloud / server

โ˜๏ธ Nextcloud server, a safe home for all your data
https://nextcloud.com
GNU Affero General Public License v3.0

Delta-sync, or sync only changed bytes in a file #417

Open exokkk opened 7 years ago

exokkk commented 7 years ago

Hi all,

Delta sync would be great for my huge TrueCrypt/VeraCrypt files (~30-100 GB). Without delta sync I have to stay away from this product.

Delta sync would also give you a feature that distinguishes you from ownCloud.

Couldn't there be an optional (maybe extension-, folder- or file-based) mechanism to perform delta sync? ("Optional" because I agree that delta sync does not make sense for all kinds of files/folders.) Maybe even using something existing like rsync?

rullzer commented 7 years ago

I have looked into zsync (client-side rsync). If we ever implement such a feature, it will most likely use that, since you have to offload all the computation to the client side or else you will kill the server.

I don't know about TrueCrypt/VeraCrypt. But most container formats (and encrypted ones are even worse) don't lend themselves particularly well to delta sync, since a small change often results in a lot of changed bytes.

exokkk commented 7 years ago

I can only speak for TrueCrypt (but I suppose VeraCrypt does the same): if you change 1 MB within the container, only a bit more than 1 MB of the whole file changes (I am very sure about this). Not even a password change rewrites the whole file, only small parts; see https://news.ycombinator.com/item?id=6523286 and http://crypto.stackexchange.com/questions/18479/how-does-truecrypt-change-password-without-the-need-for-a-complete-re-encryption

So delta sync + TrueCrypt (VeraCrypt) is a really perfect combination. Although I can see that this feature will not be wanted by many people, there are some cases, like mine, where it would be great. Maybe there are other cases that we cannot think of right now. I am not sure, but VM images, for example, might profit from delta sync as well. Also, for some files you might uncompress -> compare -> delta_sync -> compress_server_side_again [OK, this might be too costly, I do not know; it would work for e.g. *.pptx as well]. Client-side computation seems reasonable to me.

Thanks for considering the feature in any future release.

tflidd commented 7 years ago

That is an interesting feature for virtual machine images. Does anyone have experience with encrypted containers and diff sync tools?

One annoying thing about the sync process is that you must transfer the files through the client. You can't just copy them over from a hard disk or use a faster transfer mechanism. That is why many people ask for a diff-sync feature, but the ability to compare files (based on a hash sum) would already help a lot of these people, and it's much easier to implement.
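
A minimal Python sketch of the hash-sum comparison described above, assuming the server can report a SHA-256 digest for each file. The function names and the choice of SHA-256 are made up for this illustration and are not Nextcloud's API. If the digests match, a file copied over by any other means would not need to be transferred again.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def needs_transfer(local: Path, remote_digest: str) -> bool:
    """True if the local file differs from the digest the server reported.

    `remote_digest` is a hypothetical value; how the server would expose it
    is outside the scope of this sketch.
    """
    return file_digest(local) != remote_digest
```

This only avoids re-sending files that are already identical; it does nothing for a large file in which a few bytes changed.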

maverick74 commented 7 years ago

I don't mind at all offloading all the computation to the client side, as long as such a feature is made available!!! We actually consider this a very important feature for business!

To get an idea: on a 0-10 scale, what would the (non-monetary) cost of developing this be?

Thanks

wudimenghuan commented 7 years ago

Dropbox and OneDrive have delta sync. Seafile has delta sync too, but it causes broken files. I hope you will look at rsync. I really need delta sync.

ariselseng commented 7 years ago

@rullzer How would the client have the previous file for calculating the delta?

Bigpet commented 7 years ago

@cowai you either need to keep a copy of your last sync around (using file-system specific things like shadow copy seems out of the question for the broad range of platforms with sync clients) or you have to do block-level syncing instead, like "syncthing" does it.
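
A rough Python sketch of the block-level alternative, assuming fixed-size blocks and a per-block SHA-256 manifest saved at the last sync. This is not syncthing's actual protocol; `BLOCK_SIZE` and all function names are hypothetical. It shows that the client only has to keep the list of block hashes from the previous sync, not a full copy of the file.

```python
import hashlib

BLOCK_SIZE = 128 * 1024  # hypothetical block size, chosen for illustration


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> list[str]:
    """Manifest: one SHA-256 hex digest per fixed-size block."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def changed_blocks(old_hashes: list[str], new_data: bytes,
                   block_size: int = BLOCK_SIZE) -> dict[int, bytes]:
    """Map block index -> new bytes for every block that differs from the manifest."""
    delta = {}
    for idx, start in enumerate(range(0, len(new_data), block_size)):
        block = new_data[start:start + block_size]
        if idx >= len(old_hashes) or hashlib.sha256(block).hexdigest() != old_hashes[idx]:
            delta[idx] = block
    return delta


def apply_delta(old_data: bytes, delta: dict[int, bytes], new_len: int,
                block_size: int = BLOCK_SIZE) -> bytes:
    """Rebuild the new file on the other side from its old copy plus the delta."""
    out = bytearray(old_data[:new_len].ljust(new_len, b"\0"))
    for idx, block in delta.items():
        out[idx * block_size:idx * block_size + len(block)] = block
    return bytes(out)
```

Fixed-size blocks handle in-place changes well, but an insertion near the start of a file shifts every later block, so all of them would be detected as changed.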

stratacast commented 7 years ago

I think block-level syncing like syncthing does is probably the easiest implementation in code, and perhaps the cheapest to write. I'm seriously interested in this, and I know some companies that are too (Quickbooks files, man... ugly stuff). Like @Bigpet said, you'd need a copy of the file from before the changes on hand, or put some hooks into writes that go into that specific directory, but the latter sounds very messy and dangerous. I wish I knew how to write code better, because I would 100% do this. I'm definitely a Kindergarten koder compared to a lot of people who put stuff on GitHub. Thought I'd voice that there's interest on my end, and on the end of local companies I know.

eglipeter commented 7 years ago

Are there concrete plans when delta syncing will be available? May I hope to see this implemented in Nextcloud 13 already?

gschenck commented 6 years ago

There is some progress on owncloud:

owncloud/core#16162

ahmedammar commented 6 years ago

@gschenck Please feel free to try out the latest code, the core implementation should be complete now.

jkaberg commented 6 years ago

@ahmedammar any plans on submitting the PR against NC as well further down the road?

ahmedammar commented 6 years ago

@jkaberg once the work is complete and merged in oC I can have a look, assuming the code-base isn't too different at the core ...

maverick74 commented 6 years ago

@ahmedammar can you give us an update about the feature? (If possible a probable ETA?)

ahmedammar commented 6 years ago

@maverick74 no ETA for nextcloud, if someone is willing to open a bounty for it I could look into it more urgently, otherwise, for reference: owncloud/client#6131 owncloud/core#29404

L00maca commented 6 years ago

It's not much and I'm not even sure I did this right since I never did this before, but I don't mind chipping in to help this get done. Bountysource

jospoortvliet commented 6 years ago

The bounty is already at 115 dollars now. It should not be terribly hard to get this merged in the Nc client and server, I think, but it won't make it for 13 😄

ahmedammar commented 6 years ago

I won't be looking into this until oC actually merges first, since that saves me any duplicated effort. Unless this bounty gets so big that I can ignore oC altogether :)

maverick74 commented 6 years ago

FWIW, I guess there is some news at https://github.com/owncloud/core/pull/29404

wudimenghuan commented 6 years ago

@maverick74 So It can be merged... @rullzer @jospoortvliet

tflidd commented 6 years ago

FWIW, I guess there is some news at owncloud/core#29404

That's the server side. The client side is still on a development branch and subject to testing (https://github.com/owncloud/client/labels/Delta-sync). Until that is finished, it doesn't make much sense to merge anything, so for now you can only help with testing.

petrk94 commented 6 years ago

I think nextcloud should hurry up, delta sync will be released in the next owncloud update: https://owncloud.com/owncloud-implements-delta-sync-technology/

jospoortvliet commented 6 years ago

@petrk94 yeah, it could in theory be merged - but ownCloud notes it'll be in testing until 2019, let's see. @ahmedammar can make a PR for the server - the client will get it as we sync upstream actively still.

petrk94 commented 6 years ago

I'm wondering why I get so many thumbs down; I just want to keep the thread updated :/

jcklpe commented 5 years ago

If I'm understanding stuff correctly it sounds like NextCloud won't be having this feature any time soon, correct?

tflidd commented 5 years ago

ownCloud currently uses client version 2.4. Version 2.5 is in beta tests now (https://github.com/owncloud/client/issues/6483) and the delta sync feature was announced for version 2.6. Now with a bit of guessing, between major releases there are often 6 months or more, so I wouldn't expect a working client before the end of this year.

On the Nextcloud side, they took over their own development to implement the new client-side encryption, which is currently in beta. That feature is one of the main priorities at the moment, and it will probably take some time to ship it and get it really stable. After that, they could implement delta sync, but I wouldn't expect it before mid-2019. This is no official statement, priorities can change ...

maverick74 commented 5 years ago

Apparently the client-side is already merged ( https://github.com/owncloud/client/pull/6297 )

But they're still hunting for bugs until 2.6.0.

It would be nice to have it as an experimental Opt-in feature over here, however :)

jospoortvliet commented 5 years ago

We might merge it during the course of our 2.6 development, I suppose - but we have a huge amount of things we want to work on, not sure how high the prio is on this one. Help is welcome - if somebody feels like creating a PR for our client that backports this feature, that'd be cool of course!

gschenck commented 5 years ago

ownCloud currently has delta-sync for testing in the server and the daily build of the client.

server: https://github.com/owncloud/core/pull/29404 client: https://github.com/owncloud/client/pull/6771

announcement: https://github.com/owncloud/core/pull/29404#issuecomment-474783452

jospoortvliet commented 5 years ago

Nice. We might work with its author @ahmedammar to get it into Nextcloud in the future as well. As I said, it isn't high on our priority list, as we still have a lot of stabilization to do for the Drive and E2EE features and have a lot of plans around UI and server integration work. But I believe you can donate to the feature to help motivate @ahmedammar :smile:

realies commented 5 years ago

@jospoortvliet, is donating to the bounty/@ahmedammar the only way to get this higher in your priority list?

TechupBusiness commented 5 years ago

Delta sync is a make-or-break criterion for lots of users... I don't understand why this is not higher in priority, because I know companies that moved to Dropbox because it's the only file cloud service offering it. It seems that ownCloud will take second place after Dropbox ;-) I hope Nextcloud will catch up too...

fracture-point commented 5 years ago

Chipped in on the bounty because this is high-priority for me, and I would much rather stay with NC than convert (back) to OC. I'd be happy to help test as well.

ghost commented 5 years ago

+1 for me. Delta sync is hugely important. I can only hypothesize that the reason it's low on your priority list is that you are chasing cool new features vs what everyone can benefit from and maybe the voice of this need just isn't being heard (he who shouts loudest?) I need to sync VM images and huge PST files daily.

lowlyocean commented 4 years ago

Is it possible to use Nextcloud server 15 with the ownCloud client 2.6.0+ (featuring delta sync)? I migrated from ownCloud to Nextcloud and would rather not risk migrating back. This feature seems important. What drives the prioritization of Drive and E2EE ahead of delta sync?

iskradelta commented 4 years ago

I'll have a stab at this in the next few days. Looking at ownCloud: they use zsync, with a lot of code to integrate it with the ownCloud APIs, plus this .zsync metadata file, meh.

I'd rather go full rsync. It should be possible for the server to shell out to an rsync daemon or client and connect its stdin and stdout through an HTTP tunnel to the Nextcloud clients.

nearwood commented 4 years ago

FWIW, Windows has (had?) Remote Differential Compression built-in and there was some technical documentation on it that might have been useful, but I cannot find it anymore.

ahmedammar commented 4 years ago

@iskradelta reinventing the wheel sounds like a great plan!

realies commented 4 years ago

@ahmedammar, you're not making it easier...

iskradelta commented 4 years ago

@ahmedammar Reinventing the wheel? That's the opposite of my plan. "Reinventing the wheel" would mean "reimplement the rsync algorithm or another differential algorithm", and then "reimplement or make yourself a new protocol" or "now fit the existing wire protocol on top of your API". The plan is to do the opposite: tunnel the existing rsync network wire protocol over a connection that already exists between the Nextcloud server and the client, the websocket connection, instead of the HTTP tunnels I wrote about above, since most people can't configure those correctly.

A prototype is already working for me on the Nextcloud server side; it took one evening of "coding".

The zsync implementation is self-mutilation: "oh, rsync can't be done over HTTP, let's modify the network protocol to do rsync over HTTP". But you can tunnel anything over HTTP or websockets, and the ownCloud implementation of it is buggy and too large to maintain.

ariselseng commented 4 years ago

@iskradelta will your implementation scale? If you have many users doing rsync you will make it do way more work than with zsync if I am not mistaken?

iskradelta commented 4 years ago

@ariselseng rsync is only CPU-intensive on the sender side. The sender can be the client or the server, depending on whether the user is uploading or downloading. There is a limit to how many users can be syncing their tree (initial download) at the same time: the CPU available to the server, unless the bandwidth limit is hit first. And that limit only gets hit when files in the users' trees have a changed timestamp or size, so once synced, many users can keep "syncing" without causing high CPU load.
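
To illustrate where that CPU cost comes from, here is a minimal Python sketch of an rsync-style weak rolling checksum. In the rsync algorithm as published, the side holding the old copy sends one checksum per block, and the other side scans its own file at every byte offset looking for matches. The function names, the SHA-256 strong hash, and the tiny block size in the usage note are assumptions for illustration, not rsync's actual code.

```python
import hashlib

M = 1 << 16  # modulus of the two 16-bit halves of the weak checksum


def weak_checksum(block: bytes) -> tuple[int, int]:
    """Return (a, b): a is the byte sum, b the position-weighted byte sum."""
    a = sum(block) % M
    b = sum((len(block) - i) * x for i, x in enumerate(block)) % M
    return a, b


def roll(a: int, b: int, out_byte: int, in_byte: int, block_len: int) -> tuple[int, int]:
    """Slide the window one byte to the right in O(1)."""
    a_new = (a - out_byte + in_byte) % M
    b_new = (b - block_len * out_byte + a_new) % M
    return a_new, b_new


def find_matches(new_data: bytes, old_data: bytes, block_len: int) -> dict[int, int]:
    """Map offset in new_data -> index of an identical block in old_data."""
    table: dict[tuple[int, int], list[tuple[int, bytes]]] = {}
    for idx in range(len(old_data) // block_len):
        block = old_data[idx * block_len:(idx + 1) * block_len]
        table.setdefault(weak_checksum(block), []).append(
            (idx, hashlib.sha256(block).digest()))
    matches: dict[int, int] = {}
    if len(new_data) < block_len:
        return matches
    a, b = weak_checksum(new_data[:block_len])
    pos = 0
    while True:
        # A weak hit is confirmed with a strong hash before it counts.
        for idx, strong in table.get((a, b), []):
            if hashlib.sha256(new_data[pos:pos + block_len]).digest() == strong:
                matches[pos] = idx
                break
        if pos + block_len >= len(new_data):
            break
        a, b = roll(a, b, new_data[pos], new_data[pos + block_len], block_len)
        pos += 1
    return matches
```

For example, with `old = bytes(range(200))` and `new = b"XYZ" + old`, `find_matches(new, old, 50)` locates all four old blocks at offsets shifted by three bytes, which a fixed-offset block comparison would miss. Unlike this sketch, a real implementation skips ahead by a whole block after a match.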

If this ever becomes a problem, there is a solution: consider caching to avoid the expensive checksumming. But I don't like it, since it means we just assume that syncing "is always an initial sync", i.e. that users don't have any of their data on their phones/clients. The pre-calculated zsync metadata file is really only a benefit when all users are downloading the same tree (files); zsync makes sense where it is used for public data like ISO images.

There is a reason even Dropbox is using librsync. It's the best tool, the best.

ahmedammar commented 4 years ago

Good luck.

jospoortvliet commented 4 years ago

@iskradelta I look forward to trying out your experiment ;-)

wrt others asking about priorities - we prioritize things that benefit more users or that are paid for by customers. While everyone here cares deeply about delta sync, 99% of the users don't handle very big files in which small parts are regularly changed - the only scenarios I can think of are VMs and encrypted filesystems, both of which are never used by the vast majority of computer users. The Drive and E2E have big benefits for normal users, meanwhile, so we focus there. And finishing those is taking more than long enough; I hope you don't mind that we don't take on another huge task until we have both of those done. Our team can actually barely handle the support load for customers, and that's the main reason we are not making much progress. We have been trying to hire more people for 3 years already :(

kesselb commented 4 years ago

Just to let everyone know: I deleted a post violating the code of conduct. If you want proof drop me a line (by email).

kesselb commented 4 years ago

@Ornias1993 you're still invited to add your technical comments regarding this feature request.

We do not tolerate personal attacks, racism, sexism or any other form of discrimination. Disagreement is inevitable, from time to time, but respect for the views of others will go a long way to winning respect for your own view.

Just keep that in mind.

Ornias1993 commented 4 years ago

@kesselb It seems being a douche is okay, as long as being a douche is project-related and not personal. That's not a moderation policy I can accept, and thus I will not assist any further.

realies commented 4 years ago

@kesselb, dropping you a line.

kesselb commented 4 years ago

@realies Feel free to write me an email (that is what I had in mind with "drop me a line"; I have now added "by email" to clarify). Actually, the comment in question was similar to https://github.com/nextcloud/server/issues/417#issuecomment-544148632 but used language violating the code of conduct.

Lordroran commented 4 years ago

the only scenarios I can think of are VMs and encrypted filesystems, both of which are never used by the vast majority of computer users.

The scenario I deal with is big Outlook files. I would think those are used by more users.