Do Discovery and Propagation in parallel.

ogoffart commented 4 years ago

Right now, the sync algorithm first do a discovery step to see what changes over the whole sync folder, then propagates files. As a result, it might take a very long time before even starting to download or upload the first file, and restarting the sync has to do the discovery step from scratch again. In order to solve the problem, we could do the propagation of downloading / uploading files, as long as we've discovered that folder.

How:

There might be different approach. But I guess the easier would be to merge the discovery's ProcessDirectoryJob and the PropagateDirectory into a single operation. I believe this might be the easier. They could be kept separate and still run in parallel, but then it might be harder to synchronize. Note that in any case, all the removes would still be done at the end. Currently, only the removal of directory are done at the end, but under the new approach, we would also have to perform the file removal at the end.

Problems:

Some features will stop working. Notably:

The progress indication: It will be impossible to give a good progress of the sync since we have no idea how much is to be done in total.

Non-problems

Something that will not be a problem is the detection of moves/rename. We do not need to have the whole sync tree in memory for that. The new discovery algorithm already does move detection quite well without that.

Alternative

We may choose not to do that at all and cache the result of the discovery in a new table: https://github.com/owncloud/client/issues/2976 This would solve the problem that restarting the first sync before the discovery is finished restart from zero. This will however not solve the memory usage problem which will be solved by the first approach, and it will still (i think) be slower that the paralelization.

mmattel commented 4 years ago

The progress indication

Is it possible to dynamically not only change the current value but also change the target value (means the 100% reference value)? If this is possible, then you have a full dynamic progressbar, feeded by two dynamic streams. The current processed items and the taget number which changes based on parallel running gathering. If described, this would not be confusing.

ogoffart commented 4 years ago

Is it possible to dynamically not only change the current value but also change the target value

Yes, that's possible, but that means that the progress might actually go backward. (Start at 0%, then reach 50% then back to 25% then 85% then 10% then 90% and finally 100%, as files are discovered and transferred)

mmattel commented 4 years ago

What about to write during syncing something like x of y items to process? Both values will be dynamically written.

michaelstingl commented 4 years ago

Yes, that's possible, but that means that the progress might actually go backward. (Start at 0%, then reach 50% then back to 25% then 85% then 10% then 90% and finally 100%, as files are discovered and transferred)

Worth a try maybe if users see that the total number of discovered files is also increasing.

ckamm commented 4 years ago

This looks sensible to me.

The progress indication may be a reason for keeping the discovery and propagation separate, even if propagation would now already start before discovery finishes. That way progress indication is possible as soon as discovery finishes, exactly like it is right now.

owncloud / client