orgzly-revived / orgzly-android-revived

Outliner for taking notes and managing to-do lists
https://www.orgzlyrevived.com
GNU General Public License v3.0

Sync one repo at a time #1

Open amberin opened 1 year ago

amberin commented 1 year ago

I have started to use multiple Git repos. It works well, but syncing is pretty slow, especially when there are multiple changed notebooks.

I haven't looked at the logic for the other repo types, but I suspect Git syncing could be made a lot faster if we grouped notebooks by repo before syncing them. This would speed up Git syncing even when using only a single repo.

Today, sync means looping over all notebooks and syncing them one by one, in an order unrelated to their repo links. (I have moved git pushing out of this loop to save some time with git repos. Unfortunately, we still push once per changed notebook when there are only local changes, due to SyncRepo.storeBook being called both in this scenario and during force-saving. But that is a separate problem.)

Syncing notebooks grouped by repo would allow a much faster workflow for Git. We would probably not need to loop over each notebook, as Git already knows what has changed. And when there are changes, they could all go into a single commit. Having full control of the sync process of each repo would allow us to be more economical with fetch/push and possibly other time-consuming operations.

Looping over notebooks would obviously still be possible for all repo types; it would just be done one repo at a time.
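To illustrate the grouping step, here is a minimal sketch in Python (not Orgzly's actual code). The `Notebook` class, its fields, and the example URLs are hypothetical; the only idea taken from the proposal is that notebooks are bucketed by their repo link before any syncing starts.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Notebook:
    # Hypothetical stand-in for a notebook and its repo link.
    name: str
    repo_url: Optional[str]  # None means the notebook has no repo link


def group_by_repo(notebooks):
    """Return {repo_url: [notebooks]}, keeping input order within each group."""
    groups = defaultdict(list)
    for book in notebooks:
        groups[book.repo_url].append(book)
    return dict(groups)


notebooks = [
    Notebook("work.org", "git@example.com:me/work.git"),
    Notebook("home.org", "webdav://example.com/org"),
    Notebook("todo.org", "git@example.com:me/work.git"),
    Notebook("scratch.org", None),
]

for repo_url, books in group_by_repo(notebooks).items():
    # One pass per repo: a Git repo could fetch, commit, and push once here,
    # while other repo types could still loop over their notebooks.
    print(repo_url, [b.name for b in books])
```

With this shape, the per-notebook loop is still available inside each group, so non-Git repo types would not need to change their behaviour.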

Can anyone see challenges with this approach?

colonelpanic8 commented 1 year ago

Yeah, I mean in general, I think that the git sync flow is just really different than all the other types of sync flows.

amberin commented 1 year ago

I suppose what I am talking about is quite a major change, since it would mean that repos, not notebooks, become the main sync object. Perhaps it would require an unreasonable amount of rewriting. But it should be worth at least exploring, to make an assessment.

amberin commented 1 year ago

> Yeah, I mean in general, I think that the git sync flow is just really different than all the other types of sync flows.

You are right.

Idea: Sync notebooks one-by-one same as now, but skipping those linked to Git repos. After that loop, add a separate method which loops over Git repos, syncing them with a different workflow. This way, we should only need to rewrite Git-related code.

Notebooks without links also need to be handled, of course.
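A rough sketch of that two-phase idea, in Python rather than Orgzly's own language. The tuple layout and the `"git"` type string are assumptions made for illustration; the point is only the three-way split into notebooks synced one by one, Git repos synced as a unit afterwards, and unlinked notebooks.

```python
def plan_sync(notebooks):
    """Split notebooks into the three cases described above.

    notebooks: list of (name, repo_type, repo_url) tuples, where
    repo_type is None for a notebook without a repo link.
    """
    per_notebook = []  # synced one by one, same as now
    git_repos = {}     # repo_url -> notebook names, synced per repo afterwards
    unlinked = []      # no link: needs its own handling
    for name, repo_type, repo_url in notebooks:
        if repo_type is None:
            unlinked.append(name)
        elif repo_type == "git":
            git_repos.setdefault(repo_url, []).append(name)
        else:
            per_notebook.append(name)
    return per_notebook, git_repos, unlinked


per_notebook, git_repos, unlinked = plan_sync([
    ("work.org", "git", "git@example.com:me/work.git"),
    ("home.org", "webdav", "webdav://example.com/org"),
    ("todo.org", "git", "git@example.com:me/work.git"),
    ("scratch.org", None, None),
])
print(per_notebook, git_repos, unlinked)
```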

colonelpanic8 commented 1 year ago

Isn't there already a separate interface/method for two way sync, which, at the moment, is only implemented by git sync anyway?

amberin commented 1 year ago

Yes. Although I'm not sure about the usefulness of that category. Is it evident that all "two way sync" repos should be synced per-repo, instead of per-notebook? Probably not. As you said, Git is just different from all the other types (so far).

colonelpanic8 commented 1 year ago

> Yes. Although I'm not sure about the usefulness of that category. Is it evident that all "two way sync" repos should be synced per-repo, instead of per-notebook? Probably not. As you said, Git is just different from all the other types (so far).

Sure, my point is that, so far, that interface was created solely to support the Git use case. If our requirements are slightly different, I think it's fine to simply change the interface as need be, before there are other consumers.

Or perhaps create an entirely new one. The point is that I think we agree we should work out what we need in a syncing interface from first principles, without being constrained by how Orgzly currently does things.

amberin commented 9 months ago

I have experimented with this a bit. I now have a branch with an IntegrallySyncedRepo interface, which allows syncing a repo as a whole. So far, I have achieved the following:

Syncing is significantly faster, especially when there are remote changes or changes in multiple notebooks. I have learned that fetch and push actions are what take the most time, which is why I implemented AutoCloseable for the SSH transport. Fetch now typically takes me around 2 seconds (most of which is setting up the SSH session), while the subsequent push typically takes less than 0.5 seconds.
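The comment does not show how the AutoCloseable transport is used, so the following is only a sketch of the general pattern, under the assumption that one session is opened per repo sync and shared by fetch and push. It uses a Python context manager, which plays the same role as Java's AutoCloseable with try-with-resources. `FakeSshSession` and `fetch_and_push` are hypothetical names, and no real SSH connection is made.

```python
class FakeSshSession:
    """Stand-in for an SSH transport; counts how often a session is set up."""

    connects = 0

    def __enter__(self):
        # The expensive part (the handshake) would happen here, once.
        FakeSshSession.connects += 1
        self.closed = False
        return self

    def __exit__(self, exc_type, exc, tb):
        # Always release the session, even if fetch or push raised.
        self.closed = True
        return False  # do not swallow exceptions

    def fetch(self):
        return "fetched"

    def push(self):
        return "pushed"


def fetch_and_push():
    # Both operations share one session, so setup cost is paid only once.
    with FakeSshSession() as session:
        return [session.fetch(), session.push()]


results = fetch_and_push()
print(results, "sessions opened:", FakeSshSession.connects)
```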

I still need to solve the following (but don't see any difficulties):

chaoflow commented 5 months ago

@amberin This sounds great and I agree there should be only one commit for each sync. Is the added complexity of reloading only specific notebooks worth it or should we just reload all?

To further simplify, what do you think about the following?

A. Export and commit

  1. Export all notebooks
  2. For each repository, commit everything on the local branch; now everything is in Git.
  3. Try to push, no force, even if no changes (needed to recover from conflict, see below)
    • SUCCESS: done
    • FAILURE: continue with B

B. Rebase and reload

  1. Fetch remote branch
  2. Rebase local onto remote
    • SUCCESS: try to push, no force
      • SUCCESS: reload all notebooks within repository
      • FAILURE: restart with B.1
    • FAILURE:
      • force push local branch to configurable remote conflict branch
      • notify in orgzly-UI: user resolves in git, updates remote branch, and syncs again
      • done

This workflow should succeed, except if the remote branch gets rewritten. In that case we need a force-load that is manually triggered in orgzly:

C. Force load

  1. Fetch remote branch
  2. Reset local branch to remote branch
  3. Reload all notebooks from local branch, discarding any local changes
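The control flow of steps A to C can be sketched as follows. This is a Python illustration against a fake repo object, not an implementation on top of a real Git library. All method names (`export_and_commit`, `force_push_conflict_branch`, and so on) are hypothetical, and the `max_attempts` limit is an added assumption so that "restart with B.1" cannot loop forever.

```python
def sync_repo(git, max_attempts=3):
    """Run steps A and B; return a short status string."""
    # A. Export and commit, then always try a plain (non-force) push.
    git.export_and_commit()
    if git.push():
        return "pushed"
    # B. Rebase and reload.
    for _ in range(max_attempts):
        git.fetch()
        if not git.rebase():
            # Conflict: park local work on the remote conflict branch.
            git.force_push_conflict_branch()
            git.notify_conflict()
            return "conflict"
        if git.push():
            git.reload_all()
            return "rebased"
        # Push failed again (remote moved meanwhile): restart with B.1.
    return "gave up"


def force_load(git):
    """C. Manually triggered: discard local state in favour of the remote."""
    git.fetch()
    git.reset_to_remote()
    git.reload_all()
    return "force-loaded"


class ScriptedGit:
    """Fake repo: push/rebase outcomes are scripted; every call is logged."""

    def __init__(self, push_results, rebase_results=()):
        self.push_results = list(push_results)
        self.rebase_results = list(rebase_results)
        self.log = []

    def push(self):
        self.log.append("push")
        return self.push_results.pop(0)

    def rebase(self):
        self.log.append("rebase")
        return self.rebase_results.pop(0)

    def __getattr__(self, name):
        # Any other public step just records that it was called.
        if name.startswith("_"):
            raise AttributeError(name)
        return lambda: self.log.append(name)


git = ScriptedGit(push_results=[False, True], rebase_results=[True])
print(sync_repo(git), git.log)
```

In the scripted run above, the first push is rejected, the rebase succeeds, and the second push goes through, so the notebooks are reloaded exactly once at the end.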

amberin commented 5 months ago

@chaoflow Many thanks for your input! Just a few quick thoughts:

chaoflow commented 5 months ago

@amberin I think the approach I sketched achieves what you describe in your third point: in the case of a conflict, the local branch automatically turns into a conflict branch, without the need of having an extra branch for that locally, and for every sync a rebase is attempted.

The goals of the sketched approach are:

  1. Push changes to remote main branch, if possible.
  2. Push changes to remote conflict branch, otherwise.
  3. Automatically recover, once user has resolved the conflict, i.e. rebasing to remote main is possible again.
  4. Allow force-loading to discard any local state -- for force-saving there should be no need.
  5. Produce linear history, i.e. do not enforce merges on the user.
  6. Enable user to use merges to resolve a conflict or any other strategy.

An option is to reset the local main branch to the remote main branch before exporting. In case of conflict, orgzly would keep rewriting and force pushing its commit to the remote conflict branch, which might simplify recovery.