amberin opened this issue 1 year ago
Yeah, I mean in general, I think that the git sync flow is just really different than all the other types of sync flows.
I suppose what I am talking about is quite a major change, since it would mean that repos, not notebooks, become the main sync object. Perhaps it would require an unreasonable amount of rewriting. But it should be worth at least exploring, to make an assessment.
You are right.
Idea: sync notebooks one by one as we do now, but skip those linked to Git repos. After that loop, add a separate method that loops over the Git repos, syncing them with a different workflow. This way, we should only need to rewrite the Git-related code.
Notebooks without links also need to be handled, of course.
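Just to illustrate the split (a rough sketch only; `syncNotebook`, `syncGitRepo` and the types below are made up for illustration, not actual Orgzly code), the top-level sync could look roughly like this:

```kotlin
// Hypothetical two-phase sync: per-notebook for ordinary repos, per-repo for Git.
// All names are illustrative.
data class Notebook(val name: String, val linkedRepo: Repo?)
sealed class Repo
class WebdavRepo : Repo()
class GitRepo : Repo()

fun syncAll(notebooks: List<Notebook>, gitRepos: List<GitRepo>) {
    // Phase 1: the existing per-notebook loop, but skip Git-linked notebooks.
    // Notebooks without a repo link are still handled here.
    for (notebook in notebooks) {
        if (notebook.linkedRepo is GitRepo) continue
        syncNotebook(notebook) // existing per-notebook logic, unchanged
    }
    // Phase 2: sync each Git repo as a whole, with its own workflow.
    for (repo in gitRepos) {
        syncGitRepo(repo) // e.g. fetch, rebase, export, commit, push once
    }
}

fun syncNotebook(notebook: Notebook) { /* existing logic */ }
fun syncGitRepo(repo: GitRepo) { /* new whole-repo workflow */ }
```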
Isn't there already a separate interface/method for two-way sync, which, at the moment, is only implemented by git sync anyway?
Yes. Although I'm not sure about the usefulness of that category. Is it evident that all "two-way sync" repos should be synced per repo rather than per notebook? Probably not. As you said, Git is just different from all the other types (so far).
Sure. My point is that, at this point, that interface was created solely to support the Git use case. If our requirements are slightly different, I think it's fine to simply change the interface as needed, before there are other consumers.
Or perhaps create an entirely new one. The point is that I think we agree we should think about what we need in a syncing interface from first principles, without thinking about how orgzly currently does things.
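Just as a strawman for discussion, a repo-level interface could be quite small. None of the names below exist in Orgzly; they are made up to illustrate the shape of it:

```kotlin
// Hypothetical interface for repos that are synced as a whole rather than
// notebook-by-notebook. Names are illustrative only.
interface RepoLevelSync {
    /**
     * Sync the entire repo in one operation and report which notebooks
     * changed remotely, so the app knows what to reload.
     */
    fun syncRepo(): RepoSyncResult
}

data class RepoSyncResult(
    val updatedNotebooks: List<String>, // notebooks changed remotely, to be reloaded
    val conflict: Boolean               // true if remote and local changes clash
)
```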
I have experimented with this a bit. I now have a branch with an IntegrallySyncedRepo interface, which allows syncing a repo as a whole. So far, I have achieved the following:
Syncing is significantly faster, especially when there are remote changes or changes in multiple notebooks. I have learned that the fetch and push actions are what takes the most time, which is why I implemented AutoCloseable for the SSH transport. A fetch now typically takes me around 2 seconds (most of which is spent setting up the SSH session), while the subsequent push typically takes less than 0.5 seconds.
I still need to solve the following (but don't see any difficulties):
@amberin This sounds great and I agree there should be only one commit for each sync. Is the added complexity of reloading only specific notebooks worth it or should we just reload all?
To further simplify, what do you think about the following?
A. Export and commit
B. Rebase and reload
This workflow should succeed unless the remote branch gets rewritten. In that case we need a force-load that is manually triggered in orgzly:
C. Force load
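For concreteness, the three steps could be expressed with JGit roughly as below. The branch names, the push after a successful rebase, and aborting a failed rebase are my assumptions for the sketch, not a fixed part of the proposal:

```kotlin
import org.eclipse.jgit.api.Git
import org.eclipse.jgit.api.RebaseCommand
import org.eclipse.jgit.api.ResetCommand

// Sketch of the A/B/C workflow using JGit. "main"/"origin" are assumed names.

// A. Export and commit: write all notebooks to the working tree, then commit.
fun exportAndCommit(git: Git) {
    // (export notebooks to the working tree here)
    git.add().addFilepattern(".").call()
    if (!git.status().call().isClean) {
        git.commit().setMessage("Orgzly sync").call()
    }
}

// B. Rebase and reload: fetch, rebase onto the remote branch, push on success.
// On conflict, abort the rebase so the local branch keeps the conflicting commit.
fun rebaseAndReload(git: Git): Boolean {
    git.fetch().setRemote("origin").call()
    val result = git.rebase().setUpstream("origin/main").call()
    return if (result.status.isSuccessful) {
        git.push().setRemote("origin").call()
        // (reload changed notebooks from the working tree here)
        true
    } else {
        git.rebase().setOperation(RebaseCommand.Operation.ABORT).call()
        false
    }
}

// C. Force load (manually triggered): discard local state and reload
// everything from the remote branch.
fun forceLoad(git: Git) {
    git.fetch().setRemote("origin").call()
    git.reset().setMode(ResetCommand.ResetType.HARD).setRef("origin/main").call()
    // (reload all notebooks from the working tree here)
}
```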
@chaoflow Many thanks for your input! Just a few quick thoughts:
@amberin I think the approach I sketched achieves what you describe in your third point: in the case of a conflict, the local branch automatically turns into a conflict branch, without needing an extra local branch for that, and a rebase is attempted on every sync.
The goals of the sketched approach are:
An option is to reset the local main branch to the remote main branch before exporting. In case of conflict, orgzly would keep rewriting and force pushing its commit to the remote conflict branch, which might simplify recovery.
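If I read this right, it could amount to something like the sketch below (JGit again; the conflict branch name `orgzly-conflict` is only a placeholder, and the surrounding export/commit steps are left out):

```kotlin
import org.eclipse.jgit.api.Git
import org.eclipse.jgit.api.ResetCommand
import org.eclipse.jgit.transport.RefSpec

// Sketch only: start each export from the remote tip, and on conflict keep
// force-pushing the local commit to a remote conflict branch for recovery.
fun prepareExport(git: Git, haveConflict: Boolean) {
    git.fetch().setRemote("origin").call()
    if (!haveConflict) {
        // Reset to the remote tip so the exported commit applies cleanly.
        git.reset().setMode(ResetCommand.ResetType.HARD).setRef("origin/main").call()
        // (export notebooks, add, commit, push as usual)
    } else {
        // Keep the local commit visible on a remote conflict branch.
        git.push()
            .setRemote("origin")
            .setRefSpecs(RefSpec("HEAD:refs/heads/orgzly-conflict"))
            .setForce(true)
            .call()
    }
}
```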
I have started to use multiple Git repos. It works well, but syncing is pretty slow, especially when there are multiple changed notebooks.
I haven't looked at the logic for the other repo types, but I suspect Git syncing could be made a lot faster if we grouped notebooks by repo before syncing them. This would speed up Git syncing even when using only a single repo.
Today, sync means looping over all notebooks and syncing them one by one, in an order unrelated to their repo links. (I have moved git pushing out of this loop to save some time with git repos. Unfortunately, we still push once per changed notebook when there are only local changes, due to SyncRepo.storeBook being called both in this scenario and during force-saving. But that is a separate problem.)
Syncing notebooks grouped by repo would allow a much faster workflow for Git. We would probably not need to loop over each notebook, as Git already knows what has changed. And when there are changes, they could all go into a single commit. Having full control of the sync process of each repo would allow us to be more economical with fetch/push and possibly other time-consuming operations.
Looping over notebooks would obviously still be possible for all repo types; it would just be done one repo at a time.
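To make the grouping concrete, the outer loop could be restructured roughly like this (types and names are illustrative only, not current Orgzly code):

```kotlin
// Sketch of grouping notebooks by their linked repo before syncing.
data class Notebook(val name: String, val linkedRepo: Repo?)
interface Repo { fun syncNotebook(book: Notebook) }
interface GitRepo : Repo { fun syncAsWhole(books: List<Notebook>) }

fun syncAll(notebooks: List<Notebook>) {
    // Visit each repo exactly once per sync, instead of once per notebook.
    val byRepo = notebooks.groupBy { it.linkedRepo }
    for ((repo, books) in byRepo) {
        when (repo) {
            null -> books.forEach { /* handle notebooks without a repo link */ }
            is GitRepo -> repo.syncAsWhole(books) // one fetch, one commit, one push
            else -> books.forEach { repo.syncNotebook(it) } // per-notebook, as today
        }
    }
}
```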
Can anyone see challenges with this approach?