langfield / ki

version control for Anki collections
https://langfield.github.io/ki/
GNU Affero General Public License v3.0
70 stars 3 forks source link

Converting a deck to a submodule #128

Closed abbradar closed 1 year ago

abbradar commented 1 year ago

Hi again!

I want to convert my (currently single) deck to a submodule which I can share. Currently I follow these general steps:

  1. ki clone;
  2. cd collection/my-deck; mv ../_media .; cp ../models.json;
  3. Edit the models.json file in the deck to only include necessary note types;
  4. git init; git commit; git push;
  5. cd ..; rm -rf my-deck; git submodule add git@host:my-deck;
  6. ki push.

So far so good, I get this output:

Pushing to '/home/abbradar/.local/share/Anki2/User 1/collection.anki2'
========================
Note change types
------------------------
ADD                    0
DELETE                 0
MODIFY                 0
RENAME                 0
TYPE CHANGE            0
========================
Overwrote '/home/abbradar/.local/share/Anki2/User 1/collection.anki2'

However, when I edit anything in Anki and try to pull, this happens:

Updating 5c09826..d86e55e
Fast-forward
 .gitmodules             |   3 ---
 .ki/hashes              |   3 +--
 Thai                    |   1 -
 _media/_mp3_E131073.mp3 | Bin 0 -> 12438 bytes
 ...
  378 files changed, 1 insertion(+), 6 deletions(-)
 delete mode 100644 .gitmodules
 delete mode 160000 Thai
 create mode 100644 _media/_mp3_E131073.mp3
 ...

From /tmp/tmpbcy5wh05/ki-local-d299c43c6d3e87ea387dc28c57d5dd5e/
 * branch            main       -> FETCH_HEAD
 * [new branch]      main       -> anki/main
warning: unable to rmdir 'Thai': Directory not empty

As far as I understand, Ki tries to remove the submodule and revert me to the previous repo state.

I'm obviously doing something the wrong way, but I didn't find a guide to convert decks like that. What am I missing here?

P.S: I really appreciate this piece of software, wanted to implement something similar before. Kudos!

langfield commented 1 year ago

Hey! Happy to help. You seem quite technically competent, so I suggest looking at the submodule.py script and seeing if you can figure it out from looking at the help text. If not, I'd be happy to walk you through the process step-by-step. It's high time I wrote a guide for this anyway. There may be an explanation somewhere in the issues as well, I'll try to dig it up if it exists.


From: Nikolay Amiantov @.> Sent: Thursday, April 27, 2023 2:33:53 PM To: langfield/ki @.> Cc: Subscribed @.***> Subject: [langfield/ki] Converting a deck to a submodule (Issue #128)

Hi again! I want to convert my (currently single) deck to a submodule which I can share. Currently I follow these general steps: ki clone; cd collection/my-deck; mv .. /_media .; cp .. /models. json; Edit the models. json file in the deck to only

Hi again!

I want to convert my (currently single) deck to a submodule which I can share. Currently I follow these general steps:

  1. ki clone;
  2. cd collection/my-deck; mv ../_media .; cp ../models.json;
  3. Edit the models.json file in the deck to only include necessary note types;
  4. git init; git commit; git push;
  5. cd ..; rm -rf my-deck; git submodule add @.***:my-deck;
  6. ki push.

So far so good, I get this output:

Pushing to '/home/abbradar/.local/share/Anki2/User 1/collection.anki2'

Note change types

ADD 0 DELETE 0 MODIFY 0 RENAME 0 TYPE CHANGE 0

Overwrote '/home/abbradar/.local/share/Anki2/User 1/collection.anki2'

However, when I edit anything in Anki and try to pull, this happens:

Updating 5c09826..d86e55e Fast-forward .gitmodules | 3 --- .ki/hashes | 3 +-- Thai | 1 - _media/_mp3_E131073.mp3 | Bin 0 -> 12438 bytes ... 378 files changed, 1 insertion(+), 6 deletions(-) delete mode 100644 .gitmodules delete mode 160000 Thai create mode 100644 _media/_mp3_E131073.mp3 ...

From /tmp/tmpbcy5wh05/ki-local-d299c43c6d3e87ea387dc28c57d5dd5e/

As far as I understand, Ki tries to remove the submodule and revert me to the previous repo state.

I'm obviously doing something the wrong way, but I didn't find a guide to convert decks like that. What am I missing here?

P.S: I really appreciate this piece of software, wanted to implement something similar before. Kudos!

— Reply to this email directly, view it on GitHubhttps://urldefense.com/v3/__https://github.com/langfield/ki/issues/128__;!!KGKeukY!zXcWrz9yg1VRbD6b5Lj7Lrm0nz8rJJF7sg-9yq3ktBRSwQ-Q08skK654qvBqWue0p68wL0ZJEz4OU1rCM4m5kG1oel-JHw$, or unsubscribehttps://urldefense.com/v3/__https://github.com/notifications/unsubscribe-auth/AISQNI4B3ZG3DJDWP34355TXDK3ZDANCNFSM6AAAAAAXOHTUKM__;!!KGKeukY!zXcWrz9yg1VRbD6b5Lj7Lrm0nz8rJJF7sg-9yq3ktBRSwQ-Q08skK654qvBqWue0p68wL0ZJEz4OU1rCM4m5kG1RZiImkA$. You are receiving this because you are subscribed to this thread.Message ID: @.***>

langfield commented 1 year ago

I'm super glad you find it useful! You're the only user I've had who knows Haskell, so let me know if you'd be interested in contributing.


From: Nikolay Amiantov @.> Sent: Thursday, April 27, 2023 2:33:53 PM To: langfield/ki @.> Cc: Subscribed @.***> Subject: [langfield/ki] Converting a deck to a submodule (Issue #128)

Hi again! I want to convert my (currently single) deck to a submodule which I can share. Currently I follow these general steps: ki clone; cd collection/my-deck; mv .. /_media .; cp .. /models. json; Edit the models. json file in the deck to only

Hi again!

I want to convert my (currently single) deck to a submodule which I can share. Currently I follow these general steps:

  1. ki clone;
  2. cd collection/my-deck; mv ../_media .; cp ../models.json;
  3. Edit the models.json file in the deck to only include necessary note types;
  4. git init; git commit; git push;
  5. cd ..; rm -rf my-deck; git submodule add @.***:my-deck;
  6. ki push.

So far so good, I get this output:

Pushing to '/home/abbradar/.local/share/Anki2/User 1/collection.anki2'

Note change types

ADD 0 DELETE 0 MODIFY 0 RENAME 0 TYPE CHANGE 0

Overwrote '/home/abbradar/.local/share/Anki2/User 1/collection.anki2'

However, when I edit anything in Anki and try to pull, this happens:

Updating 5c09826..d86e55e Fast-forward .gitmodules | 3 --- .ki/hashes | 3 +-- Thai | 1 - _media/_mp3_E131073.mp3 | Bin 0 -> 12438 bytes ... 378 files changed, 1 insertion(+), 6 deletions(-) delete mode 100644 .gitmodules delete mode 160000 Thai create mode 100644 _media/_mp3_E131073.mp3 ...

From /tmp/tmpbcy5wh05/ki-local-d299c43c6d3e87ea387dc28c57d5dd5e/

As far as I understand, Ki tries to remove the submodule and revert me to the previous repo state.

I'm obviously doing something the wrong way, but I didn't find a guide to convert decks like that. What am I missing here?

P.S: I really appreciate this piece of software, wanted to implement something similar before. Kudos!

— Reply to this email directly, view it on GitHubhttps://urldefense.com/v3/__https://github.com/langfield/ki/issues/128__;!!KGKeukY!zXcWrz9yg1VRbD6b5Lj7Lrm0nz8rJJF7sg-9yq3ktBRSwQ-Q08skK654qvBqWue0p68wL0ZJEz4OU1rCM4m5kG1oel-JHw$, or unsubscribehttps://urldefense.com/v3/__https://github.com/notifications/unsubscribe-auth/AISQNI4B3ZG3DJDWP34355TXDK3ZDANCNFSM6AAAAAAXOHTUKM__;!!KGKeukY!zXcWrz9yg1VRbD6b5Lj7Lrm0nz8rJJF7sg-9yq3ktBRSwQ-Q08skK654qvBqWue0p68wL0ZJEz4OU1rCM4m5kG1RZiImkA$. You are receiving this because you are subscribed to this thread.Message ID: @.***>

langfield commented 1 year ago

So here's how you'd do this.

  1. Clone this repository, put submodule.py somewhere you can find it.

  2. Create a new Github repository for your submodule, perhaps git@github.com:abbradar/mydeck.git.

  3. Let's assume your cloned collection is in your home directory, and that's where you put submodule.py as well. Then run python3 submodule.py --kirepo collection/ --deck my-deck --remote git@github.com:abbradar/mydeck.git.

That should do it.

langfield commented 1 year ago

This is what the output will look like:

(anki) mal@computer:~$ python3 ki/submodule.py --kirepo collection/ --deck Decks/Programming --remote git@github.com:langfield/programmingcards.git
Operating on '/tmp/tmp5szfsbk_/submodule-850d87b1af8898e69fa05ab5b88ffa4d109e45ca'
Parsed 1 commits
New history written in 0.03 seconds; now repacking/cleaning...
Repacking your repo and cleaning out old unneeded objects
HEAD is now at 591fa33 Initial commit
Enumerating objects: 37, done.
Counting objects: 100% (37/37), done.
Delta compression using up to 8 threads
Compressing objects: 100% (37/37), done.
Writing objects: 100% (37/37), done.
Total 37 (delta 8), reused 0 (delta 0), pack-reused 0
Completely finished after 0.13 seconds.
branch 'main' set up to track 'origin/main'.
Committed submodule at rev '351b1ea58e25ef24921ce4ed7146895b0a150fca'
abbradar commented 1 year ago

Wow, I didn't really expect you to answer that thoroughly -- thank you! re contribution: I would love to, and one in the industry needs to write in Haskell periodically to stay sane, but I'm overloaded now \~_\~ Maybe in several months I have more free time to dedicate to projects!

Before I continue, disclosure: I added Poetry files to my fork and managed to build Ki with almost identical dependencies, but for these three: anki, click and colorama. This is due to NixOS shenanigans with packages including pre-built binaries, so it's much easier for me to use versions from NixOS. I don't think this could have affected the story, but you may beg to differ.

My steps are now like that:

  1. ki clone;
  2. submodule.py;
  3. git checkout main in the submodule (seems a bug when the default subrepo branch was "master", should be easy to fix in the script);
  4. git commit in the collection repo;
  5. Change something in the shared deck in Anki;
  6. ki pull

I see this:

Cloning into '/tmp/tmp6pkqgkw5/ki-remote/ee6ed75a62e1640fe0f368234ee59a9a'...
Notes: 100%|███████████████████████████████| 484/484 [00:00<00:00, 16023.33it/s]
Media: 100%|████████████████████████████████| 484/484 [00:00<00:00, 8765.37it/s]
Fields: 100%|██████████████████████████████| 484/484 [00:00<00:00, 11659.41it/s]
Decks: 100%|██████████████████████████████████████| 1/1 [00:00<00:00, 13.62it/s]

From /tmp/tmpnrilinre/ki-local-ee6ed75a62e1640fe0f368234ee59a9a/
 * branch            main       -> FETCH_HEAD
 * [new branch]      main       -> anki/main
error: The following untracked working tree files would be overwritten by merge:
    Thai/baang-kráng.md
Please move or remove them before you merge.
Aborting
Merge with strategy ort failed.

(that's the note that I changed in Anki)

Any ideas? Can you reproduce this? I'll try it with a new collection later.

SimonSelg commented 1 year ago

Hi everyone, I'm having the same/similar problems when I tried out this tool a few days ago. Very happy to hear that @langfield is active here and willing to help, I greatly appreciate that. I'll re-run things on my side and provide the error output.

I'm not good with Haskel, but I can debug and write python, so I'm positive I can contribute here.

SimonSelg commented 1 year ago

I've ran everything as mentioned here:

So here's how you'd do this.

  1. Clone this repository, put submodule.py somewhere you can find it.
  2. Create a new Github repository for your submodule, perhaps git@github.com:abbradar/mydeck.git.
  3. Let's assume your cloned collection is in your home directory, and that's where you put submodule.py as well. Then run python3 submodule.py --kirepo collection/ --deck my-deck --remote git@github.com:abbradar/mydeck.git.

That should do it.

I've created a desk called programming, ran the submodule script on it. Worked fine, after I added at least one file to the _media folder (otherwise the git rm call inside the submodule.py script does not remove all folders, and the script subsequently fails). The output matches the expected output provided above.

However, when I then make a change in anki to the desk (just add a card), save it, and try to ki pull, I get the following error

$ ki pull
Cloning into '/tmp/tmpv3suffag/ki-remote/98fb65a54d46b984210ab59a4ca3c1f7'...
Notes: 100%|█████████████████████████████████| 26/26 [00:00<00:00, 13333.16it/s]
Media: 100%|██████████████████████████████████| 26/26 [00:00<00:00, 8435.98it/s]
Fields: 100%|████████████████████████████████| 26/26 [00:00<00:00, 20649.86it/s]
Decks: 100%|████████████████████████████████████| 6/6 [00:00<00:00, 1416.68it/s]
CONFLICT (file/directory): directory in the way of programming from HEAD; moving it to programming~HEAD instead.
Automatic merge failed; fix conflicts and then commit the result.

From /tmp/tmpuejfet3z/ki-local-98fb65a54d46b984210ab59a4ca3c1f7/
 * branch            main       -> FETCH_HEAD
 * [new branch]      main       -> anki/main
warning: unable to rmdir 'programming': Directory not empty

@langfield, any suggestions how to proceed here? I'll try to read through the source in the meantime.

langfield commented 1 year ago

@abbradar I will try and reproduce with my own data, but if you have a dummy collection you can share for which you're sure the error pops up, that would be helpful!

Also, having the correct anki version may be quite important for reproducibility. I don't believe they've made any breaking changes to the API lately, but I could be wrong.

The ability to push and pull from remotes in submodules is kinda the whole point of the project, so you can rest assured I will get this fixed for you guys.

langfield commented 1 year ago

otherwise the git rm call inside the submodule.py script does not remove all folders, and the script subsequently fails

@SimonSelg Nice catch, that's a bug, certainly!

SimonSelg commented 1 year ago

@langfield do you want me to to file a PR for that?

langfield commented 1 year ago

@SimonSelg Not sure if you're asking me to write the PR, or asking if I'll merge it if you do, but either is good with me!

SimonSelg commented 1 year ago

Cool, then I'll write the PR for that.

I've added a sample collection https://github.com/SimonSelg/anki-test-collection, which you can hopefully use to reproduce the issue.

langfield commented 1 year ago

Using this issue as a record of any exceptions I run into while debugging this.

(anki) mal@computer:~$ python3 ki/submodule.py --kirepo collection/ --deck Programming --remote git@github.com:langfield/submodule-pull-test.git
Operating on '/tmp/tmp3nkapb49/submodule-d63bf40f4beaf8a5275b30197ad20cd45e11eafb'
Parsed 1 commits
New history written in 0.03 seconds; now repacking/cleaning...
Repacking your repo and cleaning out old unneeded objects
HEAD is now at b3075af Initial commit
Enumerating objects: 4, done.
Counting objects: 100% (4/4), done.
Delta compression using up to 8 threads
Compressing objects: 100% (4/4), done.
Writing objects: 100% (4/4), done.
Total 4 (delta 0), reused 0 (delta 0), pack-reused 0
Completely finished after 0.09 seconds.
branch 'main' set up to track 'origin/main'.
Traceback (most recent call last):
  File "/home/mal/ki/submodule.py", line 109, in <module>
    main()
  File "<@beartype(__main__.main) at 0x7f56caebd0d0>", line 10, in main
  File "/home/mal/ki/submodule.py", line 98, in main
    repo.git.submodule("add", args.remote, args.deck)
  File "/home/mal/conda/envs/anki/lib/python3.9/site-packages/git/cmd.py", line 696, in <lambda>
    return lambda *args, **kwargs: self._call_process(name, *args, **kwargs)
  File "/home/mal/conda/envs/anki/lib/python3.9/site-packages/git/cmd.py", line 1270, in _call_process
    return self.execute(call, **exec_kwargs)
  File "/home/mal/conda/envs/anki/lib/python3.9/site-packages/git/cmd.py", line 1064, in execute
    raise GitCommandError(redacted_command, status, stderr_value, stdout_value)
git.exc.GitCommandError: Cmd('git') failed due to: exit code(128)
  cmdline: git submodule add git@github.com:langfield/submodule-pull-test.git Programming
  stderr: 'fatal: 'Programming' already exists and is not a valid git repo'
langfield commented 1 year ago

Okay I believe I've reproduced the error.

(anki) mal@computer:~/collection$ ki pull
Cloning into '/tmp/tmpnusdn81s/ki-remote/2846dee376f7a436e9ec586a9b73918c'...
Notes: 100%|████████████████████████████████████| 3/3 [00:00<00:00, 2821.28it/s]
Media: 100%|████████████████████████████████████| 3/3 [00:00<00:00, 2550.25it/s]
Fields: 100%|███████████████████████████████████| 3/3 [00:00<00:00, 2939.93it/s]
Decks: 100%|█████████████████████████████████████| 2/2 [00:00<00:00, 675.47it/s]
CONFLICT (file/directory): directory in the way of Programming from HEAD; moving it to Programming~HEAD instead.
Automatic merge failed; fix conflicts and then commit the result.

From /tmp/tmp0tcbp53h/ki-local-2846dee376f7a436e9ec586a9b73918c/
 * branch            main       -> FETCH_HEAD
 * [new branch]      main       -> anki/main
warning: unable to rmdir 'Programming': Directory not empty

Traceback (most recent call last):
  File "/home/mal/conda/envs/anki/bin/ki", line 33, in <module>
    sys.exit(load_entry_point('ki', 'console_scripts', 'ki')())
  File "/home/mal/conda/envs/anki/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/home/mal/conda/envs/anki/lib/python3.9/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/home/mal/conda/envs/anki/lib/python3.9/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/mal/conda/envs/anki/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/mal/conda/envs/anki/lib/python3.9/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "<@beartype(ki.pull) at 0x7fcd50119d30>", line 10, in pull
  File "/home/mal/ki/ki/__init__.py", line 1693, in pull
    _pull(kirepo)
  File "<@beartype(ki._pull) at 0x7fcd50119c10>", line 31, in _pull
  File "/home/mal/ki/ki/__init__.py", line 1828, in _pull
    commit_hashes_file(kirepo)
  File "<@beartype(ki.commit_hashes_file) at 0x7fcd50119790>", line 31, in commit_hashes_file
  File "/home/mal/ki/ki/__init__.py", line 1537, in commit_hashes_file
    kirepo.repo.index.commit("Update collection hashes file.")
  File "/home/mal/conda/envs/anki/lib/python3.9/site-packages/git/index/base.py", line 1055, in commit
    tree = self.write_tree()
  File "/home/mal/conda/envs/anki/lib/python3.9/site-packages/git/index/base.py", line 580, in write_tree
    binsha, tree_items = write_tree_from_cache(entries, mdb, slice(0, len(entries)))
  File "/home/mal/conda/envs/anki/lib/python3.9/site-packages/git/index/fun.py", line 302, in write_tree_from_cache
    raise UnmergedEntriesError(entry)
git.exc.UnmergedEntriesError: 160000 bf0c609e6400280bb74d44c0b8f7a882f0e7588e 2 Programming~HEAD
langfield commented 1 year ago

This SO answer looks promising.

For 1-2 days I've been wondering what's the problem. Here is a link with the most approximate (and really awful) answer I've found so far:

Checkout branch with different submodules is not working

And for a quick answer:

Supposing, for example, branch master has submodule A and branch alpha has submodule B.

To properly checkout master you should, essentially
git submodule deinit .
git checkout master
git submodule update --init --recursive

To go back to alpha, again
git submodule deinit .
git checkout alpha
git submodule update --init --recursive
langfield commented 1 year ago

That didn't work, but this one might.

I was able to figure this out. It's not trivial but so far it's working. The key element is that prior to switching from one branch to another I stash the .git/modules folder somewhere based on the name of the current branch. This way when I switch back to that branch, I can restore the stashed modules since that stores the git repository information for all of the active submodules on that branch.

Roughly the process of going from any branch to any other branch is as follows:

export TARGET_BRANCH="my-branch-name"
export CURRENT_BRANCH=$(git rev-parse --abbrev-ref HEAD)
if [ -f ".gitmodules" ]; then
  git submodule deinit --all
  mkdir -p .git/git-tools/modules
  mv .git/modules .git/git-tools/modules/$CURRENT_BRANCH
fi

git checkout $TARGET_BRANCH

if [ -f ".gitmodules" ]; then
  if [ -f ".git/git-tools/modules/$TARGET_BRANCH" ]; then
    git mv .git/git-tools/modules/$TARGET_BRANCH .git/modules
  fi

  git submodule sync && git submodule update --init
fi
langfield commented 1 year ago

If you guys want to play around with it, I suspect the thing to try is to replicate the logic of this bash script around the git_pull() call here:

https://github.com/langfield/ki/blob/68919e08ba2f13110766daef7129cb5aa5c71ed9/ki/__init__.py#L1810-L1816

SimonSelg commented 1 year ago

Thanks for looking into this! I will check out the bash script this evening and report my findings here.

langfield commented 1 year ago

So the first thing to note here is that this is purely a git problem. There is nothing about the specific use case which is relevant, in my opinion. We have a branch without any submodules and we wish to merge it into a branch with a new submodule, and there are changes to files within the directory that was converted into a submodule.

Here is a WIP script to reproduce the state of things before we ki pull.

#!/usr/bin/env bash
mkdir repo
mkdir repo/a/
mkdir repo/b/
echo 'a' > repo/a/a.txt
echo 'b' > repo/b/b.txt
cd repo
git init
git add .
git commit -m "Initial commit"
git checkout -b remote
echo 'c' > b/c.txt
echo 'b' >> b/b.txt
git add .
git commit -m "Changes"
git checkout master
cd ..
git clone repo copy
cd copy
git filter-repo --subdirectory-filter b
cd ..
cd repo
git rm -rf b
git add .
git commit -m "Remove b"
git submodule add ../copy b
git commit -m "Add b as submodule"
langfield commented 1 year ago

So I have an idea but it's a bit gross. This issue only arises when we have created a new submodule in repository and we want to pull changes from Anki. I believe the solution is to detect submodules that exist in the current state of the repository, but do not exist in the (LCA) last common ancestor revision.

For each new submodule, we turn the corresponding directory in the LCA repository into a submodule after the Anki remote has been merged in. Then we merge into the main repo and we should be fine. We'll get conflicts, but this is expected behavior. The user can resolve them from within the submodules.

langfield commented 1 year ago

Okay, well here is an authoritative answer.

If there are changes pertinent to the submodule, I'd like them to become conflicts within the submodule, and not the parent repo.

Considering one branch does not have the submodule, while the other does have a submodule, it is best to follow the error advice and git rm b first in remote, before trying the git merge remote.

Plus, do not forget that branches inside a submodule are distinct from branches inside a parent repository.

Finally:

echo 'd' > b/d.txt
cd b
git add .
git commit -m "Add new file in submodule b"
cd ..
git add .
git commit -m "Add submodule commit"

When you make any change to a submodule, and register the new tree in the parent repository, do not forget to push from the submodule to its own upstream repository. Or your parent repository would reflect a submodule repository main tree which ... nobody else will be able to clone and replacate/checkout.

Basically, we're supposed to manually copy the stuff in b from the remote into the submodule. This is much simpler than the thing I proposed above! 😀

SimonSelg commented 1 year ago

Sounds good, I'm a bit concerned about the git --work-tree=/path/to/outside/save_folder add .: this basically copies over the files from the /path/to/outside/save_folder into the repo. That means no merge conflict handling, just overwriting.

I know that the LCA concept usually solves this problem (copy remote into LCA, then merge LCA into repo to handle merge conflicts) . But in our scenario, the LCA does not have a submodule, but the repo has one - so copy needs to happen from LCA to repo, hence no merge conflict handling there). Am I missing something here?

On the other hand, one could just set the last-successful-ki-push tag after converting a desk to a submodules . Then the LCA would already have submodules and the scenario described here could be avoided.


If I understand things right, one could still use the strategy described in the so post to improve ki: use it to to handle the "anki remote has no submodule, but the LCA has one" case - in that case overwriting does not cause merge conflicts, since it's the LCA.

Then, to merge the submodule from the LCA back to the repo, one could add the submodule from the LCA as a remote to the submodule from the repo, and perform a regular merge there.


It's a bit late here and I'm tired, so excuse me if I am understanding something wrong here completely. In that case, ignore my comment.

SimonSelg commented 1 year ago

So, that'd result in the following strategy (probably missing some thing here, hope it still helps):

Precondition: If we have submodules, the LCA already contains them (so that needs to be ensured when converting to a submodule and when adding an existing one).

I'm assuming the lca repo has already been created and resetted properly (including submodules).

Step 1: add changes from anki to LCA repo

Step 2: merge LCA into repo

That'd way, one could fully rely on git without copying over patches between repos.

langfield commented 1 year ago

Sounds good, I'm a bit concerned about the git --work-tree=/path/to/outside/save_folder add .: this basically copies over the files from the /path/to/outside/save_folder into the repo. That means no merge conflict handling, just overwriting.

I know that the LCA concept usually solves this problem (copy remote into LCA, then merge LCA into repo to handle merge conflicts) . But in our scenario, the LCA does not have a submodule, but the repo has one - so copy needs to happen from LCA to repo, hence no merge conflict handling there). Am I missing something here?

Your concern is warranted, I think. To me, it is slightly unclear what VonC is suggesting. Is he suggesting we dirty copy the files from the submodule-less copy of b/ to the submodule b in the branch we're merging into?

I'm actually not sure how --work-tree works. I suppose it might be "safer" than I'm imagining. I think the best thing to do is to just try it in a bash script and see what happens!

On the other hand, one could just set the last-successful-ki-push tag after converting a desk to a submodules . Then the LCA would already have submodules and the scenario described here could be avoided.

This will cause problems, I think. Messing with the LCA tag (and in particular, moving it forward) will mean the merge of the Anki remote into the LCA repo will no longer be a fast-forward, which is very bad.

langfield commented 1 year ago

If I understand things right, one could still use the strategy described in the so post to improve ki: use it to to handle the "anki remote has no submodule, but the LCA has one" case - in that case overwriting does not cause merge conflicts, since it's the LCA.

Then, to merge the submodule from the LCA back to the repo, one could add the submodule from the LCA as a remote to the submodule from the repo, and perform a regular merge there.

I have read over your proposed procedure and it looks promising! The current implementation is very hacky and hard to understand, as you might have noticed. In particular, it dirty-copies the .git/ folder in some places, which is a sketchy thing to be doing.

However, let's fix one thing at a time. Here are the current obstacles in the way of getting submodule remotes working properly:

langfield commented 1 year ago

@SimonSelg Also, I'd just like to say I'm extremely impressed with how fast you've managed to get a maintainer's understanding of this system! 🤯 I really hope you stick around if you have the free time to continue contributing. With two people I think we'd be able to get this into a useful, stable state quite quickly.

SimonSelg commented 1 year ago

This will cause problems, I think. Messing with the LCA tag (and in particular, moving it forward) will mean the merge of the Anki remote into the LCA repo will no longer be a fast-forward, which is very bad.

Why would the merge not be a fast-forwards in that case? The files are still the same - submodule or not. Probably, I've missed something.


Another way to think about it: do a "ki push" after converting a desk to a submodule (without the bailout due to identical hash). Then, the LCA tag would be on the HEAD of the repo, and the LCA already has a submodule. Edge case avoided.

Same when adding a new desk as submodule - one does a ki push afterwards to get the content into anki. Aftwarwards, the LCA already contains a submodule.

SimonSelg commented 1 year ago

Your concern is warranted, I think. To me, it is slightly unclear what VonC is suggesting. Is he suggesting we dirty copy the files from the submodule-less copy of b/ to the submodule b in the branch we're merging into?

Yes, I think that is what he is suggesting. To git, a submodule is a different repo (with the submodule's HEAD commit id attached as reference to the main repo) - so there is no way to resolve merge conflicts between a submodule and a subfolder. One can only copy it over.

I'm actually not sure how --work-tree works. I suppose it might be "safer" than I'm imagining. I think the best thing to do is to just try it in a bash script and see what happens!

It's basically tells git to perform the "diff" on which the staged changes are based not between the repos HEAD and the current filesystem, but between the repos HEAD and the folder supplied in the argument. So it allows adding changes, without copying over the files.

SimonSelg commented 1 year ago

@SimonSelg Also, I'd just like to say I'm extremely impressed with how fast you've managed to get a maintainer's understanding of this system! :exploding_head: I really hope you stick around if you have the free time to continue contributing. With two people I think we'd be able to get this into a useful, stable state quite quickly.

Thanks for the kudos :) Yup, now that I have a rough understanding of how the project works (after reading through the code for a couple of hours it now makes sense), I'd love to stick around for a bit. I like the use-case of the project and the idea behind it (the LCA concept is nice!) and yeah, I'm motivated to help get this working for me and for other people.

Are there any concrete tasks where I could help?

SimonSelg commented 1 year ago

I've updated my proposed algorithm above with my newly revised understanding of git --work-tree. This is nice, it saves quite some work if I understand it correctly. Looking forward to your feedback @langfield

langfield commented 1 year ago

Why would the merge not be a fast-forwards in that case? The files are still the same - submodule or not. Probably, I've missed something.

Suppose you ki pull, then you make many commits in your repository, and then you git submodule add something. If we shift the LCA tag forwards to the point at which the submodule was created, it will wreak havoc, I believe.

Another way to think about it: do a "ki push" after converting a desk to a submodule (without the bailout due to identical hash). Then, the LCA tag would be on the HEAD of the repo, and the LCA already has a submodule. Edge case avoided.

But just as with ordinary git pushes and pulls, it doesn't make sense to push until you've merged in the changes from the branch you're pushing to. In the case of ki, this means running ki pull.

langfield commented 1 year ago

It's basically tells git to perform the "diff" on which the staged changes are based not between the repos HEAD and the current filesystem, but between the repos HEAD and the folder supplied in the argument. So it allows adding changes, without copying over the files.

That is extremely interesting 🤔

langfield commented 1 year ago

Are there any concrete tasks where I could help?

Absolutely. I will attempt to do things in the order of the checklist above.

~Over the next few days, I will write up a detailed explanation of the issue I'm having with the grammar. This is very easy-to-understand stuff in comparison to the submodule subtleties we've been discussing, so I'm sure you'll have no problem with it! Basically I just need a second opinion on how best to parse this Markdown format.~

~It needs a very small tweak to allow leading newlines after note field headers, and then I'll need to write some better error messages, now that I've figured out it's possible with Lark.~

The parser stuff has been finished.

~In parallel, I'll rebase/merge your PR!~ Done!

langfield commented 1 year ago

I've updated my proposed algorithm above with my newly revised understanding of git --work-tree. This is nice, it saves quite some work if I understand it correctly. Looking forward to your feedback @langfield

It looks excellent! I cannot say more without actually writing an implementation and seeing if it works, haha.

langfield commented 1 year ago

Alternatively, if you are less interested in the grammar stuff, you could jump right into implementing this:

Handle the case where we are merging the LCA repo into the root, and we've added a new submodule in the root

You'll need to add logic here:

https://github.com/langfield/ki/blob/fed9977fac349eedc0268427d0be7b979fd49651/ki/__init__.py#L1810-L1816

that implements the procedure suggested by VonC, using git --work-tree add. If I understand correctly, this is distinct from the algorithm you wrote, which treats the case where the LCA already has submodules. Here, we are treating the case where the LCA is missing some submodule that has been added to the main repository sometime after the LCA revision.

I suggest prototyping everything on dummy git repos in bash scripts before writing any GitPython.

abbradar commented 1 year ago

Just wanted to update that for now I can't spare much time researching that issue after all, so for now I work around it by avoiding the submodules. Thank you so much for researching that!

langfield commented 1 year ago

Progress has been made!

https://github.com/langfield/ki/blob/39bbd355bd91a5ac636fdeb7fc55174f85dcf8cd/repro-submodule-conflict.sh#L45-L61

The only issue is that we're not doing an actual merge in the submodule yet, which is a problem.

langfield commented 1 year ago

A working version of the merge procedure is available here.

Now what remains to be done is to translate it into python, and work out any bugs that might pop up. 😁

langfield commented 1 year ago

@SimonSelg Hey, still interested in this?

SimonSelg commented 1 year ago

Hi @langfield, yes I am. I got too busy with my job and my studies to reply here, apologies. I will check out the new procedure this evening.

langfield commented 1 year ago

@SimonSelg Wonderful! This is the relevant branch. I've written a test for the issue originally described above, and it fails as expected (CI passes because it's an xfail).

Next step is just to implement it, as mentioned above. It should really be quite mechanical.

langfield commented 1 year ago

@SimonSelg @abbradar I've written a working implementation of this which passes the test! What remains to be done is to make sure the PR passes CI, and then it will be merged.

langfield commented 1 year ago

So I was going to use your procedure to handle the non-new submodule case, but it seems you've assumed that there are submodules in the remote, which is never the case because it is a fresh clone from the SQLite3 database. If you revise your procedure and open a new issue, I'm happy to try implementing it, if it will be shorter!