remiberthoz / anki-periodic-table-memory-pegs

Periodic Table flashcard deck for Anki
https://ankiweb.net/shared/info/490209917
MIT License
51 stars 5 forks

Consider using `markdown` for note source files to achieve smaller diffs? #138

Open langfield opened 2 years ago

langfield commented 2 years ago

Disclaimer. This is a shameless plug for my own tool.

Hi! This is an extremely neat project! I love that you guys have been using git to handle contributions for so long! I've been building a tool to make version control of Anki decks easier, and I was wondering if you folks would be interested in trying it out.

Here is what the markdown note format looks like:

## Note
nid: 47
model: PeriodicTable-d75c0
tags: group:11, period:5
markdown: false

### Picture
<IMG SRC="47silver.gif">

### Name
Silver

### Number
47

### Symbol
Ag

### Memory sentence
<p><strong>Ag</strong>nes's Silver Dime. '47 was <strong>A
G</strong>ood year after WWII. (Silver is "argentum" in Latin).

### SpecialLocation

(Media is supported of course, and tracked by git as well). Let me know if you're interested! I desperately need users to figure out what can be improved, and maintainers of mature decks are exactly the folks I have in mind!
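
For readers who want to experiment with the format above, here is a minimal sketch of how such a note file might be parsed. It is not ki's actual parser: the `Note` class and `parse_note` function are hypothetical names, and the sketch assumes only what the example shows (a `## Note` header, `key: value` metadata lines, then `### FieldName` sections).

```python
from dataclasses import dataclass, field


@dataclass
class Note:
    """One note parsed from the markdown format shown above (hypothetical type)."""
    meta: dict = field(default_factory=dict)    # nid, model, tags, markdown
    fields: dict = field(default_factory=dict)  # field name -> raw field text


def parse_note(text: str) -> Note:
    """Parse a single '## Note' block into metadata and named fields."""
    note = Note()
    current = None  # name of the field being read; None while in the header
    for line in text.splitlines():
        if line.startswith("### "):
            # A new field section starts; it may stay empty (e.g. SpecialLocation).
            current = line[4:].strip()
            note.fields[current] = ""
        elif line.startswith("## "):
            continue  # the '## Note' header itself carries no data
        elif current is None:
            # Header metadata: split on the first colon only, so values such as
            # 'group:11, period:5' keep their own colons.
            key, sep, value = line.partition(":")
            if sep:
                note.meta[key.strip()] = value.strip()
        else:
            note.fields[current] += line + "\n"
    # Drop the blank lines that separate sections.
    note.fields = {name: body.strip() for name, body in note.fields.items()}
    return note
```

Because field bodies are kept as raw text, a field wrapped across several source lines (like the memory sentence above) is preserved line by line.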

remiberthoz commented 2 years ago

Hi! Lovely idea! And thank you for reaching out. I see the value in having a standard way to share collaborative decks.

A question that comes to my mind is: how do you take care of note guids? I'm asking because I struggled with this in the past (see this issue: #1), and glancing at your docs I see no way of managing them. Taking the Silver example you proposed above, I need to ensure that the guid remains yW:,sx48qM (from this line of code).

In the case of this deck, I am not sure ki would reduce the size of diffs, since cards are already generated programmatically from a json file. That being said, it would standardize the collaboration technique and make the file format more accessible to end-users wishing to contribute: this is super attractive.

langfield commented 2 years ago

> Hi! Lovely idea! And thank you for reaching out. I see the value in having a standard way to share collaborative decks.
>
> A question that comes to my mind is: how do you take care of note guids? I'm asking because I struggled with this in the past (see this issue: #1), and glancing at your docs I see no way of managing them. Taking the Silver example you proposed above, I need to ensure that the guid remains yW:,sx48qM (from this line of code).

Thanks so much! Excellent question about the guids! I've thought about this a lot actually. You can see my own development notes on this problem here. Reproduced below.

Pulling remote changes from GitHub decks while preserving review data
---------------------------------------------------------------------
Consider the following scenario:
- User A clones their collection into a ki repository. All the nids in the note
  files are the nids generated by their Anki installation.
- User A converts their `C* Algebras` deck into a submodule in their
  repository, and pushes this repo to GitHub.
- User B clones their collection, and then clones User A's `C-star-algebras`
  GitHub repo into their ki repository as a submodule. New nids are generated
  for all the notes in this deck, since User B's Anki installation does not
  recognize the nids generated by User A's Anki installation, which are still
  listed in all the note files on GitHub. These new nids are committed to the
  submodule by ki.
- User B studies these notes, accumulating review data.
- User A pushes commits to their `C-star-algebras` GitHub repo, which contain
  important corrections to errata in the deck.
- User B wants these corrections, so they run `git pull` within their submodule
  inside their ki repository. Their branch has diverged from the remote, since
  ki automatically committed the nid reassignments. The merge likely succeeds
  because the line where the nid is defined within each note will not have
  changed within User A's local copy of the repository, since they're still
  using the same nid for that note.
- User B corrects some errata in their local copy of the `C* Algebras` deck
  from within Anki. They want to push their corrections back to User A's GitHub
  repo. They make a pull request with their corrections, but the diff is
  *HUGE*, because the nid for every single note has been changed. This is not
  ideal, because they will have to manually go in and remove the nid changes,
  perhaps with some git wizardry. In particular, they would have to construct a
  branch without the nid reassignment commit.

After thinking about the problem at length, this does not seem to be a
critically high priority issue. Although annoying, it will not cause the
collaboration workflow to be prohibitively cumbersome.

It should still be fixed long-term, however. One possible solution is replacing
the `nid` field within the note grammar with a `uuid` field. There would exist
a `.ki/manifest.yaml` file, which would map uuids to nids. This manifest file
would be unique to each user/Anki installation, whereas the uuid for that
particular note would be something that is unique across all users studying that
note (e.g. if it resided in a collaborative deck hosted on GitHub). During a
clone operation, a uuid would be generated for each note, perhaps seeded with a
hash of the note's content. Then the manifest file would be generated, which,
as we mentioned above, would simply map uuids to nids. The uuids would live in
the note files, and then there would be no need to worry about different
people's nids when merging pull requests on GitHub. Everyone would have the
same uuid for a given note. We now have a very simple condition for when an nid
must be regenerated: this must be done whenever we parse a uuid from a note
that does not exist in the manifest yet.
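
The manifest idea above can be sketched as follows. This is only an illustration under stated assumptions, not code from ki: `note_uuid` and `resolve_nid` are hypothetical names, the SHA-256 seeding is one possible choice of content hash, and the manifest is modelled as an in-memory dict standing in for `.ki/manifest.yaml`.

```python
import hashlib
import uuid


def note_uuid(model: str, fields: dict) -> str:
    """Derive a deterministic uuid from a note's content (hypothetical scheme).

    Meant to run once, at clone time; afterwards the uuid lives in the note
    file and is not recomputed when the content is edited.
    """
    parts = [model] + [f"{name}={value}" for name, value in sorted(fields.items())]
    digest = hashlib.sha256("\x1f".join(parts).encode("utf-8")).digest()
    return str(uuid.UUID(bytes=digest[:16]))


def resolve_nid(manifest: dict, note_id: str, new_nid) -> int:
    """Return the local nid for a uuid.

    A fresh nid is generated only when the uuid is not in the manifest yet,
    which is the regeneration condition described above. `new_nid` is any
    zero-argument callable that yields an unused nid for this installation.
    """
    if note_id not in manifest:
        manifest[note_id] = new_nid()
    return manifest[note_id]
```

With this split, the note files shared on GitHub carry only uuids, while each user's manifest stays local, so nid reassignments never show up in a pull request diff.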

In short, git's ability to detect renames (and in particular its low false-positive rate) seemed to be doing a good enough job of preventing this problem in the maintainer -> subscriber direction.

However, I see that the issue raised in #1 could be an important counterexample. I'm not sure, so I'll have to write a test for it. In any case, I've had the fix for this problem mapped out for ages. I actually didn't know about Anki's guids when I wrote the above bug report, so a manifest.yaml file is not even necessary. Simply replacing the nid field with a guid field, or making this an additional field in the note grammar will be sufficient.

If you feel satisfied with the idea of trying this workflow out on a dev branch once the guid fix is merged, I'd love to help you get it set up!

langfield commented 2 years ago

> In the case of this deck, I am not sure ki would reduce the size of diffs, since cards are already generated programmatically from a json file. That being said, it would standardize the collaboration technique and make the file format more accessible to end-users wishing to contribute: this is super attractive.

As for diffs, the usefulness of this would come in two forms: (1) file-level git change types give you more granular information, since each note is its own file, and (2) more granular intra-file diffs if you choose, or need, to break your field sources into multiple lines.

For example, you might want to break the phrase field up into multiple lines in the source, purely for readability (since without <br> tags it will render the same anyway):

https://github.com/remiberthoz/anki-periodic-table-memory-pegs/blob/995cb1ead3b08465f195ddedfbefd6b33d74e006/src/data.json#L323-L330

I freely admit with your use case you may not get a ton of extra utility out of this feature, haha.

remiberthoz commented 2 years ago

> In short, git's ability to detect renames (and in particular its low false-positive rate) seemed to be doing a good enough job of preventing this problem in the maintainer -> subscriber direction.

This sounds reasonable, but I would avoid asking subscribers or contributors to perform any maintenance work. This is super easy work for people who know git, but it will raise issues for less technically skilled Anki users (and those issues will be passed on to deck maintainers, either as requests for assistance or as complaints...).


> However, I see that the issue raised in #1 could be an important counterexample. I'm not sure, so I'll have to write a test for it.

If I remember correctly, I made a mistake in #1, so I'm not sure it will happen often. I am, however, concerned about what will happen when I migrate to ki and all GUIDs are modified. Unless...

> Simply replacing the nid field with a guid field, or making this an additional field in the note grammar will be sufficient.

Yes! genanki (which I currently use) generates a Note ID from the guid and timestamp upon deck generation.


> If you feel satisfied with the idea of trying this workflow out on a dev branch once the guid fix is merged, I'd love to help you get it set up! [...] (1) file-level git change types give you more granular information, since each note is its own file, and (2) more granular intra-file diffs if you choose, or need, to break your field sources into multiple lines.

Yes, I will try this out! And you're right about granularity. Let's keep in touch. I'll star :star: & watch :eyes: your repo to be notified of changes.