krmanik / Anki-Chinese-Vocabulary-Generator

Enter only simplified characters and create word meaning with Traditional, Pinyin, Meaning, Audio and example sentences

Anki Deck generator for multiple languages : exploring avenues #6

Open hugolpz opened 3 years ago

hugolpz commented 3 years ago

Hello @Infinyte7,

Happy to see you used my audio-cmn repository :)

  1. Shtooka Recorder became LinguaLibre in 2017. We are back on track to record hundreds of thousands of new audio files for 100+ languages. See:
  2. The Wikidata and MediaWiki APIs allow fetching definitions in multiple languages. See the screenshot below: the right half is fetched via API calls to the Wiktionaries.

LinguaLibre_SignIt-01
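
To make the API idea concrete, here is a minimal sketch that only builds a request URL for the MediaWiki `action=query` endpoint of a Wiktionary. The function name `build_extract_url` is hypothetical, and I am assuming the `prop=extracts` module is enabled on the target wiki; the response still has to be fetched and parsed separately.

```python
from urllib.parse import urlencode


def build_extract_url(word, lang="en"):
    """Return a MediaWiki API URL asking for the plain-text extract of a page."""
    params = {
        "action": "query",
        "prop": "extracts",   # assumption: the extracts module is available
        "explaintext": 1,     # plain text instead of HTML
        "titles": word,
        "format": "json",
    }
    return f"https://{lang}.wiktionary.org/w/api.php?{urlencode(params)}"


print(build_extract_url("ciao"))
```

Building the URL separately from the network call keeps this step easy to check before running it over thousands of words.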

Also, I am just starting to think about a LinguaLibre Anki Decks Maker repository, which would:

  1. Given one language (ex: ITA Italian)
  2. Given one associated .zip file: Q385-ita-Italian.zip.
  3. Extract the list of audio filenames then the list of words recorded
  4. Fetch definition via mediawiki API
  5. Code an Anki Decks Maker which uses the available words, audio/media, and definition data to create elegant Anki decks.
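
Step 3 above could look roughly like the following sketch. It assumes that each audio file in the archive is named after the word it records (the file stem is the word), which I have not verified against the actual LinguaLibre archives; `words_from_zip` and `AUDIO_EXTENSIONS` are hypothetical names.

```python
import zipfile
from pathlib import PurePosixPath

AUDIO_EXTENSIONS = {".wav", ".mp3", ".ogg"}


def words_from_zip(zip_path):
    """Map each recorded word to the archive path of its audio file."""
    words = {}
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            path = PurePosixPath(name)
            if path.suffix.lower() in AUDIO_EXTENSIONS:
                # Assumption: the file stem is the recorded word.
                # Keep the first recording when a word appears more than once.
                words.setdefault(path.stem, name)
    return words
```

Calling `words_from_zip("Q385-ita-Italian.zip")` would then give the word list for step 4 and the audio paths for step 5 in one pass.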

@LinguaLibre volunteers have the knowledge to build steps 1 to 4. As for step 5, coding an Anki Decks Maker, we have no experience, while I assume you gained relevant knowledge in this field when you created Anki-Chinese-Vocabulary-Generator. So I wonder if you could give us some guidance.

  1. Is there a project or Anki documentation page which would best help LinguaLibre's volunteers code an Anki Decks Maker?
  2. What resources helped you the most in your project, and which of them could we use?
  3. Are there any specific pitfalls we should be aware of? Any tiny comment may help this project a lot.

PS: I am also considering creating an empty GitHub repository to host this conversation so that it does not occupy your repository.

krmanik commented 3 years ago

Anki Decks

Anki decks are files with the .apkg or .colpkg extension. They contain the HTML used for card presentation as well as text, images, and sounds. https://docs.ankiweb.net/#/exporting?id=deck-apkg
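
As far as I know, an .apkg file is an ordinary zip archive holding the collection database plus the media files, so its contents can be inspected with the standard library. This is only a small sketch for looking inside a deck; `list_apkg_members` is a hypothetical helper name.

```python
import zipfile


def list_apkg_members(apkg_path):
    """Return the names of the files stored inside an .apkg archive."""
    with zipfile.ZipFile(apkg_path) as archive:
        return archive.namelist()
```

Looking at the member list of a deck exported from Anki is a quick way to compare it with a deck produced by your own generator.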

Modules required for making Anki Deck Maker

1. Using the JavaScript Node.js module https://www.npmjs.com/package/anki-apkg-export. I tried running it inside the browser, but it didn't work because it requires SQL operations to generate decks. I also tried https://wzrd.in/ to make it run inside the browser, but that didn't work either.

2. Using Python

2.1 For desktop https://github.com/kerrickstaley/genanki

2.2 For web apps: I have built all the python-non-any-wheel files required by the genanki Python module so that it can run inside the browser. https://github.com/infinyte7/Anki-Export-Deck-tkinter/tree/master/docs

Live at https://infinyte7.github.io/Anki-Export-Deck-tkinter/

I have used only the genanki Python module, in both web and desktop apps, so I will explain for Python only. With genanki, word lists stored in csv or tsv files can be turned into Anki decks. It also lets you supply custom CSS and HTML for styling the cards.

I will take example of Chinese language.

1. Let's say we have Chinese word list data.

| Simplified | Traditional | Pinyin | Audio | Definitions |

Example word data for generating decks. It is better to save this as csv or tsv files, because generating decks from csv and tsv is easier.

# key = Simplified; value = [Traditional, Pinyin, Audio, Definitions]
word_data = {
    '比如': ['比如', 'bǐ rú', 'cmn-audio-比如.mp3', 'to take for example; for example'],
    '中文': ['中文', 'zhōng wén', 'cmn-audio-中文.mp3', 'Chinese language']
}
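
If the word list is kept in a tsv file as suggested, it can be loaded with Python's csv module before handing the rows to genanki. The sketch below assumes a header row with exactly the five column names from the table above; `load_rows` and `FIELD_NAMES` are hypothetical names.

```python
import csv

FIELD_NAMES = ["Simplified", "Traditional", "Pinyin", "Audio", "Definitions"]


def load_rows(tsv_path):
    """Read a tab-separated file with a header row into a list of field lists."""
    with open(tsv_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        # Reorder every row to match the Anki field order.
        return [[row[name] for name in FIELD_NAMES] for row in reader]
```

Reading by column name rather than by position means the file's column order does not have to match the Anki field order.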

2. The Anki deck should therefore also have five fields. We are creating Anki decks from scratch using genanki.

Anki has three sections in deck templates:
Front Side -- generally for showing questions (HTML)
Back Side -- generally for showing answers (HTML)
Card CSS -- Anki deck style (CSS)

So, let's say we want to show the pinyin and audio as the question; then I will add the following to the front side. Any fields can be added.

front = `<div>{{Pinyin}}</div>
<div>{{Audio}}</div>`

And all other fields go on the back side as the answer.

back = `<div>{{Simplified}}</div>
<div>{{Traditional}}</div>
<div>{{Pinyin}}</div>
<div>{{Audio}}</div>
<div>{{Definitions}}</div>`

We can also style our cards. The following is the default, but it can be modified as required. The .card class is used in Anki to control how the card looks.

css = `.card {
    font-family: arial;
    font-size: 20px;
    text-align: center;
    color: black;
    background-color: white;}`

3. Full code. A simple script to generate decks. I tried my best to simplify the code.

import random
import genanki

from glob import glob
from os.path import join

anki_deck_title = "Anki Deck"

anki_model_name = "Sample Model"

model_id = random.randrange(1 << 30, 1 << 31)

out_file = "output.apkg"

front_html = """<div>{{Pinyin}}</div>
<div>{{Audio}}</div>"""

back_html = """<div>{{Simplified}}</div>
<div>{{Traditional}}</div>
<div>{{Pinyin}}</div>
<div>{{Audio}}</div>
<div>{{Definitions}}</div>
"""

card_css = """.card {
    font-family: arial;
    font-size: 20px;
    text-align: center;
    color: black;
    background-color: white;
    }
    """

my_model = genanki.Model(
  model_id,
  anki_model_name,
  fields=[
    {'name': 'Simplified'},
    {'name': 'Traditional'},
    {'name': 'Pinyin'},
    {'name': 'Audio'},
    {'name': 'Definitions'}
  ],
  templates=[
    {
      'name': 'Card 1',
      'qfmt': front_html,
      'afmt': back_html,
    },
  ],
  css=card_css,
  )

# Each value is [Traditional, Pinyin, Audio file name, Definitions],
# in the same order as the model fields that follow 'Simplified'.
word_data = {
    '比如': ['比如', 'bǐ rú', 'cmn-比如.mp3', 'to take for example; for example; for instance; such as to suppose; supposing; if'],
    '中文': ['中文', 'Zhōng wén', 'cmn-中文.mp3', 'Chinese language'],
}

my_notes = []
for word, (traditional, pinyin, audio_file, definitions) in word_data.items():
    # Anki plays audio that is referenced as [sound:file-name.mp3]
    flds = [word, traditional, pinyin, "[sound:" + audio_file + "]", definitions]

    anki_note = genanki.Note(
        model=my_model,
        fields=flds,
        )

    my_notes.append(anki_note)

anki_deck = genanki.Deck(model_id, anki_deck_title)
anki_package = genanki.Package(anki_deck)

# add media
files = []
for ext in ('*.mp3', '*.ogg', '*.wav'):
    files.extend(glob(join("audio/", ext)))

anki_package.media_files = files

for note in my_notes:
    anki_deck.add_note(note)

anki_package.write_to_file(out_file)
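
One thing to watch: the script above picks model_id at random on every run. The genanki README recommends generating the ID once and hard-coding it, so that Anki can recognise the same model across re-imports. An alternative that is my own suggestion, not genanki's method, is to derive a repeatable ID from a name; `stable_id` below is a hypothetical helper.

```python
import hashlib


def stable_id(name):
    """Derive a repeatable integer in [1 << 30, 1 << 31) from a name."""
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    # The first 4 bytes of the hash, folded into the same range as
    # random.randrange(1 << 30, 1 << 31).
    return (1 << 30) + int.from_bytes(digest[:4], "big") % (1 << 30)


model_id = stable_id("Sample Model")
deck_id = stable_id("Anki Deck")
```

The same name always gives the same ID, so regenerating a deck for one language keeps its identity while different languages get different IDs.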

Note 1: To add audio in Anki we use

[sound:audio-file.mp3]
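
A pitfall here: as far as I know from the genanki README, the field must reference only the file name, not the path, even though media_files takes full paths. A tiny helper (hypothetical name `sound_tag`) keeps this consistent:

```python
import os


def sound_tag(audio_path):
    """Build an Anki sound reference from a file path, keeping only the file name."""
    return "[sound:" + os.path.basename(audio_path) + "]"


print(sound_tag("audio/cmn-中文.mp3"))
```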

Note 2: To add any files, we can pass their locations to anki_package.media_files. I have passed all the audio files from the audio directory.

# add media
files = []
for ext in ('*.mp3', '*.ogg', '*.wav'):
    files.extend(glob(join("audio/", ext)))

anki_package.media_files = files
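
Before writing the package it can help to check that every [sound:...] reference in the notes has a matching file in the media list, since a missing file only shows up later as a silent card. This is a sketch with hypothetical names (`missing_media`, `SOUND_REF`); it assumes references use the plain file name.

```python
import os
import re

SOUND_REF = re.compile(r"\[sound:([^\]]+)\]")


def missing_media(field_values, media_paths):
    """Return referenced sound file names that are absent from media_paths."""
    available = {os.path.basename(p) for p in media_paths}
    referenced = set()
    for value in field_values:
        referenced.update(SOUND_REF.findall(value))
    return sorted(referenced - available)
```

For example, the field values of all notes can be flattened into one list and passed together with `files`; an empty result means every reference is covered.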

Demo

demo1

More Examples

I have used the above in web apps that generate Anki decks, using Pyodide to run Python in the browser (serverless implementation).
https://github.com/infinyte7/Anki-Export-Deck-tkinter/blob/master/docs/js/index.js
https://github.com/infinyte7/image-occlusion-in-browser/blob/master/docs/v2/js/deck-export.js

More

For desktop, Anki add-ons will be better. https://addon-docs.ankiweb.net/

For Android, the AnkiDroid API is useful for creating a separate Android app that adds data to AnkiDroid. https://github.com/ankidroid/Anki-Android/wiki/AnkiDroid-API

For AnkiMobile, there are URL schemes for sending data to AnkiMobile (no experience). https://docs.ankimobile.net/#/more?id=url-schemes

Anki Forums

https://forums.ankiweb.net/

hugolpz commented 3 years ago

Hi Infinyte, there is a lot to unpack in your post. I think it puts us (LinguaLibre folks) well on track for this project. I will need to come back to it to examine the code line by line. Sharing the [sound:audio-file.mp3] trick is basic yet very helpful :rocket:

Also a secret which may interest you... we have started tackling Cantonese and now have 5000+ audio recordings ; ) I hope to get the whole HSK 2012 done by June 2021. :1st_place_medal: :smiley_cat: