hugolpz commented 3 years ago

Hello @Infinyte7,

Happy to see you used my audio-cmn repository :)

Shtooka Recorder became LinguaLibre in 2017. We are back on track to record hundreds of thousands of new audio files for 100+ languages. See:
- Downloadable datasets: https://lingualibre.org/datasets/ (the list will be updated in March 2020)
- List of languages and their files: https://commons.wikimedia.org/wiki/Category:Lingua_Libre_pronunciation
- Example filename: LL-Q652_(ita)-{username}-{recorded_word}.wav
The Wikidata and MediaWiki APIs allow fetching definitions in multiple languages. In the screenshot below, the right half is fetched via API calls to the Wiktionaries.
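For fetching definitions, here is a minimal sketch using only the Python standard library. It assumes the MediaWiki `action=parse` endpoint with `prop=wikitext` on a Wiktionary edition; the `lang` parameter and the User-Agent string are illustrative. The returned wikitext still has to be parsed, and the layout of definitions differs between Wiktionary editions.

```python
import json
import urllib.parse
import urllib.request


def wiktionary_wikitext_url(word: str, lang: str = "en") -> str:
    """Build a MediaWiki API URL returning the raw wikitext of one Wiktionary page."""
    params = {
        "action": "parse",
        "page": word,
        "prop": "wikitext",
        "format": "json",
        "formatversion": "2",  # version 2 returns the wikitext as a plain string
    }
    return f"https://{lang}.wiktionary.org/w/api.php?" + urllib.parse.urlencode(params)


def fetch_wikitext(word: str, lang: str = "en") -> str:
    """Download the page wikitext (needs network access)."""
    request = urllib.request.Request(
        wiktionary_wikitext_url(word, lang),
        # Illustrative User-Agent; Wikimedia asks API clients to identify themselves
        headers={"User-Agent": "LinguaLibre-Anki-sketch/0.1"},
    )
    with urllib.request.urlopen(request) as response:
        data = json.load(response)
    return data["parse"]["wikitext"]
```

For example, `fetch_wikitext("ciao", "it")` would ask the Italian Wiktionary for its "ciao" page.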

[Screenshot: LinguaLibre_SignIt-01]

Also, I am just starting to think about a LinguaLibre Anki Decks Maker repository, which would:

1. Given one language (e.g. ITA, Italian),
2. Given its associated .zip file: Q385-ita-Italian.zip,
3. Extract the list of audio filenames, then the list of recorded words,
4. Fetch definitions via the MediaWiki API,
5. Code an Anki Decks Maker which uses the available words, audio/media, and definitions to create elegant Anki decks.
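Step 3 could look roughly like the sketch below. It assumes the archive's audio files follow the `LL-Q652_(ita)-{username}-{recorded_word}.wav` pattern mentioned above (with `_` or a space before the language code) and that usernames contain no `-`; a username with a hyphen would be split wrongly. The internal paths of the demo archive are made up.

```python
import io
import re
import zipfile
from pathlib import PurePosixPath

# Assumption: usernames contain no "-"; the recorded word may contain anything
LL_FILENAME = re.compile(
    r"^LL-(?P<qid>Q\d+)[_ ]\((?P<iso>[a-z]{3})\)-(?P<user>[^-]+)-(?P<word>.+)\.(?P<ext>wav|ogg|mp3)$"
)


def parse_ll_filename(name: str):
    """Split a LinguaLibre-style filename into its parts, or return None if it does not match."""
    match = LL_FILENAME.match(PurePosixPath(name).name)
    return match.groupdict() if match else None


def recorded_words(zip_source) -> list:
    """List the recorded words found in a dataset archive (path or file-like object)."""
    words = []
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            parsed = parse_ll_filename(name)
            if parsed:  # skip READMEs and anything else that is not a recording
                words.append(parsed["word"])
    return words


if __name__ == "__main__":
    # Tiny in-memory archive standing in for a real dataset such as Q385-ita-Italian.zip
    demo = io.BytesIO()
    with zipfile.ZipFile(demo, "w") as archive:
        archive.writestr("Q385-ita-Italian/LL-Q652_(ita)-Alice-ciao.wav", b"")
        archive.writestr("README.txt", b"")
    print(recorded_words(demo))  # ['ciao']
```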

@LinguaLibre volunteers have the knowledge to handle steps 1 to 4. As for step 5 (coding an Anki Decks Maker), we have no experience, while I assume you gained relevant knowledge in this field when you created Anki-Chinese-Vocabulary-Generator. So I wonder if you could give us some guidance.

Is there a project or Anki documentation page that would best help LinguaLibre's volunteers code an Anki Decks Maker?
What resources helped you the most in your project that we could also use?
Are there any specific pitfalls we should be aware of? Any small comment may help this project a lot.

PS: I am also considering creating an empty GitHub repository to host this conversation, so as not to occupy your repository.

krmanik commented 3 years ago

Anki Decks

Anki decks are shared as files with the .apkg extension; a whole collection can be exported as a .colpkg file. These packages contain the HTML used for card presentation as well as text, images, and sounds. https://docs.ankiweb.net/#/exporting?id=deck-apkg
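An .apkg is a zip archive. The sketch below peeks inside one, assuming the legacy layout that genanki writes: a SQLite collection file (`collection.anki2` or `collection.anki21`) plus a `media` JSON file mapping numbered members (`0`, `1`, ...) to the original media filenames. Recent Anki versions can also export a newer layout where `media` is not plain JSON, so treat this only as a debugging aid. `describe_apkg` is a hypothetical helper name.

```python
import io
import json
import zipfile


def describe_apkg(apkg_bytes: bytes) -> dict:
    """Summarise a legacy-layout .apkg: its collection file(s) and its media mapping."""
    with zipfile.ZipFile(io.BytesIO(apkg_bytes)) as archive:
        names = archive.namelist()
        # "media" maps archive member names ("0", "1", ...) to the original filenames
        media = json.loads(archive.read("media")) if "media" in names else {}
    collections = [name for name in names if name.startswith("collection.anki2")]
    return {"collections": collections, "media": media}
```

For example: `with open("output.apkg", "rb") as f: print(describe_apkg(f.read()))`.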

Modules required for making an Anki Deck Maker

1. Using the JavaScript (node.js) module https://www.npmjs.com/package/anki-apkg-export. I tried running it inside the browser, but it didn't work because it requires SQL operations to generate decks. I also tried https://wzrd.in/ to make it run inside the browser, but that didn't work either.

2. Using Python

2.1 For desktop https://github.com/kerrickstaley/genanki

2.2 For web apps, I have built all the Python wheels required by the genanki module that are not published as pure-Python ("any") wheels, so that it can run inside the browser. https://github.com/infinyte7/Anki-Export-Deck-tkinter/tree/master/docs

Live at https://infinyte7.github.io/Anki-Export-Deck-tkinter/

I have used only the genanki Python module, in both the web and the desktop apps, so I will explain for Python only. genanki can generate Anki decks from data loaded from csv and tsv files. It also accepts custom HTML and CSS for the card templates.

I will take the Chinese language as an example.

1. Let's say we have word-list data for Chinese.

Example word data for generating decks. It is better to save these as csv or tsv files, because generating decks from csv and tsv is easier.

word_data = { 
    '比如': ['比如', 'bǐ rú', 'to take for example; for example', 'cmn-audio-比如.mp3'], 
    '中文': ['中文', 'zhōng wén', 'Chinese language', 'cmn-audio-中文.mp3'] 
}
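Since csv/tsv files are recommended above, here is a minimal sketch that saves `word_data` to TSV and reads it back with Python's `csv` module. The header names are an assumption, chosen to match the note fields used later; when writing to disk, open the file with `encoding="utf-8"` and `newline=""` so the Chinese text and line endings survive.

```python
import csv
import io

# Assumed header, matching the order of the values in word_data plus the key first
FIELD_NAMES = ["Simplified", "Traditional", "Pinyin", "Definitions", "Audio"]


def word_data_to_tsv(word_data: dict) -> str:
    """Serialise {simplified: [traditional, pinyin, definition, audio]} as TSV text."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(FIELD_NAMES)  # header row
    for word, values in word_data.items():
        writer.writerow([word, *values])
    return out.getvalue()


def tsv_to_rows(tsv_text: str) -> list:
    """Read TSV text back into one list of field values per word, skipping the header."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    next(reader)  # skip header
    return [row for row in reader if row]
```

Each row returned by `tsv_to_rows` already has the five values in model order, so it could be passed as `fields=` to `genanki.Note` once the audio value is wrapped in `[sound:...]`.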

2. So the Anki note type should also have five fields, matching the data above: Simplified (the dictionary key), Traditional, Pinyin, Definitions, and Audio. We are creating Anki decks from scratch using genanki.

Anki has three sections in its card templates:
- Front Side: generally for showing questions (HTML)
- Back Side: generally for showing answers (HTML)
- Card CSS: the style of the cards (CSS)

So, let's say we want to show the pinyin and audio as the question; then I will add the following to the front side. Any fields can be added.

front_html = """<div>{{Pinyin}}</div>
<div>{{Audio}}</div>"""

And all the other fields go on the back side as the answer:

back_html = """<div>{{Simplified}}</div>
<div>{{Traditional}}</div>
<div>{{Pinyin}}</div>
<div>{{Audio}}</div>
<div>{{Definitions}}</div>"""
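To check templates before building a deck, the sketch below imitates the simplest part of Anki's template syntax: replacing `{{Field}}` with the field's value, and `{{FrontSide}}` (which Anki provides for back templates) with the rendered front. It is only an approximation, not Anki's renderer: it ignores filters, leaves conditionals such as `{{#Field}}...{{/Field}}` untouched, and silently renders unknown fields as empty, whereas Anki itself flags them.

```python
import re

# Matches {{Name}} but not {{#Name}}, {{/Name}} or {{^Name}} (conditional sections)
FIELD_REF = re.compile(r"\{\{([^{}#/^]+?)\}\}")


def render_template(template: str, fields: dict, front_side: str = "") -> str:
    """Rough preview of Anki's {{Field}} substitution for one note."""
    def substitute(match):
        name = match.group(1).strip()
        if name == "FrontSide":
            return front_side
        return fields.get(name, "")  # unknown fields become empty in this sketch
    return FIELD_REF.sub(substitute, template)
```

For example, `render_template(back_html, fields, front_side=render_template(front_html, fields))` previews the answer side of a card.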

We also need to style our cards. The style below is the default, but it can be modified as required. Anki uses the .card class to style the whole card.

card_css = """.card {
    font-family: arial;
    font-size: 20px;
    text-align: center;
    color: black;
    background-color: white;
}"""

3. Full code: a simple script to generate decks. I tried my best to keep it simple.

import random
import genanki

from glob import glob
from os.path import join

anki_deck_title = "Anki Deck"

anki_model_name = "Sample Model"

model_id = random.randrange(1 << 30, 1 << 31)

out_file = "output.apkg"

front_html = """<div>{{Pinyin}}</div>
<div>{{Audio}}</div>"""

back_html = """<div>{{Simplified}}</div>
<div>{{Traditional}}</div>
<div>{{Pinyin}}</div>
<div>{{Audio}}</div>
<div>{{Definitions}}</div>
"""

card_css = """.card {
    font-family: arial;
    font-size: 20px;
    text-align: center;
    color: black;
    background-color: white;
    }
    """

my_model = genanki.Model(
  model_id,
  anki_model_name,
  fields=[
    # order must match the order of the values in each note's fields list
    {'name': 'Simplified'},
    {'name': 'Traditional'},
    {'name': 'Pinyin'},
    {'name': 'Definitions'},
    {'name': 'Audio'}
  ],
  templates=[
    {
      'name': 'Card 1',
      'qfmt': front_html,
      'afmt': back_html,
    },
  ],
  css=card_css,
  )

word_data = {'比如': ['比如', 'bǐ rú', 'to take for example; for example; for instance; such as to suppose; supposing; if', 'cmn-比如.mp3'], '中文': ['中文', 'Zhōng wén', 'Chinese language', 'cmn-中文.mp3'] }

my_notes = []
for word, (traditional, pinyin, definition, audio_file) in word_data.items():
    # fields follow the model order: Simplified, Traditional, Pinyin, Definitions, Audio
    # the audio field uses Anki's [sound:...] syntax; the file itself is added as media below
    flds = [word, traditional, pinyin, definition, "[sound:" + audio_file + "]"]

    anki_note = genanki.Note(
        model=my_model,
        fields=flds,
        )

    my_notes.append(anki_note)

deck_id = random.randrange(1 << 30, 1 << 31)  # the deck gets its own ID, separate from the model ID
anki_deck = genanki.Deck(deck_id, anki_deck_title)
anki_package = genanki.Package(anki_deck)

# add media
files = []
for ext in ('*.mp3', '*.ogg', '*.wav'):
    files.extend(glob(join("audio/", ext)))

anki_package.media_files = files

for note in my_notes:
    anki_deck.add_note(note)

anki_package.write_to_file(out_file)
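The script picks a new random `model_id` on every run, so each run defines a different note type. genanki's README suggests generating the ID once and hard-coding it. As an alternative, here is a minimal sketch that derives a repeatable ID from a name with `hashlib`; the helper name and the idea of hashing the model or deck name are my assumptions, not part of genanki.

```python
import hashlib

LOW, HIGH = 1 << 30, 1 << 31  # same range as random.randrange(1 << 30, 1 << 31)


def stable_anki_id(name: str) -> int:
    """Derive a deterministic ID in [2**30, 2**31) from a model or deck name."""
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    return LOW + int.from_bytes(digest[:8], "big") % (HIGH - LOW)
```

Usage: `model_id = stable_anki_id("Sample Model")` and `deck_id = stable_anki_id("Anki Deck")`. Renaming the model then gives a new ID, which is usually what you want for a genuinely new note type.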

Note 1: To add audio in Anki, we use:

[sound:audio-file.mp3]

Note 2: To add any files, pass their locations to anki_package.media_files. Here I have passed all the audio files from the audio directory:

# add media
files = []
for ext in ('*.mp3', '*.ogg', '*.wav'):
    files.extend(glob(join("audio/", ext)))

anki_package.media_files = files
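Both notes above hide a pitfall: the `[sound:...]` tag must contain only the filename, and, as far as I can tell from genanki's `Package.write_to_file`, media files are stored under their basenames, so two files with the same name in different folders would collide. A small pre-build check, sketched with the hypothetical helpers `sound_tag` and `media_problems`, reports referenced-but-missing audio and duplicate basenames.

```python
import os
import re

SOUND_TAG = re.compile(r"\[sound:([^\]]+)\]")


def sound_tag(audio_path: str) -> str:
    """Anki audio reference; it must name the bare media filename, not a directory path."""
    return f"[sound:{os.path.basename(audio_path)}]"


def media_problems(note_fields, media_files):
    """Return (referenced-but-missing, duplicate-basename) filenames for a deck build."""
    # Media are assumed to be stored under their basenames, so a/x.mp3 and b/x.mp3 collide
    basenames = [os.path.basename(path) for path in media_files]
    duplicates = sorted({name for name in basenames if basenames.count(name) > 1})
    referenced = {
        name
        for fields in note_fields
        for field in fields
        for name in SOUND_TAG.findall(field)
    }
    missing = sorted(referenced - set(basenames))
    return missing, duplicates
```

In the script above, `media_problems([note.fields for note in my_notes], files)` could run just before `write_to_file`.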

Demo

[Demo screenshot: demo1]

More Examples

I have used the above in web apps that generate Anki decks, using Pyodide to run Python in the browser (a serverless implementation):
https://github.com/infinyte7/Anki-Export-Deck-tkinter/blob/master/docs/js/index.js
https://github.com/infinyte7/image-occlusion-in-browser/blob/master/docs/v2/js/deck-export.js

More

For desktop, Anki add-ons will be a better fit: https://addon-docs.ankiweb.net/

For Android, the AnkiDroid API will be useful for creating a separate Android app that adds data to AnkiDroid: https://github.com/ankidroid/Anki-Android/wiki/AnkiDroid-API

For AnkiMobile, there are URL schemes for sending data to AnkiMobile (I have no experience with them): https://docs.ankimobile.net/#/more?id=url-schemes

Anki Forums

https://forums.ankiweb.net/