LexiDeck

What is it?

LexiDeck is an application designed to support language learning in Anki. It does this by fetching translations for a user-specified list of words and generating an Anki deck filled with flashcards corresponding to those translations. The application is designed to be flexible and allows words to be entered via a variety of sources, including command line arguments, CSV files and existing Anki decks. The means by which translation data are retrieved is also configurable, with retrieval options including the web scraping of online dictionaries (such as SpanishDict) and API calls to OpenAI (ChatGPT).

Basic usage and concepts

You'll need Python and Git to get started. Once both are installed, run git clone https://github.com/wjrm500/LexiDeck.git in your terminal to clone the LexiDeck repository onto your local machine. Then, from within the repository, run either pip install . for a standard installation or pip install -e .[dev] for a development installation. Creating a virtual environment (with python -m venv venv) and activating it (with .\venv\Scripts\activate on Windows, or source venv/bin/activate on macOS or Linux) before running the pip install command is recommended but not essential, as it keeps dependencies isolated. To confirm that the installation worked, run lexideck --version in your terminal; the output should be lexideck 1.0.0.

Now, let's say we want to create a deck of Spanish -> English flashcards. To begin with, we'll add just a single word, hola. We can do this by running the program like so:

lexideck --words hola --language-from spanish --language-to english --retriever-type spanishdict

The output will be an Anki package file output.apkg. Now we need to load that into our existing Anki collection. We can do that by opening up Anki, clicking "Import File", selecting and opening output.apkg, and clicking "Import". A new deck "Language learning flashcards" will have been created, containing a single note:

Front:

Front of an Anki note showing the word 'hola' in LexiDeck

Front and back:

Front and back of an Anki note showing the word 'hola' and its translation in LexiDeck

As you can see, the note includes a few different components:

Front:

Back:

To reiterate: a separate note is created for each part of speech of each word being translated. For example, the Spanish word amanecer can be translated as a masculine noun ("dawn"), an impersonal verb ("to dawn") and an intransitive verb ("to wake up" or "to stay up all night"):

Coloured console output showing different translations and definitions for 'amanecer'

This means that if you ran the program like this:

lexideck --words amanecer --language-from spanish --language-to english --retriever-type spanishdict

You'd end up with a deck containing not one note but three:

First Anki note for 'amanecer' as a masculine noun

Second Anki note for 'amanecer' as an impersonal verb

Third Anki note for 'amanecer' as an intransitive verb

You'll notice that on the final note, we have multiple source language example sentences, definitions and target language example sentences. This happens when there are multiple, distinct meanings for a single word and part of speech. Another example would be the Spanish word banco, which as a masculine noun can mean both bench and bank in English. As a counter-example, in the case of the Spanish noun papa, which can mean both daddy and potato, we end up with two separate notes because daddy is the masculine noun translation (el papa) while potato is the feminine noun translation (la papa).
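The grouping rule described above can be sketched in a few lines of Python. This is only an illustration of the idea, not LexiDeck's actual code: the Translation class and the group_into_notes function are hypothetical names, and the data simply restates the amanecer and banco examples from this section.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Translation:
    """One meaning of a word (hypothetical structure, for illustration only)."""
    word: str
    part_of_speech: str
    meaning: str


def group_into_notes(translations):
    """Group meanings so that each (word, part of speech) pair becomes one note."""
    notes = defaultdict(list)
    for t in translations:
        notes[(t.word, t.part_of_speech)].append(t.meaning)
    return dict(notes)


translations = [
    Translation("amanecer", "masculine noun", "dawn"),
    Translation("amanecer", "impersonal verb", "to dawn"),
    Translation("amanecer", "intransitive verb", "to wake up"),
    Translation("amanecer", "intransitive verb", "to stay up all night"),
    Translation("banco", "masculine noun", "bench"),
    Translation("banco", "masculine noun", "bank"),
]

notes = group_into_notes(translations)
for (word, part_of_speech), meanings in notes.items():
    # One line per note; several meanings share a note when the
    # word and part of speech are the same.
    print(f"{word} ({part_of_speech}): {'; '.join(meanings)}")
```

Here amanecer yields three notes (one per part of speech), while banco yields a single note holding both meanings.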

Sources

In version 1.0.0 of LexiDeck, there are three different ways you can input words:

Sources - command line

You can enter words via the command line just like we did in the examples above - with the argument --words:

lexideck --words mariposa --language-from spanish --language-to english --retriever-type spanishdict

To enter multiple words, simply separate with a space:

lexideck --words queso puerta tener --language-from spanish --language-to english --retriever-type spanishdict

If you have a "word" that is actually made up of multiple words, such as a veces, wrap it in quotation marks so that your shell passes it as a single argument, like so:

lexideck --words "a veces" --language-from spanish --language-to english --retriever-type spanishdict
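To see why the quotation marks matter, here is a small sketch that mimics how a POSIX shell splits a command line and how a multi-value argument collects the pieces. It assumes an argparse-style parser with nargs="+" for --words; whether LexiDeck parses its arguments this way is an assumption, and the parser below is a stand-in rather than the real one.

```python
import argparse
import shlex

# Stand-in parser (an assumption, not LexiDeck's real argument parser).
parser = argparse.ArgumentParser(prog="lexideck-sketch")
parser.add_argument("--words", nargs="+", default=[])
parser.add_argument("--language-from")


def parse_words(command_line):
    """Split a command line the way a POSIX shell would, then parse --words."""
    tokens = shlex.split(command_line)
    return parser.parse_args(tokens).words


unquoted = parse_words("--words a veces --language-from spanish")
quoted = parse_words('--words "a veces" --language-from spanish')

print(unquoted)  # ['a', 'veces']  -> treated as two separate words
print(quoted)    # ['a veces']     -> treated as one multi-word entry
```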

Sources - CSV

You can enter words from a CSV by specifying the path to the CSV file in the argument --csv:

lexideck --csv words.csv --language-from spanish --language-to english --retriever-type spanishdict

By default, the source expects (A) that there is no header row containing column names and (B) that the words appear in the first column. If this is not the case for your CSV file, you can configure the source to skip the first row with the --skip-first-row argument or take the words from a different column using the --col-num argument (column numbers use zero-indexing).
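As a rough illustration of how these two options interact, the sketch below reads words from CSV data using Python's standard csv module. The function read_words and its parameters are hypothetical names that mirror --skip-first-row and --col-num; this is not LexiDeck's own implementation.

```python
import csv
import io


def read_words(csv_file, skip_first_row=False, col_num=0):
    """Return the non-empty values found in column col_num (zero-indexed)."""
    reader = csv.reader(csv_file)
    if skip_first_row:
        next(reader, None)  # discard the header row, if there is one
    words = []
    for row in reader:
        # Ignore rows that are too short or whose cell is blank.
        if len(row) > col_num and row[col_num].strip():
            words.append(row[col_num].strip())
    return words


# A CSV with a header row and the words in the second column (index 1).
sample = io.StringIO("id,word\n1,queso\n2,puerta\n3,tener\n")
words = read_words(sample, skip_first_row=True, col_num=1)
print(words)  # ['queso', 'puerta', 'tener']
```

With the defaults (no header row, words in the first column), neither option is needed.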

Sources - existing Anki deck

You can enter words from an existing Anki deck by specifying the arguments --input-anki-package-path, --input-anki-deck-name and --input-anki-field-name, for example:

lexideck --input-anki-package-path input.apkg --input-anki-deck-name "Language learning flashcards" --input-anki-field-name Word --language-from spanish --language-to english --retriever-type spanishdict

Loading words from an existing Anki deck allows you to take advantage of the work done by other Anki language learners in compiling useful words into decks. Here are some instructions that demonstrate how you would utilise the existing deck "A Frequency Dictionary of Spanish" to create your own Spanish -> English deck:

  1. Open the Anki app and click "Get shared". You'll be taken to the web page https://ankiweb.net/shared/decks
  2. Click "Spanish" under the "Languages" category
  3. Click "A Frequency Dictionary of Spanish"
  4. Download the deck
  5. Import the deck into your Anki collection. You can open the import tool by double-clicking on the downloaded .apkg file in your downloads folder, or by opening the Anki app and clicking "Import File". Then click "Import"
  6. Open one of the notes in editable mode (you can click "Show" after "Import") and make a note of the name of the field containing the words whose translations you want to practice recalling, in this case Word
  7. Run LexiDeck:
     lexideck --input-anki-package-path /path/to/downloads/_A_Frequency_Dictionary_of_Spanish.apkg --input-anki-deck-name "A Frequency Dictionary of Spanish" --input-anki-field-name Word --language-from spanish --language-to english --retriever-type spanishdict

Retrievers

Retrievers are the mechanism through which translations and example sentences are obtained from the internet. You'll have noticed that in all of the above examples, the --retriever-type argument is given as spanishdict. This was simply done to keep the focus on the other arguments that were relevant to the demonstration; there are multiple types of retriever available:

The --language-from and --language-to arguments are used to help the retriever retrieve the right data. If you enter a language pairing that the specified --retriever-type does not support, LexiDeck will inform you and exit immediately.
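A check of this kind could look roughly like the sketch below. Everything in it is an assumption made for illustration: the SUPPORTED_PAIRS table lists only the pairings that happen to appear in this README's example commands and is not the real support matrix of any retriever, and the function names are hypothetical.

```python
import sys

# Illustrative only: these are just the pairings used in this README's
# examples, NOT a complete or authoritative list of supported pairings.
SUPPORTED_PAIRS = {
    "spanishdict": {("spanish", "english")},
    "wordreference": {("english", "german")},
    "openai": {("spanish", "italian")},
}


def is_supported(retriever_type, language_from, language_to):
    """Return True if the retriever is listed as handling this language pairing."""
    pairs = SUPPORTED_PAIRS.get(retriever_type, set())
    return (language_from.lower(), language_to.lower()) in pairs


def main(retriever_type, language_from, language_to):
    """Validate the pairing up front and exit early if it is unsupported."""
    if not is_supported(retriever_type, language_from, language_to):
        # sys.exit with a string prints it to stderr and exits with status 1.
        sys.exit(
            f"Retriever '{retriever_type}' does not support "
            f"{language_from} -> {language_to}"
        )
    print(f"Using '{retriever_type}' for {language_from} -> {language_to}")


main("spanishdict", "spanish", "english")
```

Validating before any retrieval starts means an unsupported pairing fails fast, without any network requests being made.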

The --concise-mode argument reduces the number of notes produced, as well as the amount of text in individual notes, by pruning translations and definitions. It is particularly effective in combination with the spanishdict retriever type: SpanishDict makes it clear which are the "principal" translations for a given word, and this information can be used to prune translations or definitions that do not correspond to these principal translations.
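The pruning idea can be sketched as a simple filter. This is a guess at the general shape rather than LexiDeck's actual logic: the Candidate class, the is_principal flag, the prune function and the sample values are all invented for illustration, and falling back to the full list when nothing is marked as principal is a design choice of this sketch.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A candidate translation (hypothetical structure, for illustration only)."""
    translation: str
    is_principal: bool


def prune(candidates, concise_mode):
    """Keep only principal translations when concise mode is on."""
    if not concise_mode:
        return list(candidates)
    principal = [c for c in candidates if c.is_principal]
    # If nothing is marked as principal, keep everything rather than
    # producing an empty note (a choice made for this sketch).
    return principal or list(candidates)


# Invented flags: which translations count as "principal" here is made up.
candidates = [
    Candidate("translation A", is_principal=True),
    Candidate("translation B", is_principal=False),
    Candidate("translation C", is_principal=False),
]

kept = prune(candidates, concise_mode=True)
print([c.translation for c in kept])
```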

Some example commands:

lexideck --words spider --language-from english --language-to german --retriever-type wordreference --verbose

Console output showing note creation for English to German translation of 'spider'

Anki note for 'spider'

lexideck --words adelante --language-from spanish --language-to italian --retriever-type openai --verbose

Console output showing note creation for Spanish to Italian translation of 'adelante'

First Anki note for 'adelante' as an interjection

Second Anki note for 'adelante' as a noun

Third Anki note for 'adelante' as an adverb

Additional notes

If you are interested in the inner workings of the application, feel free to check out the source code under the app directory. Most classes and methods are documented quite descriptively.

Arguments not explained so far include --concurrency-limit, --output-anki-package-path, --output-anki-deck-name, --note-limit and --verbose. For more information on these and a comprehensive list of all available arguments, please run:

lexideck --help

For your convenience, several resources not directly needed for the running or development of the application have been included in this repository:

Get involved

This project is completely open source and contributions are welcome! Just open a pull request and write a concise description of the change you've made and why, and I'll take a look. Off the top of my head, I would be interested to see:

I would also be very interested to hear about any larger, architectural changes you have in mind, although in these cases it would be best to contact me to discuss before beginning development.