LexiDeck is an application designed to support language learning in Anki. It does this by fetching translations for a user-specified list of words and generating an Anki deck filled with flashcards corresponding to those translations. The application is designed to be flexible and allows words to be entered via a variety of sources, including command line arguments, CSV files and existing Anki decks. The means by which translation data are retrieved is also configurable, with retrieval options including the web scraping of online dictionaries (such as SpanishDict) and API calls to OpenAI (ChatGPT).
You'll need Python and Git to get started. Once you have these installed on your computer, run the following in your terminal to clone the LexiDeck repository onto your local machine:
git clone https://github.com/wjrm500/LexiDeck.git
Then, from within the repository, run the following for a standard installation of the program:
pip install .
Or, for a dev installation:
pip install -e .[dev]
It is recommended, but not essential, that you create and activate a virtual environment before running the pip install command, as this keeps dependencies isolated. To create one:
python -m venv venv
To activate it on Windows:
.\venv\Scripts\activate
To activate it on macOS or Linux:
source venv/bin/activate
You can confirm the installation worked by running the following in your terminal:
lexideck --version
The output should be:
lexideck 1.0.0
Now, let's say we want to create a deck of Spanish -> English flashcards. To begin with, we'll add just a single word, hola. We can do this by running the program like so:
lexideck --words hola --language-from spanish --language-to english --retriever-type spanishdict
The output will be an Anki package file, output.apkg. Now we need to load that into our existing Anki collection. We can do that by opening Anki, clicking "Import File", selecting and opening output.apkg, and clicking "Import". A new deck, "Language learning flashcards", will have been created, containing a single note:
Front:
Front and back:
As you can see, the note includes a few different components:
Front:
Back:
To reiterate something I mentioned above - separate notes are created for each part of speech, for each word being translated. For example, the Spanish word amanecer can be translated as a masculine noun ("dawn"), an impersonal verb ("to dawn") and an intransitive verb ("to wake up" or "to stay up all night"):
This means that if you ran the program like this:
lexideck --words amanecer --language-from spanish --language-to english --retriever-type spanishdict
You'd end up with a deck containing not one note but three:
You'll notice that on the final note, we have multiple source language example sentences, definitions and target language example sentences. This happens when there are multiple, distinct meanings for a single word and part of speech. Another example would be the Spanish word banco, which as a masculine noun can mean both bench and bank in English. As a counter-example, in the case of the Spanish noun papa, which can mean both daddy and potato, we end up with two separate notes because daddy is the masculine noun translation (el papa) while potato is the feminine noun translation (la papa).
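The grouping rule described above (one note per word and part of speech, with distinct meanings collected inside that note) can be sketched in a few lines of Python. This is only an illustration of the rule, not LexiDeck's actual implementation: the Note class, the build_notes function and the shape of the input tuples are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Tuple


@dataclass
class Note:
    """Hypothetical stand-in for a flashcard note (not LexiDeck's real class)."""
    word: str
    part_of_speech: str
    definitions: List[str] = field(default_factory=list)


def build_notes(entries: Iterable[Tuple[str, str, str]]) -> List[Note]:
    """Group (word, part of speech, definition) triples into notes.

    Each distinct (word, part of speech) pair gets exactly one note;
    further meanings for the same pair are appended to that note.
    """
    notes: Dict[Tuple[str, str], Note] = {}
    for word, part_of_speech, definition in entries:
        key = (word, part_of_speech)
        if key not in notes:
            notes[key] = Note(word, part_of_speech)
        notes[key].definitions.append(definition)
    # Dicts preserve insertion order, so notes come out in first-seen order.
    return list(notes.values())


entries = [
    ("amanecer", "masculine noun", "dawn"),
    ("amanecer", "impersonal verb", "to dawn"),
    ("amanecer", "intransitive verb", "to wake up"),
    ("amanecer", "intransitive verb", "to stay up all night"),
]
notes = build_notes(entries)
for note in notes:
    print(note.part_of_speech, note.definitions)
```

Run on the amanecer data from the example above, this produces three notes, with the intransitive verb note holding both of its meanings.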
In version 1.0.0 of LexiDeck, there are three different ways you can input words:
You can enter words via the command line, just like we did in the examples above, with the argument --words:
lexideck --words mariposa --language-from spanish --language-to english --retriever-type spanishdict
To enter multiple words, simply separate them with spaces:
lexideck --words queso puerta tener --language-from spanish --language-to english --retriever-type spanishdict
And if you have a "word" that is actually multiple words, such as a veces, you can delimit it with speech marks, like so:
lexideck --words "a veces" --language-from spanish --language-to english --retriever-type spanishdict
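To see why the speech marks matter, here is a minimal sketch of how a space-separated word list is typically parsed. It assumes an argparse-style parser with a multi-value --words option; whether LexiDeck is built this way internally is an assumption, and the parse_words helper is hypothetical. shlex.split is used here only to imitate how a POSIX shell splits a command line into arguments.

```python
import argparse
import shlex

# Hypothetical parser that mirrors the two flags used in this example.
parser = argparse.ArgumentParser(prog="lexideck")
parser.add_argument("--words", nargs="+", default=[])
parser.add_argument("--language-from")


def parse_words(command_line: str) -> list:
    """Split a command line the way a POSIX shell would, then parse it."""
    argv = shlex.split(command_line)
    return parser.parse_args(argv).words


# Unquoted: each space-separated token becomes its own word.
print(parse_words("--words queso puerta tener --language-from spanish"))

# Quoted: the shell hands "a veces" to the program as a single argument.
print(parse_words('--words "a veces" queso --language-from spanish'))
```

Without the speech marks, a and veces would arrive as two separate arguments and be treated as two separate words.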
You can enter words from a CSV by specifying the path to the CSV file in the argument --csv:
lexideck --csv words.csv --language-from spanish --language-to english --retriever-type spanishdict
By default, the source expects (A) that there is no header row containing column names and (B) that the words appear in the first column. If this is not the case for your CSV file, you can configure the source to skip the first row with the --skip-first-row argument, or take the words from a different column using the --col-num argument (column numbers use zero-indexing).
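The effect of these two options can be illustrated with a short sketch. The read_words function below is hypothetical and is not taken from LexiDeck's source; its parameters simply mirror the --skip-first-row and --col-num arguments, and the sample data is made up.

```python
import csv
import io


def read_words(csv_file, skip_first_row=False, col_num=0):
    """Read one word per row from the given column of an open CSV file.

    col_num is zero-indexed, so 0 is the first column and 1 is the second.
    """
    reader = csv.reader(csv_file)
    if skip_first_row:
        next(reader, None)  # Discard the header row, if there is one.
    words = []
    for row in reader:
        # Ignore rows that are too short or have an empty cell in the column.
        if len(row) > col_num and row[col_num].strip():
            words.append(row[col_num].strip())
    return words


# A CSV with a header row and the Spanish words in the second column.
sample = io.StringIO("english,spanish\ncheese,queso\ndoor,puerta\n")
words = read_words(sample, skip_first_row=True, col_num=1)
print(words)
```

For a file laid out like the sample, the equivalent command line would combine --skip-first-row with --col-num 1.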
You can enter words from an existing Anki deck by specifying the arguments --input-anki-package-path, --input-anki-deck-name and --input-anki-field-name, e.g.:
lexideck --input-anki-package-path input.apkg --input-anki-deck-name "Language learning flashcards" --input-anki-field-name Word --language-from spanish --language-to english --retriever-type spanishdict
Loading words from an existing Anki deck allows you to take advantage of the work done by other Anki language learners in compiling useful words into decks. Here are some instructions that demonstrate how you would utilise the existing deck "A Frequency Dictionary of Spanish" to create your own Spanish -> English deck:
https://ankiweb.net/shared/decks
.apkg
file in your downloads folder, or by opening the Anki app and clicking "Import file". Then click "Import".
Word
lexideck --input-anki-package-path /path/to/downloads/_A_Frequency_Dictionary_of_Spanish.apkg --input-anki-deck-name "A Frequency Dictionary of Spanish" --input-anki-field-name Word --language-from spanish --language-to english --retriever-type spanishdict
Retrievers are the mechanism through which translations and example sentences are obtained from the internet. You'll have noticed that in all of the above examples, the --retriever-type argument is given as spanishdict. This was done simply to keep the focus on the other arguments relevant to each demonstration; there are multiple types of retriever available:
spanishdict - this retriever scrapes translations and examples from SpanishDict. It is accurate and comprehensive, and should be your go-to retriever for generating English -> Spanish or Spanish -> English decks. The website is rate-limited, although LexiDeck handles rate limiting smoothly.
Available language pairs:
wordreference - this retriever scrapes translations and examples from WordReference.com. It is less reliable than SpanishDict, but offers a wide language selection. This website will sometimes throw a captcha - in these cases, LexiDeck will pause processing and request manual intervention before proceeding.
Available language pairs:
collins - this retriever scrapes translations and examples from Collins Online Dictionary - or rather, it would if Collins hadn't cleverly protected themselves from scraping using Cloudflare anti-bot protection. To clarify, this retriever currently works only in theory, not in practice.
Available language pairs (in theory):
openai - this retriever makes calls to OpenAI's models through the company's API, with models such as GPT-3.5 and GPT-4 currently available. As OpenAI APIs are paid, you will need to create an API key and add it to a .env file (see .env.example for format). It can be a (very) slow and costly alternative to web scraping, but depending on the model the results can be excellent, and the language choice is theoretically unlimited. At this stage this retriever is more experimental than genuinely useful, but future changes to the API might change that. Bear in mind that you will be asked to specify which OpenAI model you wish to use when you use this retriever type.
Available language pairs:
The --language-from and --language-to arguments are used to help the retriever retrieve the right data. If you enter a language pairing that the specified --retriever-type does not support, LexiDeck will inform you and exit immediately.
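The check described above amounts to looking the requested pairing up in a table of supported pairs before doing any work. The following is a minimal sketch of that idea, not LexiDeck's real code: SUPPORTED_PAIRS and language_pair_error are hypothetical names, and the table is deliberately limited to the two spanishdict pairings mentioned in this document.

```python
from typing import Optional

# Illustrative table only; the real list of pairs per retriever may differ.
SUPPORTED_PAIRS = {
    "spanishdict": {("spanish", "english"), ("english", "spanish")},
}


def language_pair_error(
    retriever_type: str, language_from: str, language_to: str
) -> Optional[str]:
    """Return an error message if the pairing is unsupported, else None."""
    pairs = SUPPORTED_PAIRS.get(retriever_type)
    if pairs is None:
        return f"Unknown retriever type: {retriever_type}"
    if (language_from.lower(), language_to.lower()) not in pairs:
        return (
            f"Retriever '{retriever_type}' does not support "
            f"{language_from} -> {language_to}"
        )
    return None


def run(retriever_type: str, language_from: str, language_to: str) -> None:
    """Validate up front and exit immediately on an unsupported pairing."""
    error = language_pair_error(retriever_type, language_from, language_to)
    if error is not None:
        raise SystemExit(error)  # Prints the message and exits non-zero.
    print("Language pair OK, continuing...")


print(language_pair_error("spanishdict", "spanish", "english"))
print(language_pair_error("spanishdict", "spanish", "german"))
```

Validating before any retrieval starts means an unsupported pairing fails fast, rather than partway through a long word list.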
The --concise-mode argument reduces the number of notes produced, as well as the amount of text in individual notes, by pruning translations and definitions. It is particularly effective in combination with the spanishdict retriever type: SpanishDict makes it clear which are the "principal" translations for a given word, and this information can be used to prune translations or definitions that do not correspond to these principal translations.
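As a rough illustration of pruning by principal translation, consider the sketch below. It is an assumption-laden example rather than a description of LexiDeck's internals: the prune_to_principal function, the dictionary layout and the sample data (including which translations count as principal) are all invented for the demonstration, and the fallback of keeping everything when nothing matches is a design choice of this sketch.

```python
def prune_to_principal(definitions, principal_translations):
    """Keep only definitions whose translation is a principal translation.

    If nothing matches, return the definitions unchanged so that a word is
    never left without any content (an assumption made for this sketch).
    """
    principal = {t.lower() for t in principal_translations}
    kept = [d for d in definitions if d["translation"].lower() in principal]
    return kept if kept else list(definitions)


# Made-up data for the masculine noun "banco".
definitions = [
    {"translation": "bank", "definition": "financial institution"},
    {"translation": "bench", "definition": "long seat"},
    {"translation": "shoal", "definition": "group of fish"},
]
pruned = prune_to_principal(definitions, ["bank", "bench"])
print([d["translation"] for d in pruned])
```

Here the less common meaning is dropped from the note, while the two principal meanings are kept.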
Some example commands:
lexideck --words hello --language-from english --language-to german --retriever-type wordreference --verbose
lexideck --words adelante --language-from spanish --language-to italian --retriever-type openai --verbose
If you are interested in the inner workings of the application, feel free to check out the source code under the app directory. Most classes and methods are documented quite descriptively.
Arguments not explicitly mentioned so far include --concurrency-limit, --output-anki-package-path, --output-anki-deck-name, --note-limit and --verbose. For more information on these and a comprehensive list of all available arguments, please run:
lexideck --help
For your convenience, several resources not directly needed for the running or development of the application have been included in this repository:
resources/oxford_5000.csv
resources/A_Frequency_Dictionary_of_Spanish.apkg
resources/english-spanish-spanishdict-2023-12-21.apkg
resources/spanish-english-spanishdict-2023-12-21.apkg
This project is completely open source and contributions are welcome! Just open a pull request and write a concise description of the change you've made and why, and I'll take a look. Off the top of my head, I would be interested to see:
I would also be very interested to hear about any larger, architectural changes you have in mind, although in these cases it would be best to contact me to discuss before beginning development.