A newly generated card needs to be shown whenever the user reviews a card. Generating it requires a request to OpenAI, which can add significant latency to each review.

Requirement: generated cards should take less than a second to show.

Filling the cache - triggers

We can use multiple strategies to fill the cache. I'll list the pros and cons:

Strategy	Async needed	Pros	Cons
Upon startup	Yes	Can make generated cards while user is busy with other things.	Cards may be generated that aren't reviewed
Upon card review	No	Easy to implement	Have to wait the first time a card is reviewed.
Upon card creation	Yes	Can make generated cards as soon as they are known in the system	Card may not be reviewed, costing money
Upon deck view	Yes	Can make generated cards specific to deck being viewed	Cards may not be ready by the time the user starts reviewing. User may not actually review the deck, costing money.
Upon deck review	Yes	Can make generated cards specific to deck being reviewed	Will have to wait for the first card of the deck to be generated

Given these pros and cons, I will consider implementing in the following order:

[x] Upon card review: easy to implement, will already see some benefits.
[ ] Upon startup: can keep all the generated cards up to date so we don't have a first-time latency.
- Need to think about how many concurrent requests we can send to the OpenAI API. Check the rate limiter.
[ ] Upon card creation: can probably identify an event that we can react to with a hook
[ ] Upon deck review: can probably identify an event that we can react to with a hook
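
For the "upon startup" strategy, one way to keep the number of concurrent requests under control is a thread pool with a fixed number of workers. This is only a sketch: `generate_cards` is a hypothetical stand-in for the real OpenAI call, and `max_concurrent=4` is an arbitrary value, not a known rate limit.

```python
from concurrent.futures import ThreadPoolExecutor


def generate_cards(card_id, n):
    # Hypothetical placeholder for the real OpenAI request.
    return [f"{card_id}-generated-{i}" for i in range(n)]


def warm_cache(card_ids, n_per_card=3, max_concurrent=4):
    """Generate cards for every id with at most max_concurrent requests in flight."""
    card_ids = list(card_ids)  # materialise so we can iterate twice
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        # pool.map preserves input order, so results line up with card_ids.
        results = pool.map(lambda cid: generate_cards(cid, n_per_card), card_ids)
        return dict(zip(card_ids, results))
```

Note that `warm_cache` itself returns only once all requests have finished, so it would still have to be started from a background thread to avoid blocking startup.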

Preventing card review blocking

Any OpenAI call needs to run in the background, especially when the cache is filled upon startup or on any other event outside card review. Here are two possible solutions:

[ ] Perform OpenAI calls async, so we don't need to wait for the call to be finished to continue the application
[ ] Start a new process that performs OpenAI calls, and monitors any cards for which we need to fill the cache.
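
A minimal sketch of the first option, using a plain daemon thread from the standard library. `BackgroundRefiller` and `refill_fn` are hypothetical names; `refill_fn` stands for whatever function calls OpenAI and writes the result to the cache. The lock ensures that repeated triggers do not start overlapping refills.

```python
import threading


class BackgroundRefiller:
    """Runs at most one refill at a time on a daemon thread."""

    def __init__(self, refill_fn):
        self._refill_fn = refill_fn  # hypothetical: does the OpenAI call + cache write
        self._lock = threading.Lock()
        self._thread = None

    def trigger(self):
        """Start a refill unless one is already running. Returns True if started."""
        with self._lock:
            if self._thread is not None and self._thread.is_alive():
                return False
            self._thread = threading.Thread(target=self._refill_fn, daemon=True)
            self._thread.start()
            return True

    def wait(self, timeout=None):
        """Block until the current refill (if any) finishes; mainly useful in tests."""
        thread = self._thread
        if thread is not None:
            thread.join(timeout)
```

The caller invokes `trigger()` and immediately carries on serving cards. The cache writes themselves would still need to be safe against concurrent reads from the review code.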

Technology implementation of the cache

Generated cards need to be persisted to disk, because we may not be in the same Anki session the next time we review the card. Anki may have been closed and then reopened.

Persisting to disk means we need to be able to read the cards back quickly, which suggests storing them in a database. That database will be of the same order of magnitude in size as the card collection the user already keeps locally. To avoid slow network traffic, it makes sense to store it locally as well.

[x] I will start with JSON, as this does not require any additional dependencies for the Anki plugin. If this turns out to be too slow, we can consider other options such as SQLite, NoSQL alternatives, Redis or Memcached.

Just implemented caching upon card review using a JSON file:

A cache fill is triggered upon card review if the cache holds fewer than min_cards=5 cards. It then requests 10 more cards from OpenAI.
Every time we see a card, we take one generated card out of the cache, so the cache gradually shrinks.
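
The behaviour described above can be sketched roughly as follows. This is not the actual plugin code: `JsonCardCache`, the `generate` callable, and the choice to key the cache by card id are all assumptions made for illustration.

```python
import json
from pathlib import Path


class JsonCardCache:
    """JSON-file cache that refills when it runs low (illustrative sketch)."""

    def __init__(self, path, generate, min_cards=5, fill_size=10):
        self.path = Path(path)
        self.generate = generate  # hypothetical: generate(card_id, n) -> list of cards
        self.min_cards = min_cards
        self.fill_size = fill_size

    def _load(self):
        if self.path.exists():
            return json.loads(self.path.read_text(encoding="utf-8"))
        return {}

    def _save(self, data):
        self.path.write_text(json.dumps(data), encoding="utf-8")

    def pop(self, card_id):
        """Return one generated card, refilling first if the cache is low."""
        data = self._load()
        cards = data.setdefault(str(card_id), [])  # JSON object keys are strings
        if len(cards) < self.min_cards:
            # Blocking call: this is the latency the issue wants to remove.
            cards.extend(self.generate(card_id, self.fill_size))
        card = cards.pop(0)
        self._save(data)
        return card
```

Because the whole file is read and rewritten on every `pop`, this only stays cheap while the cache is small.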

Limitations we need to iron out:

The first time we see a card, generating 10 cards to fill the cache is still quite slow.
When the number of cached cards drops to min_cards, the review is blocked while the cache is refilled, even though we already have a card to show.

Here are some possible solutions:

[ ] Cache the first $n$ (e.g. $n=3$) cards of each deck upon startup.
[ ] Request a smaller number of cards (say $n=2$) if the cache is completely empty, to reduce the OpenAI latency.
[ ] When a deck is being reviewed, start the process of caching the rest of the deck. We have a head start, since we have already cached $n$ cards from the deck.
[ ] Perform the OpenAI call async, so that when the cache hits min_cards, we can fill the cache in the background while we continue to serve cards from the cache
[ ] Nice to have: make any updates to the JSON file async so that the process is not blocked (so far JSON has not been the limiting factor though).
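
Two of these ideas (a small blocking request when the cache is empty, and a non-blocking refill when it is merely low) boil down to one decision per review. A possible sketch, where `plan_refill` and its default values are hypothetical and only mirror the numbers mentioned above:

```python
def plan_refill(cached, min_cards=5, full_batch=10, small_batch=2):
    """Decide how many cards to request and whether the review must wait.

    Returns (n_to_request, must_block).
    """
    if cached == 0:
        # Nothing to show: request a small batch and wait for it.
        return small_batch, True
    if cached < min_cards:
        # Running low but a card is available: refill in the background.
        return full_batch, False
    return 0, False
```

For example, `plan_refill(0)` gives `(2, True)`, while `plan_refill(3)` gives `(10, False)`, so the user only ever waits when there is truly nothing cached.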

mathijsvdv / phrasify

Performance enhancement: cache card generation #10
