spraakbanken / karp-backend

Karp backend
MIT License

Make Elasticsearch index robust to crashing #242

Open nick8325 opened 9 months ago

Currently the process for adding entries goes as follows:

  1. Add them to MariaDB
  2. Commit
  3. Add them to Elasticsearch

The problem is that if the backend crashes after the commit but before all entries have been added to Elasticsearch, we will have entries in MariaDB but not in Elasticsearch. (Note that this can't be solved by swapping steps 2 and 3: if we crash after adding entries to Elasticsearch but before committing, we will have the opposite problem, entries in Elasticsearch but not in MariaDB.)

One option for solving this: create a new table in MariaDB of "recently added/deleted entries", written in the same transaction as the entries themselves. We sync MariaDB to Elasticsearch by applying each change in this table to the index (adding or removing the entry), then (when everything is done) emptying the table.
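A minimal sketch of this first option, to make the idea concrete. It is not karp-backend code: `sqlite3` stands in for MariaDB, a plain dict stands in for the Elasticsearch index, and the table and function names (`pending_changes`, `add_entry`, `delete_entry`, `sync_pending`) are made up for illustration.

```python
import sqlite3


def setup_pending(db):
    # Hypothetical schema: sqlite3 stands in for MariaDB here.
    db.executescript("""
        CREATE TABLE entries (entry_id TEXT PRIMARY KEY, body TEXT);
        CREATE TABLE pending_changes (
            change_id INTEGER PRIMARY KEY AUTOINCREMENT,
            entry_id  TEXT NOT NULL,
            op        TEXT NOT NULL  -- 'add' or 'delete'
        );
    """)


def add_entry(db, entry_id, body):
    # The entry and its pending-change row are committed in ONE transaction,
    # so the database never holds an entry that the sync does not know about.
    with db:
        db.execute("INSERT OR REPLACE INTO entries VALUES (?, ?)", (entry_id, body))
        db.execute(
            "INSERT INTO pending_changes (entry_id, op) VALUES (?, 'add')",
            (entry_id,),
        )


def delete_entry(db, entry_id):
    with db:
        db.execute("DELETE FROM entries WHERE entry_id = ?", (entry_id,))
        db.execute(
            "INSERT INTO pending_changes (entry_id, op) VALUES (?, 'delete')",
            (entry_id,),
        )


def sync_pending(db, index):
    # `index` is a dict standing in for the Elasticsearch index.
    rows = db.execute(
        "SELECT change_id, entry_id, op FROM pending_changes ORDER BY change_id"
    ).fetchall()
    for _change_id, entry_id, op in rows:
        if op == "add":
            row = db.execute(
                "SELECT body FROM entries WHERE entry_id = ?", (entry_id,)
            ).fetchone()
            if row is not None:  # entry may have been deleted again since
                index[entry_id] = row[0]
        else:
            index.pop(entry_id, None)  # removing a missing entry is a no-op
    if rows:
        # Only clear what was actually processed; rows added in the meantime
        # stay in the table for the next sync.
        with db:
            db.execute(
                "DELETE FROM pending_changes WHERE change_id <= ?", (rows[-1][0],)
            )
```

If the process dies anywhere inside `sync_pending`, the rows are still in `pending_changes`, so the next call simply applies them again.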

Another option: the index remembers what "revision" the repository was at when it was last synced (e.g., the max history_id in the entries table). Then to sync, we ask the database "give me all changes since this revision", and apply those to the index (adding or removing entries as needed).
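A rough sketch of this second option, under the same stand-ins (`sqlite3` for MariaDB, an in-memory object for the index). It assumes the entries table is append-only, with one row per change and a `deleted` flag; that schema, and the names `record_change`, `Index` and `sync_since_revision`, are assumptions for illustration only.

```python
import sqlite3


def setup_history(db):
    # Assumed schema: every change appends a row, history_id only grows.
    db.executescript("""
        CREATE TABLE entries (
            history_id INTEGER PRIMARY KEY AUTOINCREMENT,
            entry_id   TEXT NOT NULL,
            body       TEXT,
            deleted    INTEGER NOT NULL DEFAULT 0
        );
    """)


def record_change(db, entry_id, body=None, deleted=False):
    with db:
        db.execute(
            "INSERT INTO entries (entry_id, body, deleted) VALUES (?, ?, ?)",
            (entry_id, body, int(deleted)),
        )


class Index:
    """Stand-in for the Elasticsearch index plus its stored revision."""

    def __init__(self):
        self.docs = {}
        self.revision = 0  # highest history_id already applied


def sync_since_revision(db, index):
    # "Give me all changes since this revision."
    rows = db.execute(
        "SELECT history_id, entry_id, body, deleted FROM entries "
        "WHERE history_id > ? ORDER BY history_id",
        (index.revision,),
    ).fetchall()
    for _history_id, entry_id, body, deleted in rows:
        if deleted:
            index.docs.pop(entry_id, None)
        else:
            index.docs[entry_id] = body
    if rows:
        # The revision is advanced last, so a crash before this point
        # just replays the same changes on the next sync.
        index.revision = rows[-1][0]
```

Compared with the pending-changes table, nothing extra has to be written on the MariaDB side; the cost is that the index has to store its revision somewhere durable.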

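If the backend crashes during this process, it's OK - it just means that we will add/remove the synced entries again when we restart the backend. This is safe as long as adding or removing an entry in the index is idempotent, i.e. repeating it gives the same result.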