loculus-project / loculus

An open-source software package to power microbial genomic databases
https://loculus.org
GNU Affero General Public License v3.0
37 stars 2 forks source link

Investigate adding a distributed lock for ingest #2843

Closed anna-parker closed 1 month ago

anna-parker commented 1 month ago

We saw some suspicious duplication of ingest data again in preview instances (see https://github.com/loculus-project/loculus/pull/2653). We saw this before due to concurrency of ingest jobs. We thought that adding

replicas: 1
  strategy:
    type: Recreate

to the ingest deployment would prevent concurrency but maybe this isn't enough.

Some ideas how we could add a distributed lock:

  1. Redis with Redlock: Pods can acquire a distributed lock from Redis before communicating with another pod.
  2. Add a lock endpoint to the backend which would make the backend add a lock to the backend, sth along the lines of:
    • Add a lock table: with username (maybe also PID) and time of lock
    • Add a lock endpoint that adds an entry to the lock table for that user and otherwise returns an error if there is already an entry in the lock table for that user
    • Have ingest call the lock endpoint before calling get-original-metadata. Only continue with submission if the call is successful.
    • Have the backend clean up the lock table by removing old locks, e.g. older than 30min
theosanderson commented 1 month ago

I wouldn't necessarily proceed on this before we understand the issue by e.g. seeing the concurrency in logs

theosanderson commented 1 month ago

We identified a likely cause of duplicate ingest (resolved by https://github.com/loculus-project/loculus/pull/2846). We can likely close this Issue without action if issues do not resurface in next few weeks.

theosanderson commented 1 month ago

We now have even more understanding of duplicate ingest (due to the reloader). Would you be ok for this to be closed?