soedinglab / MMseqs2-App

MMseqs2 app to run on your workstation or servers
https://search.foldseek.com
GNU General Public License v3.0

Foldseek jobs are not persistent from GUI #90

Closed JackKay404 closed 5 months ago

JackKay404 commented 5 months ago

Hello,

I'm trying to deploy the Foldseek docker-compose.yml as a Docker swarm stack, but have noticed that jobs which have already been run are no longer accessible via the GUI after taking down and re-deploying the stack.
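
For reference, by "downing and re-deploying" I mean the usual swarm stack cycle, roughly the following (the stack name foldseek is only an example):

docker stack rm foldseek                              # take the stack down
docker stack deploy -c docker-compose.yml foldseek    # bring it back up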

The previously run jobs are maintained in the jobs directory as expected, and the data is accessible via the command line; however, in job.json the status is "PENDING". The GUI does seem to be aware of the contents of the jobs directory, because tabs for old and new jobs are shown, but clicking on an old job returns "Job Status: ERROR Job failed. Please try again later.", even though the job was successful at the time of running. See screenshot below:

[screenshot: tab for a previously completed job showing "Job Status: ERROR Job failed. Please try again later."]

Any advice on this would be massively appreciated!

Edit: If I manually edit the job.json "status" field from "PENDING" to "COMPLETE" before restarting the containers, then I can make the job persist in the GUI; however, this is not ideal. If I manually edit the "status" field after restarting, the job does not persist. Any suggestions on how to make the status update automatically?
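
For reference, the manual edit is just flipping the stored status, roughly like the following (the job directory name is only a placeholder):

# mark the job as finished before restarting the stack
jq '.status = "COMPLETE"' jobs/<job-id>/job.json > job.json.tmp && mv job.json.tmp jobs/<job-id>/job.json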

milot-mirdita commented 5 months ago

The docker-compose-based server uses redis to temporarily store information. If you shut down the Docker stack completely, it will get rid of the redis volume and desync the job status from what's stored on disk.

We want to get rid of the redis-based workflow; however, that means the server and worker processes can't be separated anymore. If that's okay with you, please take a look at docker-compose.local.yml, which replaces the commands so that redis is not used and a single server process runs instead. There, the source of truth is always the jobs directly on the file system.

JackKay404 commented 5 months ago

Thanks for the quick reply!

For my use case the redis-based workflow is totally acceptable, as long as the data is made persistent with an external volume and remains accessible from the GUI across container restarts.

I'm not sure exactly what wider impact this might have, but changing the command in mmseqs-web-api from "-server -config /etc/mmseqs-web/config.json -app ${APP}" to "-local -config /etc/mmseqs-web/config.json -app ${APP}", and also mounting an external volume to the /data directory of mmseqs-web-redis, seems to have enabled persistent access to the jobs after taking the container stack down and up.

My own docker-compose file for initiating a Docker swarm stack looks like this:

version: '3.9'
services:
  mmseqs-web-redis:
    image: redis:alpine
    ports:
      - "${FOLDSEEK_REDIS_PORT}:6379"
    volumes:
      - mmseqs-web-redis:/data
    deploy:
      replicas: 1
      placement:
        constraints:
          - node.role==manager

  mmseqs-web-api:
    image: "ghcr.io/soedinglab/foldseek-app-backend:master"
    init: true
    command: -local -config /etc/mmseqs-web/config.json -app foldseek
    expose:
      - "3000"
    volumes:
      - ${FOLDSEEK_DIR}/config.json:/etc/mmseqs-web/config.json:ro
      - ${FOLDSEEK_DB_PATH}:/opt/mmseqs-web/databases
      - ${FOLDSEEK_JOBS_PATH}:/opt/mmseqs-web/jobs
    deploy:
      replicas: 1
      placement:
        constraints:
          - node.role==manager

  mmseqs-web-worker:
    image: "ghcr.io/soedinglab/foldseek-app-backend:master"
    init: true
    command: -worker -config /etc/mmseqs-web/config.json -app foldseek
    volumes:
      - ${FOLDSEEK_DIR}/config.json:/etc/mmseqs-web/config.json:ro
      - ${FOLDSEEK_DB_PATH}:/opt/mmseqs-web/databases
      - ${FOLDSEEK_JOBS_PATH}:/opt/mmseqs-web/jobs
    tmpfs:
      - ${FOLDSEEK_DIR}/tmp:exec
    environment:
      - MMSEQS_NUM_THREADS=1
    deploy:
      replicas: ${FOLDSEEK_WEB_WORKER_REPLICAS}
      placement:
        constraints:
          - node.role==manager

  mmseqs-web-webserver:
    image: "ghcr.io/soedinglab/foldseek-app-frontend:master"
    volumes:
      - ${FOLDSEEK_DIR}/nginx.conf:/etc/nginx/conf.d/default.conf:ro
    ports:
      - "${FOLSEEK_GUI_PORT}:80"
    deploy:
      replicas: 1
      placement:
        constraints:
          - node.role==manager

volumes: 
  mmseqs-web-redis:
    external: true
    name: mmseqs-web-redis

milot-mirdita commented 5 months ago

-local disables redis and the server/worker split.

This is what I would suggest using, as I think I am going to drop redis at some point anyway.

milot-mirdita commented 5 months ago

To clarify: if you use -local, neither mmseqs-web-worker nor mmseqs-web-redis does anything, so both can be disabled.

-local still allows multiple workers as threads; it just doesn't allow placing them on machines separate from the server.
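
For example, the number of local worker threads can be set with the -local.workers flag used in the compose below (the count here is only illustrative):

command: -local.workers 4 -local -config /etc/mmseqs-web/config.json -app foldseek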

JackKay404 commented 5 months ago

Thanks for the clarification! Yes, I can confirm that I am able to remove the mmseqs-web-worker and mmseqs-web-redis containers from the compose file and successfully persist jobs across multiple container restarts, and even after a docker system prune. This is ideal for my purposes, so thanks a lot for the great work!

My docker-compose.yml now looks like this, for anyone interested:

version: '3.9'
services:
  mmseqs-web-api:
    image: "ghcr.io/soedinglab/foldseek-app-backend:master"
    init: true
    command: -local.workers 1 -local -config /etc/mmseqs-web/config.json -app foldseek
    expose:
      - "3000"
    volumes:
      - ${FOLDSEEK_DIR}/config.json:/etc/mmseqs-web/config.json:ro
      - ${FOLDSEEK_DB_PATH}:/opt/mmseqs-web/databases
      - ${FOLDSEEK_JOBS_PATH}:/opt/mmseqs-web/jobs
    deploy:
      replicas: 1
      placement:
        constraints:
          - node.role==manager

  mmseqs-web-webserver:
    image: "ghcr.io/soedinglab/foldseek-app-frontend:master"
    volumes:
      - ${FOLDSEEK_DIR}/nginx.conf:/etc/nginx/conf.d/default.conf:ro
    ports:
      - "${FOLSEEK_GUI_PORT}:80"
    deploy:
      replicas: 1
      placement:
        constraints:
          - node.role==manager