Closed JackKay404 closed 5 months ago
The docker compose based server uses Redis to temporarily store information. If you shut down the Docker stack completely, the Redis volume is removed and the job status desyncs from what's stored on disk.
We want to get rid of the Redis-based workflow; however, that means the server and worker processes can't be separated anymore. If that's okay with you, please take a look at `docker-compose.local.yml`, which replaces the commands so that Redis is not used and a single server process runs instead. There, the source of truth is always the jobs directly on the file system.
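A minimal sketch of what such a Redis-free service definition could look like, assuming the `-local` flag discussed below (service name, image, and mount paths follow the compose files elsewhere in this thread; treat the exact layout as illustrative, not as the contents of `docker-compose.local.yml`):

```yaml
# Hypothetical single-process setup: one server that also runs the workers
# in-process; the jobs directory on disk is the source of truth.
services:
  mmseqs-web-api:
    image: "ghcr.io/soedinglab/foldseek-app-backend:master"
    init: true
    # -local replaces the separate -server / -worker commands
    command: -local -config /etc/mmseqs-web/config.json -app ${APP}
    volumes:
      - ./config.json:/etc/mmseqs-web/config.json:ro
      - ./jobs:/opt/mmseqs-web/jobs
```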
Thanks for the quick reply!
For my use case the Redis-based workflow is totally acceptable, as long as the data is made persistent with an external volume and remains accessible from the GUI across container restarts.
I'm not sure exactly what wider impact this might have, but changing the command in mmseqs-web-api from `-server -config /etc/mmseqs-web/config.json -app ${APP}` to `-local -config /etc/mmseqs-web/config.json -app ${APP}`, and also mounting an external volume at the `/data` directory of mmseqs-web-redis, seems to have enabled persistent access to the jobs after a `down` and `up` of the container stack.
My own docker-compose file for deploying a Docker Swarm stack looks as below:
```yaml
version: '3.9'
services:
  mmseqs-web-redis:
    image: redis:alpine
    ports:
      - "${FOLDSEEK_REDIS_PORT}:6379"
    volumes:
      - mmseqs-web-redis:/data
    deploy:
      replicas: 1
      placement:
        constraints:
          - node.role==manager
  mmseqs-web-api:
    image: "ghcr.io/soedinglab/foldseek-app-backend:master"
    init: true
    command: -local -config /etc/mmseqs-web/config.json -app foldseek
    expose:
      - "3000"
    volumes:
      - ${FOLDSEEK_DIR}/config.json:/etc/mmseqs-web/config.json:ro
      - ${FOLDSEEK_DB_PATH}:/opt/mmseqs-web/databases
      - ${FOLDSEEK_JOBS_PATH}:/opt/mmseqs-web/jobs
    deploy:
      replicas: 1
      placement:
        constraints:
          - node.role==manager
  mmseqs-web-worker:
    image: "ghcr.io/soedinglab/foldseek-app-backend:master"
    init: true
    command: -worker -config /etc/mmseqs-web/config.json -app foldseek
    volumes:
      - ${FOLDSEEK_DIR}/config.json:/etc/mmseqs-web/config.json:ro
      - ${FOLDSEEK_DB_PATH}:/opt/mmseqs-web/databases
      - ${FOLDSEEK_JOBS_PATH}:/opt/mmseqs-web/jobs
    tmpfs:
      - ${FOLDSEEK_DIR}/tmp:exec
    environment:
      - MMSEQS_NUM_THREADS=1
    deploy:
      replicas: ${FOLDSEEK_WEB_WORKER_REPLICAS}
      placement:
        constraints:
          - node.role==manager
  mmseqs-web-webserver:
    image: "ghcr.io/soedinglab/foldseek-app-frontend:master"
    volumes:
      - ${FOLDSEEK_DIR}/nginx.conf:/etc/nginx/conf.d/default.conf:ro
    ports:
      - "${FOLSEEK_GUI_PORT}:80"
    deploy:
      replicas: 1
      placement:
        constraints:
          - node.role==manager
volumes:
  mmseqs-web-redis:
    external: true
    name: mmseqs-web-redis
```
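Since the Redis volume is declared `external: true`, it has to exist before the stack is deployed. These are the standard Docker commands for that (the stack name `foldseek` and compose filename are assumptions, not from the thread):

```shell
# Create the named volume once; because it is external, it survives
# removal of the stack, which is what keeps the Redis data persistent.
docker volume create mmseqs-web-redis

# Deploy (or update) the swarm stack using the compose file above
docker stack deploy -c docker-compose.yml foldseek
```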
`-local` disables Redis and the server/worker split. This is what I would suggest using, as I think I am going to drop Redis at some point anyway.

To clarify, both `mmseqs-web-worker` and `mmseqs-web-redis` don't do anything and can be disabled if you use `-local`. `-local` still allows multiple workers as threads; it just doesn't allow placing them on different machines than the server.
Thanks for the clarification! Yes, I can confirm that I am able to remove the `mmseqs-web-worker` and `mmseqs-web-redis` containers from the compose file and successfully persist jobs across multiple container restarts and after `docker system prune`. This is ideal for my purposes, so thanks a lot for the great work!
docker-compose.yml now looks as below for anyone interested:
```yaml
version: '3.9'
services:
  mmseqs-web-api:
    image: "ghcr.io/soedinglab/foldseek-app-backend:master"
    init: true
    command: -local.workers 1 -local -config /etc/mmseqs-web/config.json -app foldseek
    expose:
      - "3000"
    volumes:
      - ${FOLDSEEK_DIR}/config.json:/etc/mmseqs-web/config.json:ro
      - ${FOLDSEEK_DB_PATH}:/opt/mmseqs-web/databases
      - ${FOLDSEEK_JOBS_PATH}:/opt/mmseqs-web/jobs
    deploy:
      replicas: 1
      placement:
        constraints:
          - node.role==manager
  mmseqs-web-webserver:
    image: "ghcr.io/soedinglab/foldseek-app-frontend:master"
    volumes:
      - ${FOLDSEEK_DIR}/nginx.conf:/etc/nginx/conf.d/default.conf:ro
    ports:
      - "${FOLSEEK_GUI_PORT}:80"
    deploy:
      replicas: 1
      placement:
        constraints:
          - node.role==manager
```
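With the `-local` setup, job persistence can be sanity-checked by tearing the stack down and redeploying it, mirroring what was verified earlier in the thread (the stack name `foldseek` is an example):

```shell
# Remove the stack, clean up, and redeploy; jobs mounted from
# ${FOLDSEEK_JOBS_PATH} should still show up in the GUI afterwards.
docker stack rm foldseek
docker system prune -f
docker stack deploy -c docker-compose.yml foldseek
```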
Hello,
I'm trying to implement the foldseek docker-compose.yml as a Docker Swarm stack, but I have noticed that jobs which have been run are no longer accessible via the GUI after downing and re-deploying the stack.
The previously run jobs are maintained in the jobs directory as expected, and the data is accessible via the command line; however, in the job.json the status is "PENDING". The GUI does seem to be aware of the contents of the jobs directory, because tabs for old and new jobs are shown, but clicking on an old job returns "Job Status: ERROR Job failed. Please try again later.", even though the job was successful at the time of running. See screenshot below:
Any advice on this would be massively appreciated!
Edit: If I manually edit the job.json "status" field from "PENDING" to "COMPLETE" before restarting the containers, the job persists in the GUI; however, this is not ideal. If I manually edit the "status" field after restarting, the job is not persistent. Any suggestions on how to make the status update automatically?
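For anyone hitting the same issue, the manual workaround described above can be scripted. This is a hedged sketch, assuming only what the thread states: each job directory contains a job.json with a top-level "status" field taking values like "PENDING" and "COMPLETE", and the fix must be applied before restarting the containers. The helper name and jobs path are illustrative:

```shell
#!/bin/sh
# Workaround sketch: flip a stale "PENDING" status back to "COMPLETE" in a
# job.json so the GUI shows the finished job again. Per the thread, this
# must be done BEFORE restarting the containers to take effect.
fix_job_status() {
  job_json="$1"
  # Rewrite only the status field, leaving the rest of the JSON untouched
  sed -i 's/"status": *"PENDING"/"status": "COMPLETE"/' "$job_json"
}

# Example (path is an assumption based on the compose mounts above):
# for f in /opt/mmseqs-web/jobs/*/job.json; do fix_job_status "$f"; done
```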