Closed. ApoorveK closed this issue 1 year ago.
Your yaml file doesn't properly split the arguments.
"--model-id", "bigscience/bloom-560m"
is what you want to send to your process.
Also, this is NOT an officially supported command, please tick the correct box next time :)
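For illustration, a minimal sketch of the difference in a compose file (the flag and model name are just the ones from this thread): each flag and its value must be its own element of the command list, otherwise the whole string reaches the launcher as a single argument.

    # not split: the launcher receives one argument containing spaces
    command: ["--model-id bigscience/bloom-560m"]

    # split: each flag and its value are separate arguments
    command: ["--model-id", "bigscience/bloom-560m"]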
Thank you @Narsil for the suggestion, I will keep it in mind. Can you also suggest a way to set the shm-size for the services? I have not been able to set it with the following docker swarm file:
version: "3.8"
services:
  llm_bloom:
    # build:
    #   context: .
    #   args:
    #     model: bigscience/bloom-560m
    #     num_shard: 2
    image: ghcr.io/huggingface/text-generation-inference:0.8
    ports:
      - 8089:80
    volumes:
      - type: volume
        source: mydata
        target: /data
    command: ["--shm-size", "1g", "--model-id", "bigscience/bloom-560m", "--disable-custom-kernels", "--num_shard", "2", "--max-concurrent-requests", "128"]
    deploy:
      replicas: 1
  llm_bloom_quantized:
    # build:
    #   context: .
    #   args:
    #     model: bigscience/bloom-560m
    #     num_shard: 2
    image: ghcr.io/huggingface/text-generation-inference:0.8
    ports:
      - 8099:80
    volumes:
      - type: volume
        source: mydata
        target: /data
    command: ["--shm-size", "1g", "--model-id", "bigscience/bloom-560m", "--disable-custom-kernels", "--num_shard", "2", "--max-concurrent-requests", "128", "--quantize", "bitsandbytes"]
    # [possible values: bitsandbytes, gptq]
    deploy:
      replicas: 1
volumes:
  mydata:

# earlier attempts at setting the shared memory size, all commented out:
# volumes:
#   - type: tmpfs
#     target: /dev/shm
#     tmpfs:
#       size: 4096000000  # (this means 4GB)
# shm:
#   driver: local
#   driver_opts:
#     type: tmpfs
#     mount_options:
#       - size="4G"
#   - type: tmpfs
#     target: /dev/shm
#     tmpfs:
#       size: 4096000000  # (this means 4GB)
Sorry, I never used swarm.
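One direction the commented-out attempts in the file above already point at (a sketch only, not verified on a swarm cluster): instead of trying to pass an --shm-size flag through command, mount a tmpfs at /dev/shm with an explicit size using the long volume syntax, which compose supports as a per-service mount.

    services:
      llm_bloom:
        image: ghcr.io/huggingface/text-generation-inference:0.8
        volumes:
          - type: volume
            source: mydata
            target: /data
          # tmpfs mounted over /dev/shm; size is in bytes (1073741824 = 1 GiB)
          - type: tmpfs
            target: /dev/shm
            tmpfs:
              size: 1073741824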
System Info
Currently trying to run the server in docker with CPU support (with --disable-custom-kernels) and the default model (bigscience/bloom-560m). The server works smoothly as a single docker container and is reachable through the text-generation Python package, as per the documentation. The plan is to deploy a docker swarm with multiple instances of the server running different models (a kind of centralised LLM server).
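For context, a minimal sketch of how the single-container server is queried with the text-generation Python client (the port 8089 is the one published in the compose file above; adjust as needed):

    from text_generation import Client

    # point the client at the published port of the text-generation-inference container
    client = Client("http://127.0.0.1:8089")
    response = client.generate("What is Deep Learning?", max_new_tokens=20)
    print(response.generated_text)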
Reproduction
Currently trying to deploy the docker swarm with text-generation-inference as a service, using the following docker compose yaml file: dockerSwarm.txt (uploaded in .txt format; rename the extension to .yml).
And using the following commands to start the docker stack:
docker swarm init --advertise-addr 127.0.0.1
docker stack deploy -c dockerSwarm.yml llm_server
and getting the following logs inside the docker service, using the following command:
Output:
Expected behavior
Docker swarm should start the docker services; currently two text-generation-inference servers are being deployed.
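As a way to check that both services actually came up after the stack deploy (a sketch; the service names follow swarm's <stack>_<service> convention with the llm_server stack name used above):

    docker service ls
    docker service ps llm_server_llm_bloom
    docker service ps llm_server_llm_bloom_quantized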