Sorry guys, there was a small issue with the previous instructions (now fixed). If you get stuck on the Docker model download, you can try adding a proxy. Once the service starts successfully, there will be a `Connected` line in the logs.
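For instance, here is a minimal sketch of passing proxy settings into the container via the standard `HTTP_PROXY`/`HTTPS_PROXY` environment variables. The proxy URL below is a placeholder, and whether the download honors it depends on the tooling inside the image, though `huggingface_hub` generally respects these variables:

```bash
# Placeholder proxy address; replace with your own proxy.
proxy=http://127.0.0.1:7890

docker run --gpus all -p 8080:80 \
    -e HTTP_PROXY=$proxy -e HTTPS_PROXY=$proxy \
    ghcr.io/huggingface/text-generation-inference:2.0.4 \
    --model-id lllyasviel/omost-llama-3-8b
```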
Alternatively, if you have previously downloaded the model with `AutoModelForCausalLM.from_pretrained`, the model files should already be cached locally, and you can mount the cache directory into the Docker container. For example, on my Linux machine the Hugging Face cache path is `/home/ubuntu/.cache/huggingface/hub`, which the command below mounts into the container:
```bash
port=8080
modelID=lllyasviel/omost-llama-3-8b
memoryRate=0.9
# Host-side Hugging Face cache, mounted to TGI's default cache path (/data).
volume=/home/ubuntu/.cache/huggingface/hub

docker run --gpus all -p $port:80 \
    -v $volume:/data \
    ghcr.io/huggingface/text-generation-inference:2.0.4 \
    --model-id $modelID --max-total-tokens 9216 --cuda-memory-fraction $memoryRate
```
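If you prefer running the container in the background, here is a sketch of the same command detached (the container name `tgi` is arbitrary), following the logs until the `Connected` line appears:

```bash
# Same command, detached; the name "tgi" is arbitrary.
docker run -d --name tgi --gpus all -p $port:80 \
    -v $volume:/data \
    ghcr.io/huggingface/text-generation-inference:2.0.4 \
    --model-id $modelID --max-total-tokens 9216 --cuda-memory-fraction $memoryRate

# Follow the logs and wait for the "Connected" line.
docker logs -f tgi
```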
Then, we can get the response with curl:
```bash
curl 127.0.0.1:8080/generate \
    -X POST \
    -d '{"inputs":"What is Deep Omost?","parameters":{"max_new_tokens":20}}' \
    -H 'Content-Type: application/json'
```

which returns:

```json
{"generated_text":" Deep Omost is a comprehensive, non-invasive, and evidence-based treatment approach that targets the root"}
```
I feel like this feature needs to be better documented.

Sure! Let me detail it.
Hi @huchenlei, I have updated the usage in the README. Please try again, and feel free to let me know if there are any issues.
Thanks for addressing that! I will give it a test tomorrow.
Hi, this modification adds the capability to use external LLM services, such as an LLM deployed with TGI, to accelerate inference. In my tests there is a 6x speed improvement on an H100, and on an A10G the average response time is only 50 seconds.
For example:

First, we can deploy the LLM using TGI through Docker (see the `docker run` command above).

Then, test whether the LLM service has started successfully (see the `curl` example above).
Next, add an `Omost LLM HTTP Server` node and enter the service address of the LLM.

![image](https://github.com/huchenlei/ComfyUI_omost/assets/6883957/8cf1f3a8-f4d7-416c-a1d0-be27bc300c96)
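With the Docker command above, that address would be `http://127.0.0.1:8080` (adjust the host and port to match your own deployment).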