josStorer / RWKV-Runner

An RWKV management and startup tool, fully automated, only 8MB, which also provides an interface compatible with the OpenAI API. RWKV is a large language model that is fully open source and available for commercial use.
https://www.rwkv.com
MIT License

feat(docker): add Docker support #291

Closed LonghronShen closed 8 months ago

LonghronShen commented 8 months ago

Changes

One more thing

FAQ

How to install nvidia-container-toolkit

If you want to use the CUDA strategy in Docker, you should install the nvidia-container-toolkit first. For example, here is an installation script for Ubuntu.

#!/bin/bash

set -x

distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | tee /etc/apt/sources.list.d/nvidia-docker.list

apt-get update && apt-get install -y nvidia-container-toolkit
systemctl restart docker
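After installing, a quick way to confirm the toolkit is wired into Docker is to run nvidia-smi inside a CUDA base image. This is a hedged sketch: the image tag is an assumption, so substitute any CUDA base tag available to your host.

```shell
#!/bin/bash
# Hedged check: verify the NVIDIA container runtime works by running
# nvidia-smi inside a CUDA base image. The tag below is an assumption.
check_gpu_docker() {
  if command -v docker >/dev/null 2>&1; then
    docker run --rm --gpus all nvidia/cuda:11.6.1-base-ubuntu20.04 nvidia-smi
  else
    echo "docker not found; install Docker first" >&2
    return 1
  fi
}
```

If the toolkit is set up correctly, calling check_gpu_docker should print the same GPU table that nvidia-smi prints on the host.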
josStorer commented 8 months ago

I am currently travelling and will review this PR later

xiongsp commented 8 months ago

@LonghronShen It doesn't seem to work on my server. nvidia-smi runs correctly in the container, and nvidia-container-toolkit shows as installed on my Ubuntu host, so the environment appears to be set up correctly, but the backend still runs on the CPU.

LonghronShen commented 8 months ago

Hi @xiongsp , have you tried the docker-compose.yml file that contains the GPU settings? Could you post the container log here for diagnosis? Please also post your docker-compose.yml here~
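For reference, the GPU settings in Compose generally look like the sketch below. This is an assumption-laden example, not the file from this PR: the service name, image name, ports, and volume paths are placeholders matching the docker run command discussed in this thread.

```yaml
# Hedged sketch of Compose GPU settings; image name and paths are placeholders.
services:
  rwkv-runner:
    image: rwkv-runner   # placeholder: use the image built from this PR
    ports:
      - "27777:27777"
    volumes:
      - ./models:/models
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
```

The deploy.resources.reservations.devices block is the Compose equivalent of docker run --gpus all.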

xiongsp commented 8 months ago

Hi @LonghronShen , for some reasons I didn't use docker-compose; instead I used docker run --name name -d -v ./models:/models -p 27777:27777 --gpus all images

Here is the log:


==========
== CUDA ==
==========

CUDA Version 11.6.1

Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.

*************************
** DEPRECATION NOTICE! **
*************************
THIS IMAGE IS DEPRECATED and is scheduled for DELETION.
    https://gitlab.com/nvidia/container-images/cuda/blob/master/doc/support-policy.md

INFO:     Started server process [1]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:27777 (Press CTRL+C to quit)
--- 0.754920482635498 seconds ---
torch found: /usr/local/lib/python3.10/dist-packages/torch/lib
torch set
Strategy Devices: {'cpu'}
state cache enabled
RWKV_JIT_ON 1 RWKV_CUDA_ON 0 RESCALE_LAYER 0

Loading /models/RWKV-x060-World-3B-v2-20240228-ctx4096.pth ...
Model detected: v6.0

BTW, this is what nvidia-smi shows in the container:

root@76f5f4497a74:/app# nvidia-smi
Thu Mar  7 13:16:57 2024       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.125.06   Driver Version: 525.125.06   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100-PCIE...  Off  | 00000000:2F:00.0 Off |                    0 |
| N/A   30C    P0    24W / 250W |     13MiB / 32768MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-PCIE...  Off  | 00000000:86:00.0 Off |                    0 |
| N/A   28C    P0    23W / 250W |     13MiB / 32768MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+
LonghronShen commented 8 months ago

Hi @xiongsp , according to the log, I think you invoked the switch-model web API with a wrong parameter. For reference, you may try something like this:

curl http://127.0.0.1:27777/switch-model -X POST -H "Content-Type: application/json" -d '{"model":"./models/RWKV-x060-World-3B-v2-20240228-ctx4096.pth","strategy":"cuda fp16","customCuda":"true","deploy":"true"}'

Note that the strategy should be cuda.

xiongsp commented 8 months ago

Thanks @LonghronShen ! It works! It may be difficult to bake the curl call into the Dockerfile, but a doc would be helpful. Thanks!
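Rather than putting the curl call into the Dockerfile, one option is a small wrapper script run after the container starts. This is a hedged sketch only: the host/port match the log above, the model path matches the one loaded earlier in this thread, and the polling loop and timeout are assumptions.

```shell
#!/bin/bash
# Hedged sketch: wait until the backend answers, then load a model with the
# CUDA strategy via the switch-model endpoint shown above.
switch_to_cuda() {
  local base_url="${1:-http://127.0.0.1:27777}"
  local model="${2:-./models/RWKV-x060-World-3B-v2-20240228-ctx4096.pth}"
  # Poll until the server responds at all (any HTTP response counts).
  for _ in $(seq 1 30); do
    curl -s -o /dev/null "$base_url/" && break
    sleep 2
  done
  curl "$base_url/switch-model" -X POST \
    -H "Content-Type: application/json" \
    -d "{\"model\":\"$model\",\"strategy\":\"cuda fp16\",\"customCuda\":\"true\",\"deploy\":\"true\"}"
}
```

Running switch_to_cuda after docker run (or from a Compose healthcheck-style hook) would avoid having to issue the curl command by hand.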