smallcloudai / refact

WebUI for Fine-Tuning and Self-hosting of Open-Source Large Language Models for Coding
https://refact.ai
BSD 3-Clause "New" or "Revised" License

Running on Fedora 38 - Docs Update #39

Open mrhillsman opened 1 year ago

mrhillsman commented 1 year ago

For self-hosting, the docs at https://refact.ai/docs/self-hosting/ say to run docker run -d --rm -p 8008:8008 -v perm-storage:/perm_storage --gpus all smallcloud/refact_self_hosting after ensuring Docker has NVIDIA GPU support. Unfortunately, those instructions do not work for me, even though I was able to run the previous release of refact before the recent major changes. Here is what I got when following them:

 -- 26 -- WARNING:root:output was:
-- 26 -- - no output -
-- 26 -- WARNING:root:nvidia-smi does not work, that's especially bad for initial setup.
-- 26 -- WARNING:root:Traceback (most recent call last):
-- 26 --   File "/usr/local/lib/python3.8/dist-packages/self_hosting_machinery/scripts/enum_gpus.py", line 17, in query_nvidia_smi
-- 26 --     nvidia_smi_output = subprocess.check_output([
-- 26 --   File "/usr/lib/python3.8/subprocess.py", line 415, in check_output
-- 26 --     return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
-- 26 --   File "/usr/lib/python3.8/subprocess.py", line 516, in run
-- 26 --     raise CalledProcessError(retcode, process.args,
-- 26 -- subprocess.CalledProcessError: Command '['nvidia-smi', '--query-gpu=pci.bus_id,name,memory.used,memory.total,temperature.gpu', '--format=csv']' returned non-zero exit status 4.
-- 26 -- 

I can confirm, however, that the function query_nvidia_smi succeeds when I import enum_gpus.py into Python directly (tested on 3.8, which the Dockerfile uses, and on 3.11). Running the nvidia-smi command with the same flags as enum_gpus also succeeds:

(refact) [mrhillsman@workstation refact]$ python --version
Python 3.8.17
(refact) [mrhillsman@workstation refact]$ python
Python 3.8.17 (default, Jun  8 2023, 00:00:00) 
[GCC 13.1.1 20230511 (Red Hat 13.1.1-2)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import subprocess
>>> subprocess.check_output(["nvidia-smi", "--query-gpu=pci.bus_id,name,memory.used,memory.total,temperature.gpu", "--format=csv"])
b'pci.bus_id, name, memory.used [MiB], memory.total [MiB], temperature.gpu\n00000000:01:00.0, NVIDIA GeForce RTX 3080, 11 MiB, 10240 MiB, 29\n'
>>> import self_hosting_machinery.scripts.enum_gpus as gpuenum
>>> gpuenum.query_nvidia_smi()
{'gpus': [{'id': '00000000:01:00.0', 'name': 'NVIDIA GeForce RTX 3080', 'mem_used_mb': 11, 'mem_total_mb': 10240, 'temp_celsius': 29}]}
>>> exit()
(refact) [mrhillsman@workstation refact]$ nvidia-smi --query-gpu=pci.bus_id,name,memory.used,memory.total,temperature.gpu --format=csv
pci.bus_id, name, memory.used [MiB], memory.total [MiB], temperature.gpu
00000000:01:00.0, NVIDIA GeForce RTX 3080, 11 MiB, 10240 MiB, 29
(refact) [mrhillsman@workstation refact]$ nvidia-smi 
Sat Jul 22 15:46:59 2023       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.54.03              Driver Version: 535.54.03    CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3080        Off | 00000000:01:00.0 Off |                  N/A |
|  0%   29C    P8              13W / 370W |     11MiB / 10240MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      2727      G   /usr/bin/gnome-shell                          3MiB |
+---------------------------------------------------------------------------------------+
(refact) [mrhillsman@workstation refact]$ uname -a
Linux workstation 6.3.12-200.fc38.x86_64 #1 SMP PREEMPT_DYNAMIC Thu Jul  6 04:05:18 UTC 2023 x86_64 GNU/Linux
(refact) [mrhillsman@workstation refact]$ cat /etc/os-release 
NAME="Fedora Linux"
VERSION="38 (Workstation Edition)"
ID=fedora
VERSION_ID=38
VERSION_CODENAME=""
PLATFORM_ID="platform:f38"
PRETTY_NAME="Fedora Linux 38 (Workstation Edition)"
ANSI_COLOR="0;38;2;60;110;180"
LOGO=fedora-logo-icon
CPE_NAME="cpe:/o:fedoraproject:fedora:38"
DEFAULT_HOSTNAME="fedora"
HOME_URL="https://fedoraproject.org/"
DOCUMENTATION_URL="https://docs.fedoraproject.org/en-US/fedora/f38/system-administrators-guide/"
SUPPORT_URL="https://ask.fedoraproject.org/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_BUGZILLA_PRODUCT="Fedora"
REDHAT_BUGZILLA_PRODUCT_VERSION=38
REDHAT_SUPPORT_PRODUCT="Fedora"
REDHAT_SUPPORT_PRODUCT_VERSION=38
SUPPORT_END=2024-05-14
VARIANT="Workstation Edition"
VARIANT_ID=workstation
[mrhillsman@workstation refact-ai]$ sestatus
SELinux status:                 enabled
SELinuxfs mount:                /sys/fs/selinux
SELinux root directory:         /etc/selinux
Loaded policy name:             targeted
Current mode:                   enforcing
Mode from config file:          enforcing
Policy MLS status:              enabled
Policy deny_unknown status:     allowed
Memory protection checking:     actual (secure)
Max kernel policy version:      33
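For reference, the dict shape returned by query_nvidia_smi in the session above can be reproduced by parsing the CSV that nvidia-smi emits. This is a hypothetical re-implementation for illustration only; the function and field names below are assumptions, not the project's actual code in enum_gpus.py.

```python
import csv
import io

# Sample taken verbatim from the nvidia-smi --format=csv output above.
SAMPLE = (
    "pci.bus_id, name, memory.used [MiB], memory.total [MiB], temperature.gpu\n"
    "00000000:01:00.0, NVIDIA GeForce RTX 3080, 11 MiB, 10240 MiB, 29\n"
)

def parse_nvidia_smi_csv(text):
    """Parse `nvidia-smi --query-gpu=... --format=csv` output into a gpus dict."""
    rows = list(csv.reader(io.StringIO(text)))
    gpus = []
    for row in rows[1:]:  # skip the header row
        bus_id, name, mem_used, mem_total, temp = (f.strip() for f in row)
        gpus.append({
            "id": bus_id,
            "name": name,
            "mem_used_mb": int(mem_used.split()[0]),    # "11 MiB" -> 11
            "mem_total_mb": int(mem_total.split()[0]),  # "10240 MiB" -> 10240
            "temp_celsius": int(temp),
        })
    return {"gpus": gpus}

print(parse_nvidia_smi_csv(SAMPLE))
```

The point is that the parsing itself is trivial and works on this machine; the non-zero exit status 4 inside the container means nvidia-smi itself fails there, which points at the container runtime/SELinux setup rather than the Python code.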

I would have created a PR for the documentation change, but I do not see a repo for the site documentation. Here is the command that did work for me, which I recommend adding to the docs, either under Fedora 38 specifically or under RPM-based distributions in general:

podman run -d -it --gpus 0 --security-opt=label=disable -p 8008:8008 -v perm_storage:/perm_storage smallcloud/refact_self_hosting
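The key change in the command above is --security-opt=label=disable, which turns off SELinux label separation for the container. If the docs want a Docker-flavored variant for SELinux-enforcing hosts, something along these lines might work; these are untested sketches, not commands I have verified on Fedora:

```shell
# Option 1: disable SELinux label separation for this container only
# (the same approach the working podman command uses).
docker run -d --rm -p 8008:8008 -v perm-storage:/perm_storage \
    --security-opt label=disable --gpus all smallcloud/refact_self_hosting

# Option 2: ask the engine to relabel the volume so the container can access it.
docker run -d --rm -p 8008:8008 -v perm-storage:/perm_storage:Z \
    --gpus all smallcloud/refact_self_hosting
```

Whether option 2 is sufficient likely depends on how the NVIDIA container toolkit interacts with SELinux on the host, so option 1 is the safer suggestion for the docs.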

olegklimov commented 1 year ago

Thanks for reporting!

olegklimov commented 1 year ago

We have docs repository

https://github.com/smallcloudai/web_docs_refact_ai

mrhillsman commented 8 months ago

Thanks @olegklimov, I will submit a PR there soon; apologies for the delay. Once I have an open PR/issue there, I'll reference it here and close this issue.