ray-project / ray-llm

RayLLM - LLMs on Ray
https://aviary.anyscale.com
Apache License 2.0

Podman Error on Red Hat 9? #127

Open jayteaftw opened 8 months ago

jayteaftw commented 8 months ago

I am trying to deploy RayLLM locally; these are the commands I am running:

```shell
cache_dir=${XDG_CACHE_HOME:-$HOME/.cache}

podman run -it --device nvidia.com/gpu=0 --security-opt=label=disable \
  --shm-size 20g -p 8000:8000 -e HF_HOME=/home/ray/data \
  -v $cache_dir:/home/ray/data \
  anyscale/ray-llm:latest bash
```

```shell
# Inside the container
serve run ~/serve_configs/amazon--LightGPT.yaml
```

However, when I run the `serve run` command, I get:

```
2024-01-24 16:34:15,360 INFO scripts.py:411 -- Running config file: '/home/ray/serve_configs/amazon--LightGPT.yaml'.
2024-01-24 16:34:18,947 INFO worker.py:1715 -- Started a local Ray instance. View the dashboard at 127.0.0.1:8265
```

Then it hangs for a while and the Ray instance never comes up. I am running this on Red Hat 9, and I know the GPU is connected because I can run `nvidia-smi`.

Interestingly, I cannot `cd` into `/home/ray/data` inside the container; it says I need sudo permissions. Could this be part of the problem?
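For what it's worth, here is a sketch of how the permission issue could be diagnosed, assuming it comes from rootless podman's user-namespace UID remapping (the `--userns=keep-id` flag and `podman unshare` are standard podman features, not anything RayLLM-specific, and the UID 1000 for the `ray` user is my assumption about the image, not something I have confirmed):

```shell
# Show how the mounted cache dir's ownership appears from inside
# podman's user namespace (rootless podman remaps host UIDs, so the
# container's "ray" user may not own the mount).
podman unshare ls -ln "$cache_dir"

# Option 1: map the host UID/GID onto the container user so the
# in-container user matches the directory owner.
podman run -it --device nvidia.com/gpu=0 --security-opt=label=disable \
  --shm-size 20g -p 8000:8000 -e HF_HOME=/home/ray/data \
  --userns=keep-id \
  -v "$cache_dir":/home/ray/data \
  anyscale/ray-llm:latest bash

# Option 2: chown the mount to the container user's UID as seen from
# the host-side user namespace (assumes "ray" is UID 1000 in the image).
podman unshare chown -R 1000:1000 "$cache_dir"
```

Either variant should make `/home/ray/data` writable by the container user, if the hang is in fact caused by the cache mount being unreadable.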