ray-project / ray

Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0
33.22k stars 5.62k forks

[<Ray component: Core>] CUDA_VISIBLE_DEVICES not propagated properly #47319

Open paolovic opened 1 month ago

paolovic commented 1 month ago

What happened + What you expected to happen

Hi,

I have a deployments.yaml that looks like this:

proxy_location: HeadOnly

http_options:
  host: 123.123.123.123
  port: 7007

grpc_options:
  port: 9000
  grpc_servicer_functions: []

applications:
- name: LLaMA 3
  route_prefix: /chat
  import_path: vllm_inf.vllm_serve:depl
  runtime_env:
    CUDA_VISIBLE_DEVICES: 0,1
    pip:
    - git+ssh://git@xxx:xxx/llms.git@branch/feature
  args:
    default_max_tokens: 4096
    model: /llama-3-70b-instruct-awq-main-4bit
    dtype: float16
    tensor_parallel_size: 2
    enforce_eager: True
    gpu_memory_utilization: 1
  deployments:
  - name: vLLMGenAPI
    num_replicas: 1
    ray_actor_options:
      num_gpus: 2
      num_cpus: 4

Serve imports the depl function from vllm_inf.vllm_serve.py, which binds the VLLMGenerateDeployment class.

When I log os.environ["CUDA_VISIBLE_DEVICES"] inside that class, it is empty.

Shouldn't it be "0,1" instead?

What am I doing wrong? I am using Ray together with vLLM, and vLLM now fails because CUDA_VISIBLE_DEVICES is empty. I start Ray and deploy the config like this:

ray start ...
serve deploy deployments.yaml

Versions / Dependencies

ray[train,serve,tune,data]==2.34.0
python==3.11.9
vllm==0.5.5

Reproduction script

vllm_inf.vllm_serve.py

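# Imports and the app/logger definitions were not in the original post; they are assumed here.
import logging
import os
from typing import Dict

from fastapi import FastAPI
from ray import serve
from ray.serve import Application

app = FastAPI()
logger = logging.getLogger("ray.serve")


@serve.deployment(name="vLLMGenericAPI")
@serve.ingress(app)
class VLLMGenerateDeployment:
    def __init__(self, default_max_tokens: int, **kwargs):
        logger.info(f"CUDA_VISIBLE_DEVICES: {os.environ['CUDA_VISIBLE_DEVICES']}")


def depl(args: Dict[str, str]) -> Application:
    return VLLMGenerateDeployment.bind(**args)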
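The log line in the script above raises a KeyError when the variable is unset and prints nothing useful when it is set to an empty string, so the two cases are hard to tell apart. A minimal sketch of a more diagnostic check follows; the helper name describe_cuda_visible_devices is hypothetical and not part of Ray or vLLM.

```python
import os
from typing import List, Mapping, Optional


def describe_cuda_visible_devices(env: Optional[Mapping[str, str]] = None) -> str:
    """Return a log-friendly description of CUDA_VISIBLE_DEVICES.

    Hypothetical helper: distinguishes "unset" from "set but empty",
    which os.environ[...] alone does not make obvious in a log line.
    """
    env = os.environ if env is None else env
    raw = env.get("CUDA_VISIBLE_DEVICES")
    if raw is None:
        return "CUDA_VISIBLE_DEVICES is unset"
    # Split on commas and drop blank entries such as those from "" or "0,".
    devices: List[str] = [d.strip() for d in raw.split(",") if d.strip()]
    if not devices:
        return "CUDA_VISIBLE_DEVICES is set but empty"
    return f"CUDA_VISIBLE_DEVICES={','.join(devices)} ({len(devices)} device(s))"
```

Inside the deployment's constructor this could be called as logger.info(describe_cuda_visible_devices()), which reads the real process environment by default.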

Issue Severity

None

paolovic commented 4 weeks ago

I even get this warning in my logs:

2024-08-30 14:23:04,700 WARNING runtime_env_agent.py:322 -- runtime_env field CUDA_VISIBLE_DEVICES is not recognized by Ray and will be ignored. In the future, unrecognized fields in the runtime_env will raise an exception.
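The warning indicates that CUDA_VISIBLE_DEVICES is not a valid top-level runtime_env key. Ray's runtime_env does have an env_vars field for environment variables, so one likely fix is to nest the variable under runtime_env.env_vars. The sketch below illustrates that restructuring on a plain dict. The helper name is hypothetical, and the set of known fields is a partial list assumed for illustration, not Ray's full schema. Ray also sets CUDA_VISIBLE_DEVICES itself for workers that request GPUs, so the value seen inside the replica may still differ from the one configured.

```python
from typing import Any, Dict, List, Tuple

# Assumption: a partial list of runtime_env fields, not Ray's full schema.
KNOWN_RUNTIME_ENV_FIELDS = {
    "working_dir", "py_modules", "pip", "conda",
    "env_vars", "container", "excludes", "config",
}


def move_stray_keys_to_env_vars(
    runtime_env: Dict[str, Any],
) -> Tuple[Dict[str, Any], List[str]]:
    """Return a copy of runtime_env with unrecognized top-level keys
    moved under env_vars, plus the list of keys that were moved.

    Hypothetical helper for illustration; it does not call Ray.
    """
    fixed: Dict[str, Any] = {}
    env_vars: Dict[str, str] = dict(runtime_env.get("env_vars", {}))
    moved: List[str] = []
    for key, value in runtime_env.items():
        if key == "env_vars":
            continue
        if key in KNOWN_RUNTIME_ENV_FIELDS:
            fixed[key] = value
        else:
            # Environment variable values are strings.
            env_vars[key] = str(value)
            moved.append(key)
    if env_vars:
        fixed["env_vars"] = env_vars
    return fixed, moved


original = {
    "CUDA_VISIBLE_DEVICES": "0,1",
    "pip": ["git+ssh://git@xxx:xxx/llms.git@branch/feature"],
}
fixed_env, moved_keys = move_stray_keys_to_env_vars(original)
```

In deployments.yaml the equivalent change would be an env_vars mapping under runtime_env that contains CUDA_VISIBLE_DEVICES: "0,1", with pip left where it is.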