ray-project / ray-llm

RayLLM - LLMs on Ray
https://aviary.anyscale.com
Apache License 2.0

Recently updated to 0.5.0 and can no longer deploy models -- add suitable Node Error #120

Closed JGSweets closed 8 months ago

JGSweets commented 8 months ago

When trying to launch any model via Serve, Ray now throws:

Error: No available node types can fulfill resource request 
defaultdict(<class 'float'>, {'accelerator_type_a10': 0.02, 'CPU': 9.0, 'GPU': 1.0}). 
Add suitable node types to this cluster to resolve this issue.

  gpu_worker_g5:
    node_config:
      InstanceType: g5.12xlarge
      BlockDeviceMappings: *mount
    resources:
      worker_node: 1
      instance_type_g5: 1
      accelerator_type_a10: 1
    min_workers: 0
    max_workers: 4

I'm not sure why Ray no longer requests resources through the autoscaler config to resolve this.

JGSweets commented 8 months ago

It seems the CPU and GPU resources now have to be added explicitly to the node type's resources:

  resources:
    worker_node: 1
    instance_type_g5: 1
    accelerator_type_a10: 1
    CPU: 64
    GPU: 4

sihanwang41 commented 8 months ago

Hi @JGSweets, it looks like you have already fixed the issue yourself. (We are going to clean up the config so that users can use it directly without having to learn the accelerator concept.)
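
For anyone hitting the same error, here is a minimal Python sketch of why the request could not be placed. It is not Ray's autoscaler code; it only assumes, as this thread suggests, that the node type was treated as offering exactly what is listed under `resources`. The helper name `missing_resources` is hypothetical. The request is copied from the error message, and the two resource sets are the original and the fixed `gpu_worker_g5` config from above.

```python
from typing import Dict, List


def missing_resources(node_resources: Dict[str, float],
                      request: Dict[str, float]) -> List[str]:
    """Return the requested resource names this node type cannot cover.

    Simplified illustration only: a resource that is not declared on the
    node type is treated as having zero capacity.
    """
    return [
        name
        for name, amount in request.items()
        if node_resources.get(name, 0.0) < amount
    ]


# Resource request copied from the error message in this issue.
request = {"accelerator_type_a10": 0.02, "CPU": 9.0, "GPU": 1.0}

# Resources as originally declared for gpu_worker_g5 (no CPU or GPU).
original = {"worker_node": 1, "instance_type_g5": 1, "accelerator_type_a10": 1}

# The same node type after adding CPU and GPU explicitly.
fixed = {**original, "CPU": 64, "GPU": 4}

print(missing_resources(original, request))  # ['CPU', 'GPU']
print(missing_resources(fixed, request))     # []
```

Under that assumption the original node type satisfies the `accelerator_type_a10` part of the request but offers no CPU or GPU, so no node type matches. Once both are declared, nothing is missing.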