Howdy,

I am testing the `anyscale/ray-llm` Docker container on a host with four A100 GPUs.

When I try to deploy the CodeLlama model (`models/continuous_batching/codellama--CodeLlama-34b-Instruct-hf.yaml`), it keeps complaining:

```
Error: No available node types can fulfill resource request defaultdict(<class 'float'>, {'accelerator_type_a100_80g': 0.02, 'CPU': 9.0, 'GPU': 1.0}). Add suitable node types to this cluster to resolve this issue.
```

When checking `ray status` I do see that the four GPUs are detected, but I don't see any accelerator resource. Is this the problem? CUDA and `nvidia-smi` correctly show the cards within the container.

The container is started as described in your README:

```
docker run -it --gpus all --shm-size 1g -p 8000:8000 -e HF_HOME=~/data -v $cache_dir:~/data anyscale/ray-llm:latest bash
```
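For reference, one workaround I considered (assuming the scheduler matches the `accelerator_type_a100_80g` key from the error message against Ray custom resources, and that `4` is the right count for this host) is to register the custom resource manually when starting Ray inside the container:

```shell
# Inside the container: start a local Ray head node and advertise the
# custom accelerator resource named in the error message.
# "accelerator_type_a100_80g": 4 is an assumption matching the four A100s.
ray start --head \
  --num-gpus=4 \
  --resources='{"accelerator_type_a100_80g": 4}'
```

If that is the right knob, `ray status` should then list `accelerator_type_a100_80g` alongside `CPU` and `GPU` in the cluster resources, but I'm not sure whether this is the intended way to satisfy the request.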