Open okyx opened 1 year ago
has 1 replicas that have taken more than 30s to be scheduled. This may be caused by waiting for the cluster to auto-scale, or waiting for a runtime environment to install. Resources required for each replica: {"accelerator_type_cpu": 0.01, "CPU": 1}, resources available: {"CPU": 15.0}
Aviary uses Ray custom resources (e.g. accelerator_type_cpu) for scheduling. It appears that the cluster you are running Aviary on doesn't have them. You can either configure them to be visible in your cluster, or remove them from the model configuration YAMLs.
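To see why the replica in the warning above cannot be placed, it can help to compare the requested resources with what the cluster reports. The following is a minimal sketch, not Aviary code: `missing_resources` is a hypothetical helper, and the two dictionaries are copied from the warning. On a live cluster the available dictionary could presumably be obtained from `ray.cluster_resources()` instead.

```python
def missing_resources(required, available):
    """Return the part of `required` that `available` cannot cover.

    Hypothetical helper for illustration; not part of Ray or Aviary.
    """
    shortfall = {}
    for name, amount in required.items():
        have = available.get(name, 0.0)  # a resource the cluster lacks counts as 0
        if have < amount:
            shortfall[name] = amount - have
    return shortfall


# Values copied from the warning message above.
required = {"accelerator_type_cpu": 0.01, "CPU": 1}
available = {"CPU": 15.0}

print(missing_resources(required, available))
# → {'accelerator_type_cpu': 0.01}
```

The CPU request is satisfied, so the only thing blocking scheduling here is the custom resource that the cluster does not advertise.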
Thanks for the answer, @Yard1. I tried reconfiguring, but now the log has changed to this: amazon--LightGPT_amazon--LightGPT has 1 replicas that have taken more than 30s to initialize. This may be caused by a slow init or reconfigure method
Is it okay if my cluster doesn't have a GPU?
Most of the models require GPUs. You may try to use the llama.cpp backend with CPU. See https://github.com/ray-project/aviary/blob/master/models/static_batching/eachadea--ggml-vicuna-13b-1.1.yaml for an example. Make sure to remove the custom resources (two instances of accelerator_type_cpu).
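If editing the YAML by hand is error-prone, the removal of the two accelerator_type_cpu entries could be scripted once the file is loaded into a dictionary. This is only a sketch under assumptions: `strip_custom_resources` is a hypothetical helper, and the `example_config` layout below is invented for illustration and is not Aviary's actual schema.

```python
def strip_custom_resources(node, prefix="accelerator_type_"):
    """Return a copy of a nested dict/list with keys starting with `prefix` removed.

    Hypothetical helper for illustration; not part of Ray or Aviary.
    """
    if isinstance(node, dict):
        return {
            key: strip_custom_resources(value, prefix)
            for key, value in node.items()
            if not str(key).startswith(prefix)
        }
    if isinstance(node, list):
        return [strip_custom_resources(item, prefix) for item in node]
    return node  # scalars are returned unchanged


# Invented layout with two custom-resource entries; NOT the real Aviary schema.
example_config = {
    "deployment": {"resources": {"CPU": 1, "accelerator_type_cpu": 0.01}},
    "workers": {"resources": {"CPU": 4, "accelerator_type_cpu": 0.01}},
}

cleaned = strip_custom_resources(example_config)
print(cleaned)
# → {'deployment': {'resources': {'CPU': 1}}, 'workers': {'resources': {'CPU': 4}}}
```

The function builds a new structure rather than mutating its input, so the original configuration stays available for comparison.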
The log keeps saying "has 1 replicas that have taken more than 30s to initialize".