ray-project / ray-llm

RayLLM - LLMs on Ray
https://aviary.anyscale.com
Apache License 2.0
1.22k stars 89 forks

aviary run --model failing to deploy #26

Open mahaddad opened 1 year ago

mahaddad commented 1 year ago

Hi Aviary team,

Thanks for the great package. I am trying to get it to work for my use case and I am running into several issues. Details are provided below. Let me know if I can provide any additional information to help identify the root cause.

  1. When deploying new models, the deployment sometimes hangs for over an hour before it silently fails.
  2. I am unable to kill that individual Serve application, which means I must restart the entire cluster before I can retry deploying the model.
  3. `aviary models` lists models that are not available to be queried and omits others that are.

Using the latest Docker image and the default deploy/ray/aviary-cluster.yaml with the following change:

```yaml
gpu_worker_g5:
  node_config:
    InstanceType: g5.4xlarge
    BlockDeviceMappings: *mount
  resources:
    worker_node: 1
    instance_type_g5: 1
    accelerator_type_a10: 1
  min_workers: 0
  max_workers: 8
```

When I run:

```shell
export AVIARY_URL="http://localhost:8000"
aviary run --model ./models/static_batching/mosaicml--mpt-7b-instruct.yaml
aviary run --model ./models/static_batching/OpenAssistant--falcon-7b-sft-top1-696.yaml
```

Falcon-7b deploys successfully, but mpt-7b-instruct never deploys; it hangs for about an hour and then reports failed. If I retry, I get the same result, and trying a different model fails the same way. I am well below the vCPU quota on G instances. I also tried vicuna-13b, and it likewise failed to launch a GPU instance.


Also, `aviary models` shows mpt-7b-instruct as running although it is not, and for some reason falcon-7b is not shown even though it actually is running. If you ping `/-/routes` directly, you see both models running. The expected behavior would be that `aviary models` shows only running models that are available to be queried.

```
(base) ray@ip-172-31-52-1:~$ aviary models
Connecting to Aviary backend at: http://localhost:8000/
mosaicml/mpt-7b-instruct
```
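For reference, here is a minimal sketch of the cross-check I am doing by hand: turning a `/-/routes`-style payload back into model names. The sample payload and the `--`-to-`/` naming convention are my assumptions, not Aviary's documented format; on a live cluster you would fetch the JSON from `http://localhost:8000/-/routes` instead of using the hardcoded string.

```python
import json

# Hypothetical sample of what /-/routes might return; on a real cluster,
# fetch this JSON from http://localhost:8000/-/routes instead.
sample = json.dumps({
    "/mosaicml--mpt-7b-instruct": "mosaicml--mpt-7b-instruct",
    "/OpenAssistant--falcon-7b-sft-top1-696": "OpenAssistant--falcon-7b-sft-top1-696",
})

def deployed_models(routes_json: str) -> list:
    """Map route prefixes to model-style names (assumed 'org--model' convention)."""
    routes = json.loads(routes_json)
    return sorted(prefix.lstrip("/").replace("--", "/", 1) for prefix in routes)

print(deployed_models(sample))
```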

(base) ray@ip-172-31-52-1:~$ ray list actors --detail

Yard1 commented 1 year ago

Thanks, will try to reproduce and get back to you!

Regarding `aviary models`: unfortunately, it would be non-trivial to make it return only already-deployed models. The fact that it is not showing new models, however, is an issue.

Yard1 commented 1 year ago

Also, one other thing that would be helpful is the autoscaler logs (on the head node: `/tmp/ray/session_latest/logs/monitor.log` and `monitor.err`). We have had trouble provisioning g5 nodes ourselves, and I think this may simply be an AWS capacity issue.
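To grab both files in one go, something like the snippet below should work on the head node (the paths are the ones mentioned above; the fallback message is just so the loop does not abort if a file is missing):

```shell
# Collect the tail of the autoscaler logs from the head node.
for f in /tmp/ray/session_latest/logs/monitor.log \
         /tmp/ray/session_latest/logs/monitor.err; do
  echo "=== $f ==="
  tail -n 100 "$f" 2>/dev/null || echo "(not found on this machine)"
done
```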

I am also happy to schedule some time on Friday to debug together! Let me know if you are interested.

Yard1 commented 1 year ago

Finally, I would recommend deploying multiple models in a single invocation instead of calling `aviary run` several times: `aviary run --model MODEL1 --model MODEL2`.

mahaddad commented 1 year ago

I would love the chance to debug together. I can make myself available any time on Friday that works best for you. Could you shoot me a note at michael@konko.ai?

In the meantime, I will try running `aviary run` with multiple models as you suggested and do some further testing to capture the logs you mentioned.