ray-project / ray-llm

RayLLM - LLMs on Ray
https://aviary.anyscale.com
Apache License 2.0
1.22k stars 89 forks

aviary run --model failing to deploy #26

Open mahaddad opened 1 year ago

mahaddad commented 1 year ago

Hi Aviary team,

Thanks for the great package. I am trying to get it to work for my use case and I am running into several issues. Details are provided below. Let me know if I can provide any additional information to help identify the root cause.

  1. When deploying new models, the deployment sometimes hangs for over an hour before it silently fails.
  2. I am unable to kill that individual Serve application, which means I must restart the entire cluster before I can retry deploying the model.
  3. `aviary models` lists models that are not available to be queried and omits others that are.

Using the latest Docker image and the default deploy/ray/aviary-cluster.yaml with the following change:

```yaml
gpu_worker_g5:
  node_config:
    InstanceType: g5.4xlarge
    BlockDeviceMappings: *mount
  resources:
    worker_node: 1
    instance_type_g5: 1
    accelerator_type_a10: 1
  min_workers: 0
  max_workers: 8
```

When I run:

```shell
export AVIARY_URL="http://localhost:8000"
aviary run --model ./models/static_batching/mosaicml--mpt-7b-instruct.yaml
aviary run --model ./models/static_batching/OpenAssistant--falcon-7b-sft-top1-696.yaml
```

Falcon-7b deploys successfully, but mpt-7b-instruct never deploys; it hangs for about an hour and then reports failed. If I retry, I get the same result, and trying a different model fails the same way. I am well below the vCPU quota on G instances. I also tried vicuna-13b, and it likewise failed to launch a GPU instance.


Also, `aviary models` shows mpt-7b-instruct as running although it is not, and for some reason falcon-7b is not shown even though it actually is running. If you ping `/-/routes` directly, you see both models running. The expected behavior would be that `aviary models` shows only running models that are available to be queried.

```
(base) ray@ip-172-31-52-1:~$ aviary models
Connecting to Aviary backend at: http://localhost:8000/
mosaicml/mpt-7b-instruct
```
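For reference, here is a minimal sketch of the cross-check I am doing by hand: turning a `/-/routes`-style payload back into model names. The sample payload and the `--`-to-`/` naming convention are my assumptions, not Aviary's documented format; on a live cluster you would fetch the JSON from `http://localhost:8000/-/routes` instead of using the hardcoded string.

```python
import json

# Hypothetical sample of what /-/routes might return; on a real cluster,
# fetch this JSON from http://localhost:8000/-/routes instead.
sample = json.dumps({
    "/mosaicml--mpt-7b-instruct": "mosaicml--mpt-7b-instruct",
    "/OpenAssistant--falcon-7b-sft-top1-696": "OpenAssistant--falcon-7b-sft-top1-696",
})

def deployed_models(routes_json: str) -> list:
    """Map route prefixes to model-style names (assumed 'org--model' convention)."""
    routes = json.loads(routes_json)
    return sorted(prefix.lstrip("/").replace("--", "/", 1) for prefix in routes)

print(deployed_models(sample))
```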

(base) ray@ip-172-31-52-1:~$ ray list actors --detail

Yard1 commented 1 year ago

Thanks, will try to reproduce and get back to you!

Regarding `aviary models`: unfortunately, it would be non-trivial to make it return only already-deployed models. The fact that it is not showing new models, however, is an issue.

Yard1 commented 1 year ago

Also, one other thing that would be helpful is the autoscaler logs (on the head node: `/tmp/ray/session_latest/logs/monitor.log` and `monitor.err`). We have had trouble provisioning g5 nodes ourselves, and I think this may simply be an AWS capacity issue.
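To grab both files in one go, something like the snippet below should work on the head node (the paths are the ones mentioned above; the fallback message is just so the loop does not abort if a file is missing):

```shell
# Collect the tail of the autoscaler logs from the head node.
for f in /tmp/ray/session_latest/logs/monitor.log \
         /tmp/ray/session_latest/logs/monitor.err; do
  echo "=== $f ==="
  tail -n 100 "$f" 2>/dev/null || echo "(not found on this machine)"
done
```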

I am also happy to schedule some time on Friday to debug together! Let me know if you are interested.

Yard1 commented 1 year ago

Finally, I would recommend deploying multiple models in a single invocation instead of calling `aviary run` several times: `aviary run --model MODEL1 --model MODEL2`.

mahaddad commented 1 year ago

I would love the chance to debug together. I can make myself available any time on Friday that works best for you. Could you shoot me a note at michael@konko.ai?

In the meantime, I will try running `aviary run` with multiple models as you suggested and do some further testing to capture the logs you mentioned.