Thanks for the great project! The Ray dashboard does not come up when I use anyscale/aviary:latest, but with the exact same setup and steps it connects as expected with the image anyscale/aviary:0.1.0-a98a94c5005525545b9ea0a5b0b7b22f25f322d7-tgi. Happy to provide logs if you let me know which ones would be helpful and how to generate them. Appreciate you guys taking a look at this.
Thanks, we'll look into that! Is this a problem with both images or just the tgi image?
I have only tried anyscale/aviary:latest.
I also tried anyscale/aviary:0.0.2-ac62571102ddd7d588da27c2aaff6e0454af8c61 and that worked.
Thanks, we should have a fix shortly.
Thanks, @Yard1! Assuming the fix is a new Docker image, how can I automatically tell when aviary:latest has been updated? Also happy to test and let you know if it's working on my end.
@mahaddad we'll make a new release in the github repo!
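In the meantime, a rough way to check for the update from the command line (a sketch; the GitHub repo path is left as a placeholder, and any release-watch mechanism works just as well):

    # Poll the latest GitHub release:
    curl -s https://api.github.com/repos/<owner>/<repo>/releases/latest | grep tag_name
    # Or watch the published image digest directly:
    docker manifest inspect anyscale/aviary:latest | grep -m1 digest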
@mahaddad I've uploaded a test image, anyscale/aviary:test. Can you see if that works for you?
Ray dashboard is now working as expected. Thank you!
However, when I run aviary run --model ./models/static_batching/mosaicml--mpt-7b-instruct.yaml, the model never successfully deploys. I was able to get this model to deploy on the other image I mentioned in my original post.
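In case it helps, this is roughly how I've been checking on the stuck deployment from the head node (a sketch; it assumes the standard Ray / Ray Serve CLIs and default log locations inside the container, and <cluster-config>.yaml stands in for whatever the cluster YAML is saved as):

    ray attach <cluster-config>.yaml        # shell into the head node container
    serve status                            # state of the Serve application and deployments
    ray status                              # whether a GPU worker node was ever provisioned
    ls /tmp/ray/session_latest/logs/serve/  # Serve controller and replica logs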
Will take a look
@mahaddad I have updated the test docker image. Can you try again with the following EC2 config?
# An unique identifier for the head node and workers of this cluster.
cluster_name: aviary-deploy

# Cloud-provider specific configuration.
provider:
  type: aws
  region: us-west-2
  cache_stopped_nodes: False

docker:
  image: "anyscale/aviary:test"
  # Use this image instead for continuous batching:
  # image: "anyscale/aviary:latest-tgi"
  container_name: "aviary"
  run_options:
    - --entrypoint ""

setup_commands:
  - which ray || pip install -U "ray[default] @ https://s3-us-west-2.amazonaws.com/ray-wheels/latest/ray-3.0.0.dev0-cp310-cp310-manylinux2014_x86_64.whl"

worker_start_ray_commands:
  - ray stop
  # We need to make sure RAY_HEAD_IP env var is accessible.
  - export RAY_HEAD_IP && echo "export RAY_HEAD_IP=$RAY_HEAD_IP" >> ~/.bashrc && ulimit -n 65536; ray start --address=$RAY_HEAD_IP:6379 --object-manager-port=8076

available_node_types:
  head_node_type:
    node_config:
      InstanceType: m5.xlarge
      BlockDeviceMappings: &mount
        - DeviceName: /dev/sda1
          Ebs:
            VolumeSize: 256
            VolumeType: gp3
    resources:
      head_node: 1
      instance_type_m5: 1
  gpu_worker_g5:
    node_config:
      InstanceType: g5.12xlarge
      BlockDeviceMappings: *mount
    resources:
      worker_node: 1
      instance_type_g5: 1
      accelerator_type_a10: 1
    min_workers: 0
    max_workers: 8
  gpu_worker_p3:
    node_config:
      InstanceType: p3.8xlarge
      BlockDeviceMappings: *mount
    resources:
      worker_node: 1
      instance_type_p3: 1
      accelerator_type_v100: 1
    min_workers: 0
    max_workers: 4
  cpu_worker:
    node_config:
      InstanceType: m5.xlarge
      BlockDeviceMappings: *mount
    resources:
      worker_node: 1
      instance_type_m5: 1
      accelerator_type_cpu: 1
    min_workers: 0
    max_workers: 16

head_node_type: head_node_type
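To try it, roughly (assuming the config above is saved as aviary-deploy.yaml; the file name itself is arbitrary):

    # Launch the cluster, attach to the head node, then deploy the model:
    ray up aviary-deploy.yaml --yes
    ray attach aviary-deploy.yaml
    aviary run --model ./models/static_batching/mosaicml--mpt-7b-instruct.yaml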
This worked. Thank you for the quick turnaround on this! Out of curiosity, what was the root cause and fix?
One of the dependencies required for the dashboard somehow went missing when the Dockerfile was updated (most likely due to changes to the conda setup). The model deployment was then failing because it was unable to download files from our S3 mirror, due to an update to boto3/awscli that requires an additional argument.
All of these changes will be reflected on master today, and the latest Docker image will be updated.
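For illustration only, not the actual patch: assuming the argument in question is the usual unsigned-request requirement for public buckets, the change is along these lines (bucket and prefix below are placeholders):

    # Anonymous downloads from a public S3 bucket must now be explicitly unsigned:
    aws s3 cp s3://<mirror-bucket>/<model-prefix>/ ./models/ --recursive --no-sign-request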
Fixed in 0.1.1