ray-project / ray

Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0
33.33k stars 5.64k forks

Ray Core - Cluster Up Head Node Callback #40906

Open SoundsSerious opened 11 months ago

SoundsSerious commented 11 months ago

Description

I would like to set up an Elastic IP that is linked to a domain name when I launch a Ray cluster, similar to issue #7446.

I have been thinking about how to do this, and I think the best approach would be for the ray cluster up command to accept a callback script option. The script would be passed the IP address of the head node that the user's terminal connects to, as opposed to the address the cluster itself reports.

Something that would be easy to incorporate would be: ray cluster up --head-node-callback ./path/to/script

I.e., during init the cluster-up output reports "To submit a Ray job using the Ray Jobs CLI: RAY_ADDRESS='http://172.31.92.36:8265'....", then goes on to say "Shared connection to 18.208.106.9 closed."
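Until such an option exists, the same flow can be approximated from the user's machine with a small wrapper. The following is only a sketch under stated assumptions: the file name up_with_callback.py and the helpers parse_head_ip and up_then_callback are hypothetical, the callback is assumed to be an executable that takes the head IP as its only argument, and the address passed along is simply whichever one ray get-head-ip prints for the cluster config.

```python
import ipaddress
import subprocess
import sys


def parse_head_ip(output: str) -> str:
    """Return the last line of `output` that parses as an IP address."""
    for line in reversed(output.splitlines()):
        candidate = line.strip()
        try:
            ipaddress.ip_address(candidate)
        except ValueError:
            continue  # skip log lines and other non-IP output
        return candidate
    raise ValueError("no IP address found in command output")


def up_then_callback(cluster_yaml: str, callback: str) -> None:
    """Launch the cluster, look up the head IP, then run the callback."""
    subprocess.run(["ray", "up", "-y", cluster_yaml], check=True)
    result = subprocess.run(
        ["ray", "get-head-ip", cluster_yaml],
        check=True,
        capture_output=True,
        text=True,
    )
    head_ip = parse_head_ip(result.stdout)
    # Hypothetical contract: the callback receives the head IP as argv[1].
    subprocess.run([callback, head_ip], check=True)


# Usage: python up_with_callback.py CLUSTER_YAML CALLBACK_SCRIPT
if __name__ == "__main__" and len(sys.argv) == 3:
    up_then_callback(sys.argv[1], sys.argv[2])
```

This keeps the callback outside of Ray entirely, so it would not give the callback any provider-specific metadata beyond the IP address.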

We would want a callback that receives head node information, perhaps specific to the cloud service used, so that further install actions can be taken. For AWS, we would only need the VPC allocation and the instance ID, as in this example:

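import boto3
from botocore.exceptions import ClientError

ec2 = boto3.client('ec2')

try:
    # Allocate a new Elastic IP address for use in a VPC.
    allocation = ec2.allocate_address(Domain='vpc')
    # Associate it with the head node; 'INSTANCE_ID' is a placeholder
    # for the head node's EC2 instance ID.
    response = ec2.associate_address(AllocationId=allocation['AllocationId'],
                                     InstanceId='INSTANCE_ID')
    print(response)
except ClientError as e:
    print(e)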
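If the callback were handed only the head node's public IP, as proposed above, it would first have to resolve that IP to an instance ID. A minimal sketch of such a callback follows, assuming AWS credentials are available where it runs. The function names find_instance_id and attach_elastic_ip are hypothetical; the EC2 client is passed in so the logic can be exercised without touching AWS.

```python
import sys


def find_instance_id(ec2, public_ip: str) -> str:
    """Look up the EC2 instance that currently holds `public_ip`."""
    response = ec2.describe_instances(
        Filters=[{"Name": "ip-address", "Values": [public_ip]}]
    )
    for reservation in response["Reservations"]:
        for instance in reservation["Instances"]:
            return instance["InstanceId"]
    raise LookupError(f"no instance found with public IP {public_ip}")


def attach_elastic_ip(ec2, public_ip: str) -> str:
    """Allocate an Elastic IP, attach it to the instance, return the new IP."""
    instance_id = find_instance_id(ec2, public_ip)
    allocation = ec2.allocate_address(Domain="vpc")
    ec2.associate_address(
        AllocationId=allocation["AllocationId"],
        InstanceId=instance_id,
    )
    return allocation["PublicIp"]


# Usage: python callback.py HEAD_NODE_PUBLIC_IP
if __name__ == "__main__" and len(sys.argv) == 2:
    import boto3  # imported here so the helpers above need no AWS setup

    print(attach_elastic_ip(boto3.client("ec2"), sys.argv[1]))
```

Note that associating an Elastic IP replaces the instance's auto-assigned public IP, so the address printed during cluster launch may stop working afterwards.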

Use case

No response

rkooo567 commented 11 months ago

As a workaround, is it viable to just run ./path/to/script inside https://github.com/ray-project/ray/blob/5ab8997a3a9f546d52c949e947feddb08bdc630d/python/ray/autoscaler/aws/defaults.yaml#L123?
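If the script runs on the head node itself, as this workaround suggests, it could discover its own instance ID from the EC2 instance metadata service instead of being passed one. The sketch below assumes IMDSv2 is reachable from the node. The helper names are hypothetical, and the node would additionally need boto3 and IAM permissions to allocate and associate addresses.

```python
import urllib.request

IMDS_BASE = "http://169.254.169.254/latest"


def token_request(ttl_seconds: int = 60) -> urllib.request.Request:
    """Build the IMDSv2 request that obtains a short-lived session token."""
    return urllib.request.Request(
        f"{IMDS_BASE}/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": str(ttl_seconds)},
    )


def instance_id_request(token: str) -> urllib.request.Request:
    """Build the request for this instance's ID, authenticated by `token`."""
    return urllib.request.Request(
        f"{IMDS_BASE}/meta-data/instance-id",
        headers={"X-aws-ec2-metadata-token": token},
    )


def fetch_instance_id(timeout: float = 2.0) -> str:
    """Return the ID of the EC2 instance this code is running on."""
    with urllib.request.urlopen(token_request(), timeout=timeout) as resp:
        token = resp.read().decode()
    with urllib.request.urlopen(instance_id_request(token), timeout=timeout) as resp:
        return resp.read().decode()
```

The value returned by fetch_instance_id() could then be used as the InstanceId argument in the associate_address call from the original example.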