Hi, @jieru-hu ! Thanks for the feature request! Can I ask you a couple questions?
- What's the use case of this feature?
- Is there any reason why you prefer this way to ray submit?
- This can actually impact performance of Ray because depending on contexts, there are several IPCs to execute tasks remotely. Would your use case be able to tolerate the performance degradation?
This is another ask for Ray client @anabranch
Hi, @jieru-hu ! Thanks for the feature request! Can I ask you a couple questions?
Hi @rkooo567 so sorry for the late response!
- What's the use case of this feature?
Ray autoscaler is great, but it requires us to sync our code manually and then call ray.init on the cluster. Being able to call ray.init from a local machine seems like a much better experience.
- Is there any reason why you prefer this way to ray submit?
This kind of aligns with the answer above. Right now I'm using the autoscaler CLI in my code, but I'm hoping Ray could provide a more code-friendly interface for interacting with a remote cluster.
- This can actually impact performance of Ray because depending on contexts, there are several IPCs to execute tasks remotely. Would your use case be able to tolerate the performance degradation?
In my use case, the machine that submits the job to Ray (by calling ray.init()) does not do the compute; all the computation is done remotely on the cluster. In this scenario, will the performance still be impacted?
Let me know if I understood you correctly! So you are saying you want to call ray.init() on your local machine, and it will only run function.remote() (which is compute heavy) on the Ray cluster. Am I correct? So you want to use it like a task queue?
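For concreteness, the pattern being described looks roughly like this (the address and function are placeholders; it assumes the laptop can actually reach the cluster, which is the hard part discussed below):

import ray

# Connect the local driver to the remote cluster (placeholder address).
ray.init(address="<head-node-ip>:6379")

@ray.remote
def heavy_task(x):
    # Compute-heavy work; runs on the cluster's workers, not on the laptop.
    return x * x

futures = [heavy_task.remote(i) for i in range(100)]
results = ray.get(futures)  # Only the results come back to the laptop.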
Please have a look at this. It would be really useful and streamline many use cases.
@Vysybyl it might take some time to have a fully working version of this feature, but we are gradually moving towards supporting it!
Thanks for the quick reply, I'm looking forward to it! (In my case I'd like to scale up some interactive computation on Jupyter while keeping the code and data files on my local machine. I'm more comfortable NOT copying these files to the remote server's hard drive, but rather sending only data that will be kept in memory.) I have poked around GH issues trying several workarounds, but none worked. I think I'll set up a VPN for the time being and use my local machine as a 0-resource node.
Maybe I can help with this task? If I understand correctly, the main issue is that Ray uses many services on different ports (including random ones) on each node, and nodes must be able to communicate with each other on many of these ports, which is unlikely to happen if the machines are not on the same LAN or VPN. Probably the solution would be to have all communication between the client (driver) and the head node happen via a single endpoint (which could be opened via SSH port forwarding, for instance) so that the head node acts like a proxy.
Am I correct? Is there any way I can contribute to this?
Thanks
I think I'll set up a VPN for the time being and use my local machine as a 0-resource node.
Yes. I guess it is the best solution for now. The prototype we are currently working on will also work only when bi-directional communication is allowed between your machine and Ray clusters.
head node acts like a proxy.
The biggest problem here is that Ray is strongly tied to the model where it runs fully within a cluster. That is, Ray workers need bi-directional communication (meaning the node with a worker should be able to receive requests from other nodes). So what you mentioned could solve some parts of the problem, but the fundamental problem still remains.
We are also planning to have a public roadmap in a few months, so you can definitely check the progress in the near future!
Hi! I was able to set up a Ray cluster on GCS and connect to it directly from my local Python process (Jupyter or shell) without SSH-ing into the remote machine.
In short:
I may post a more detailed recipe here if that would be interesting to anyone
Cheers
I'd love to see the proof of concept for this! Do you have that recipe that you could share?
Cc @barakmich
Here's the recipe:
I followed the instructions here. I have modified a few lines in the original yaml file, namely:
a) Installing the latest version from PyPI (which is what I use locally), not the wheel from AWS S3
pip install ray
b) Adding a rayvpn network tag to the head node that will later be used to match a firewall rule
...
head_node:
    machineType: n1-standard-2
    tags:
      items:
        # This tag is used to allow VPN connection (it matches a GCS Firewall rule)
        - rayvpn
...
After calling ray up, please be sure to take note of the head node IP and Redis password from the console logs.
Create a Firewall Rule on GCS as explained here. Go straight to the "Firewall Rules" part.
SSH to the head node to install OpenVPN (depending on your gcloud settings, you may need to specify the zone, etc.).
export RAY_HEAD=<HEAD_VM_HOSTNAME>
gcloud compute ssh ubuntu@$RAY_HEAD
sudo su
cd /home/ubuntu
curl -O https://raw.githubusercontent.com/angristan/openvpn-install/master/openvpn-install.sh
chmod +x openvpn-install.sh
AUTO_INSTALL=y APPROVE_IP=y PROTOCOL_CHOICE=2 PORT_CHOICE=1 DNS=3 CLIENT=openvpn-client ./openvpn-install.sh
exit
exit
Copy the configuration file to your local machine and start the OpenVPN session (the OpenVPN client must be installed)
gcloud compute scp ubuntu@$RAY_HEAD:/home/ubuntu/openvpn-client.ovpn ~/Downloads/
sudo openvpn --config ~/Downloads/openvpn-client.ovpn
Start a Ray node on your local computer with 0 resources (or more, if you wish)
ray start --address='<HEAD_NODE_IP>:6379' --redis-password='<REDIS_PASSWORD>' --num-cpus=0
Start the Ray API in Python
import ray
ray.init(address='auto', redis_password='<REDIS_PASSWORD>')
ray.nodes() # To confirm that you are actually using the cluster
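To double-check that tasks are really executed on the cluster nodes rather than locally, a small sketch (assumes the connection above succeeded; the 0-resource local node should not run any of these tasks):

import ray

@ray.remote
def where_am_i():
    # Returns the hostname of the node that actually ran this task.
    import socket
    return socket.gethostname()

print(ray.get([where_am_i.remote() for _ in range(4)]))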
I was NOT able to connect to the dashboard. If anybody can fix this, I'd appreciate it!
Enjoy
This issue was interesting, as I was trying to achieve something similar. I'd like to containerize Ray in such a way that the actual cluster is independent from my web server. Right now I'm starting a head node in the same container as a Flask web app using ray start --head --num-cpus=0, in order to make sure that this node won't execute any actual work, but it feels like a workaround.
I guess there's no way, for now, to connect to a remote cluster and call remote functions like a task queue (e.g. celery style)?
@Vysybyl tried your solution, and it worked! To access the dashboard, you can either do SSH port forwarding, ssh -L 8266:localhost:8266 <user>@<remote-head-ip>, and then access the dashboard at http://localhost:8266 locally.
Or use --dashboard-host=0.0.0.0 on the head node when you start Ray; then you can access the dashboard at http://<remote-head-openvpn-ip>:8266.
I am also interested in using Ray as a task queue to transparently offload heavy computations from local machines (e.g., laptops) to a server/cluster on the same network. I planned to do this as mentioned above, by connecting the laptops to the cluster by running ray start with --num-cpus=0 to ensure that no processing runs on the laptops.
My main doubt is: could the laptops run Ray on Windows, and the remote server (i.e., the cluster head) run Ray on Linux? Or are the two not compatible with each other? I could not find this in the documentation and cannot test it myself right now.
In case you want more details on my use case:
I develop an app which currently uses ray internally for parallel computing (heavy computations on large numpy arrays) on the local computer. I chose ray so that it can also scale well to large workstations and clusters.
This app has a GUI which is used by users without much technical knowledge. I would like to have a ray cluster running on the same network, and allow the users running my app on their laptops to offload their computations to this cluster as easily as possible. Ideally ray
would also handle the available hardware resources when several users are submitting tasks at the same time (but this is not a must).
I planned to try something similar to what was proposed above. I would be interested in feedback on whether this is feasible and whether it is a good idea:
1. Run ray start --address=ip --redis-password=xxxx --num-cpus=0 in the background. This ensures that no processing runs on the laptops, and all is offloaded to the cluster.
2. Call ray.init(address=ip, _redis_password=xxxx) to connect to the ray cluster (see the sketch below).
Thanks!
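For what it's worth, a rough sketch of what step 2 could look like from the app's side (the function and data here are made up for illustration; depending on your Ray version the keyword is redis_password or _redis_password):

import numpy as np
import ray

# Connect to the already-running cluster; the laptop node has 0 CPUs,
# so the tasks below are scheduled on the cluster's nodes.
ray.init(address="<head-node-ip>:6379", _redis_password="xxxx")

@ray.remote
def process_chunk(chunk):
    # Placeholder for the app's heavy numpy computation.
    return chunk.sum()

data = np.random.rand(8, 1_000_000)
results = ray.get([process_chunk.remote(chunk) for chunk in data])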
I'm also interested in this feature, to run remote functions from my laptop on an HPC cluster. I want to do this interactively from a Jupyter notebook. I don't have root access on the cluster, and a VPN is not an option.
I've already set up such a solution with dask.distributed. Two tunnels (one for the central scheduler and one for the dashboard) suffice to make it work. This is convenient, but I wanted to test the performance with Ray.
Hi guys,
In order to do what @Vysybyl proposed, the local machine has to have the exact same Ray and Python versions as installed on the remote cluster, otherwise it would throw a version mismatch exception. Is there an easy way to make it so that my local machine can connect to arbitrary Ray clusters (regardless of the Ray and Python minor versions)?
You can try Ray client, which is a bit more tolerant of Ray version skew: https://github.com/ray-project/ray/issues/13324
However, Python version still has to match due to serialization issues.
Oh that is perfect! Thanks for sharing the good news :)!
As Eric mentioned, this should be addressed with https://docs.ray.io/en/master/cluster/ray-client.html
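For reference, a connection via Ray Client looks roughly like this with a recent Ray version (older versions used ray.util.connect instead; 10001 is the default Ray Client server port, and the host is whatever address of the head node is reachable from your machine):

import ray

# Connect through the Ray Client server running on the head node.
ray.init("ray://<head-node-ip>:10001")

@ray.remote
def f(x):
    return x + 1

print(ray.get(f.remote(41)))  # Runs on the cluster; only the result comes back.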
To save anybody coming across this some time in setting up, I'll share my findings. Example with region us-west-2 and port 6379 (the default in the yaml examples):
ray up <your-ray-file>.yaml # extract public-ip and redis-password from output here
aws ec2 describe-instances --query 'Reservations[*].Instances[*].SecurityGroups[*]' --output text --region us-west-2
aws ec2 authorize-security-group-ingress --group-id '<sg-xxxxxxxxxxx>' --protocol tcp --port 6379 --cidr 0.0.0.0/0 --region us-west-2
ray start --address='<public-ip>:6379' --redis-password='<redis-password>'
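Once the local node has joined, the driver connects as usual (a quick sketch; depending on your Ray version the keyword is redis_password or _redis_password):

import ray

ray.init(address="auto", _redis_password="<redis-password>")
print(ray.nodes())  # Both the local machine and the AWS nodes should show up.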
To do in 2022 what H2O could do in 2016:) And in R too...
What's the use case of this [remote API server connections] feature?
And to become compatible with the modern microservices architecture, where the client is running in a separate container from the server, with its own IP and its own app (e.g. Jupyter Notebook).
Describe your feature request
It would be great if we can connect to a remote cluster from a local machine directly using ray.init(). For example, say the IP of the ray head node is 52.13.54.127 for my AWS cluster, I want to be able to call ray.init(address="52.13.54.127") and run remote functions from my laptop.
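For illustration, the requested workflow would look something like this (the IP is the example from above; today this requires one of the workarounds or the Ray Client approach discussed in this thread):

import ray

# Desired: connect straight from the laptop to the remote head node.
ray.init(address="52.13.54.127")

@ray.remote
def train():
    # The heavy work would run on the AWS cluster, not on the laptop.
    return "done"

print(ray.get(train.remote()))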