ray-project / ray

Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.

https://ray.io

Apache License 2.0


[core] Get IP Address of Actor #7431

Open richardliaw opened 4 years ago

richardliaw commented 4 years ago

Describe your feature request

I want to be able to map Actors to the Nodes that they are running on. I know the IP address of each node (via ray.state.nodes()). However, there's no API to get the IP address of an actor.
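Once an actor's node is known, the mapping itself is a simple join against the node table. The sketch below assumes the node records look like the dictionaries returned by ray.state.nodes() (the "NodeID", "NodeManagerAddress" and "Alive" keys are an assumption). The actor-to-node input and the helper name map_actors_to_ips are hypothetical, since obtaining that input is exactly what this issue asks for.

```python
from typing import Dict, Iterable, Mapping


def map_actors_to_ips(
    nodes: Iterable[Mapping[str, object]],
    actor_to_node: Mapping[str, str],
) -> Dict[str, str]:
    """Resolve each actor's node ID to that node's IP address.

    `nodes` is assumed to resemble ray.state.nodes() output: one dict per
    node with "NodeID", "NodeManagerAddress" and "Alive" keys.
    `actor_to_node` (actor ID -> node ID) is a hypothetical input.
    """
    # Index only live nodes by their ID.
    ip_by_node = {
        str(n["NodeID"]): str(n["NodeManagerAddress"])
        for n in nodes
        if n.get("Alive", True)
    }
    # Actors whose node is unknown or dead are skipped rather than guessed.
    return {
        actor_id: ip_by_node[node_id]
        for actor_id, node_id in actor_to_node.items()
        if node_id in ip_by_node
    }


if __name__ == "__main__":
    nodes = [
        {"NodeID": "node-a", "NodeManagerAddress": "10.0.0.1", "Alive": True},
        {"NodeID": "node-b", "NodeManagerAddress": "10.0.0.2", "Alive": True},
    ]
    print(map_actors_to_ips(nodes, {"actor-1": "node-b", "actor-2": "node-a"}))
```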

The current workaround is to ping a common IP (a Google IP), which is what ray.services.get_node_ip_address() currently implements. This always returns a public IP, but sometimes I want the private IP (the one passed to ray start).
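The "ping a common IP" workaround can be sketched with the usual UDP-connect technique. This is an illustration of the general approach, assumed to be similar to (not copied from) ray.services.get_node_ip_address(); 8.8.8.8 stands in for the "Google IP", and the helper name local_ip_via_udp is hypothetical. It reports the local interface the kernel would route through to reach the probe host, which on a multi-homed machine is not necessarily the address given to ray start.

```python
import socket


def local_ip_via_udp(probe_host: str = "8.8.8.8", probe_port: int = 80) -> str:
    """Return the local address of the interface that routes to `probe_host`.

    connect() on a UDP socket sends no packets; it only makes the kernel pick
    a route, after which getsockname() reports the chosen local address.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        s.connect((probe_host, probe_port))
        return s.getsockname()[0]
    except OSError:
        # No route to the probe host (e.g. an offline machine): use loopback.
        return "127.0.0.1"
    finally:
        s.close()


if __name__ == "__main__":
    print(local_ip_via_udp())
```

Because the answer depends on the routing table rather than on Ray's own configuration, it cannot tell you which address a particular node was started with.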

richardliaw commented 4 years ago

Suggested APIs:

ActorHandle.get_data() -> has IP Address
ray.node_params() -> all node params passed to ray start on this node

cc @edoakes

jovany-wang commented 4 years ago

I think it's too verbose to add the IP address to the actor handle.

In Java, there's a getAllNodeInfo() API that returns the address of each node: https://github.com/ray-project/ray/blob/master/java/api/src/main/java/org/ray/api/runtimecontext/RuntimeContext.java#L49

Also, we have implemented an HTTP API, get_job_actors(), that returns all actor info along with each actor's node address. It is based on JobManager, and we can integrate it into the dashboard as well.

richardliaw commented 4 years ago

Hm, why is it too verbose?

Also, I need to use this in the application (to detect location of actors) rather than just via the dashboard.

yzs981130 commented 2 years ago

ActorHandle.get_data() -> has IP Address

It seems that get_data has been removed?

I got AttributeError: 'ActorHandle' object has no attribute 'get_data' on current master (ray, version 3.0.0.dev0).

scottsun94 commented 1 year ago

@rickyyx @rkooo567 It seems that the IP is still not available when using ray list actors: https://docs.ray.io/en/latest/ray-observability/state/ray-state-api-reference.html#actorstate?

rkooo567 commented 1 year ago

@scottsun94 it's on the list. Btw, what's the use case for this feature? Are you using this data for scheduling, or is it purely for observability?

scottsun94 commented 1 year ago

@richardliaw explains the use case at the top. Is there more info/context?

rkooo567 commented 1 year ago

@richardliaw explains the use case at the top. Is there more info/context?

I don't think the description answers my question. It says what he wants, but not the purpose!

yzs981130 commented 1 year ago

@scottsun94 it's on the list. Btw, what's the use case for this feature? Are you using this data for scheduling, or is it purely for observability?

I believe this feature would be useful for both of the use cases you mention:

For actor coordination and internal services provided by actors. Taking distributed deep learning with PyTorch (without Ray Train) as an example, we could run each worker as an individual Ray actor. PyTorch distributed needs a master address and port for worker coordination, which cannot be determined until the actors have been scheduled. Long-running services such as KV storage also need the node IP of an actor to provide internal services.
For observability. When monitoring actors and their corresponding nodes, without the ActorHandle.get_data() I mentioned above, the only option currently is to parse the output of /nodes?view=details. I don't think there is any other simple way to retrieve the node information of a running actor.
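For the coordination use case, here is a minimal sketch of what the actors' IPs would feed into. It assumes every worker actor has already reported its node IP (how it obtains that IP is the subject of this issue, so the list is a hypothetical input) and that rank 0's address becomes the master address. MASTER_ADDR, MASTER_PORT, RANK and WORLD_SIZE are the variables read by torch.distributed's env:// initialization; the helper names are hypothetical.

```python
import socket
from typing import Dict, List


def find_free_port() -> int:
    """Ask the OS for a currently unused TCP port (it may be taken again later)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("", 0))
        return s.getsockname()[1]


def build_dist_envs(worker_ips: List[str], master_port: int) -> List[Dict[str, str]]:
    """Build per-rank environment variables for torch.distributed env:// init.

    worker_ips[i] is the node IP reported by the actor that will act as rank i;
    rank 0's IP is used as MASTER_ADDR for every rank.
    """
    if not worker_ips:
        raise ValueError("need at least one worker")
    world_size = len(worker_ips)
    return [
        {
            "MASTER_ADDR": worker_ips[0],
            "MASTER_PORT": str(master_port),
            "RANK": str(rank),
            "WORLD_SIZE": str(world_size),
        }
        for rank in range(world_size)
    ]


if __name__ == "__main__":
    for env in build_dist_envs(["10.0.0.5", "10.0.0.6"], 29500):
        print(env)
```

find_free_port() is only meaningful when it runs on the rank-0 node itself (for example inside that actor), and another process can still claim the port before PyTorch binds it.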

cc @rkooo567

richardliaw commented 1 year ago

+1, mostly for coordination, possibly with external systems. Also, just look at how the IP address is determined across the Ray libraries: some form of socket.gethostbyname is used very frequently.

rickyyx commented 1 year ago

@yzs981130 Thanks for the elaboration. This is great to know.

So with Ray 2.2, you should be able to get the node information of an individual actor from the state API: something like ray.experimental.state.api.get_actor(<actor_id>) should include the node information.
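Resolving an actor's IP through the state API takes two lookups: actor record to node ID, then node record to IP. The sketch below injects the two getters so it runs without Ray. It assumes the actor record carries a "node_id" field and the node record a "node_ip" field, both represented as dicts here; the function name actor_node_ip is hypothetical.

```python
from typing import Any, Callable, Mapping, Optional


def actor_node_ip(
    actor_id: str,
    get_actor: Callable[[str], Optional[Mapping[str, Any]]],
    get_node: Callable[[str], Optional[Mapping[str, Any]]],
) -> Optional[str]:
    """Resolve an actor ID to the IP of the node it runs on.

    Assumes actor records expose "node_id" and node records expose "node_ip";
    either getter may return None for an unknown ID.
    """
    actor = get_actor(actor_id)
    if actor is None or not actor.get("node_id"):
        # Unknown actor, or an actor that has not been placed on a node yet.
        return None
    node = get_node(actor["node_id"])
    return None if node is None else node.get("node_ip")


if __name__ == "__main__":
    actors = {"actor-1": {"node_id": "node-a"}}
    nodes = {"node-a": {"node_ip": "10.0.0.9"}}
    print(actor_node_ip("actor-1", actors.get, nodes.get))
```

With Ray installed, the getters would wrap the state API's get_actor and a corresponding node lookup. Whether records come back as dicts or as objects with attributes may vary between Ray versions, so a small adapter may be needed.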

rkooo567 commented 1 year ago

Btw, I don't think adding get_data is a bad idea. The API is easier to discover, and we'd like to avoid using the state API for anything other than observability; the state API may also be unavailable in a plain ray install (it can only be used when ray[default] is installed). I believe all of this information can already be obtained easily. cc @scv119 any thoughts?