ray-project / ray

Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0

[core] Node IDs not consistent across APIs #25090

Open edoakes opened 2 years ago

edoakes commented 2 years ago

ray.get_runtime_context(), ray.state.node_ids(), and ray.nodes() each return a different type for the node ID. We should standardize these so they are consistent. Ideally, all of these APIs would return a NodeID type, as get_runtime_context() does.

In [10]: ray.get_runtime_context().node_id
Out[10]: NodeID(63adf211f766a26da75f6a38fb6d2ffa638c9eebb8d0b8eb3e121c27)

In [11]: ray.state.node_ids()
Out[11]: ['node:127.0.0.1']

In [12]: ray.nodes()
Out[12]:
[{'NodeID': '63adf211f766a26da75f6a38fb6d2ffa638c9eebb8d0b8eb3e121c27',
  'Alive': True,
  'NodeManagerAddress': '127.0.0.1',
  'NodeManagerHostname': 'Edwards-MacBook-Pro-2.local',
  'NodeManagerPort': 53096,
  'ObjectManagerPort': 53095,
  'ObjectStoreSocketName': '/tmp/ray/session_2022-05-23_11-43-44_890539_92065/sockets/plasma_store',
  'RayletSocketName': '/tmp/ray/session_2022-05-23_11-43-44_890539_92065/sockets/raylet',
  'MetricsExportPort': 60919,
  'NodeName': '127.0.0.1',
  'alive': True,
  'Resources': {'node:127.0.0.1': 1.0,
   'memory': 48312048026.0,
   'CPU': 10.0,
   'object_store_memory': 2147483648.0}}]

edoakes commented 2 years ago

cc @ericl @pcmoritz for API issues

edoakes commented 2 years ago

@jjyao I think we should standardize this for Ray 2.0. In the meantime, I'm trying to accomplish something quite simple: I just want to get the NodeID for all of the nodes currently connected to the cluster so I can pass them into a SchedulingPolicy. Could you please advise me on the best way to do this? It seems that ray.nodes() should work assuming I can pass the string version of the NodeID into the scheduling API.

jjyao commented 2 years ago

@edoakes We discussed this and decided to use the hex string as the standard node ID. This work is planned for 2.0.

Currently NodeAffinitySchedulingStrategy accepts both a hex string and a NodeID, so either will work. But after 2.0, it will only accept a hex string.
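
Since both forms are accepted for now, callers who want to be ready for the hex-string-only behavior could normalize IDs up front. The following is a minimal sketch, not part of Ray: `to_hex_node_id` is a hypothetical helper name, and it assumes that a NodeID object exposes a `.hex()` method returning the same hex string that ray.nodes() reports.

```python
def to_hex_node_id(node_id):
    """Return node_id as a hex string.

    Hypothetical helper: accepts either a hex string or any object with a
    .hex() method (assumed here to be the case for Ray's NodeID type).
    """
    if isinstance(node_id, str):
        # Already in the planned standard form.
        return node_id
    if hasattr(node_id, "hex"):
        return node_id.hex()
    raise TypeError(f"unsupported node ID type: {type(node_id).__name__}")
```

With this, `to_hex_node_id(ray.get_runtime_context().node_id)` and the `'NodeID'` value from ray.nodes() should compare equal for the same node, under the assumption stated above.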

edoakes commented 2 years ago

Ok, sounds good. Per my question above, should I be using ray.nodes() to get all NodeIDs in the cluster?

jjyao commented 2 years ago

> Ok, sounds good. Per my question above, should I be using ray.nodes() to get all NodeIDs in the cluster?

Yes, use ray.nodes() to get all nodes.
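
As a rough sketch of that workflow, the snippet below pulls the hex `'NodeID'` strings out of a ray.nodes()-style list and passes each one to NodeAffinitySchedulingStrategy. The helper names `alive_node_ids` and `run_on_every_node` are hypothetical, the dictionary keys are taken from the ray.nodes() output earlier in this thread, and the second sample entry is made-up data for illustration.

```python
def alive_node_ids(nodes):
    """Return the hex NodeID string of every live node in a ray.nodes()-style list."""
    ids = []
    for node in nodes:
        # The output earlier in the thread carries both 'Alive' and 'alive';
        # prefer 'Alive' and fall back to 'alive' if it is missing.
        if node.get("Alive", node.get("alive", False)):
            ids.append(node["NodeID"])
    return ids


def run_on_every_node(remote_fn):
    """Launch one task per live node. Needs Ray installed and a running cluster."""
    import ray
    from ray.util.scheduling_strategies import NodeAffinitySchedulingStrategy

    refs = [
        remote_fn.options(
            # soft=False: fail rather than fall back to another node.
            scheduling_strategy=NodeAffinitySchedulingStrategy(
                node_id=node_id, soft=False
            )
        ).remote()
        for node_id in alive_node_ids(ray.nodes())
    ]
    return ray.get(refs)


# Sample shaped like the ray.nodes() output above; the dead node is invented.
sample = [
    {"NodeID": "63adf211f766a26da75f6a38fb6d2ffa638c9eebb8d0b8eb3e121c27",
     "Alive": True, "alive": True},
    {"NodeID": "0" * 56, "Alive": False},
]
live_ids = alive_node_ids(sample)
```

Here `run_on_every_node` expects a function already decorated with `@ray.remote` that takes no arguments; it is defined but not called, so the snippet runs without a cluster.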

edoakes commented 2 years ago

We should also audit this API for Ray 2.0... for example it seems to have both alive: True and Alive: True above 😅

jjyao commented 2 years ago

Yea, it's currently marked as DeveloperAPI. Will this API (getting all the nodes of the cluster) be covered by observability work? @rkooo567