robertnishihara closed this issue 5 years ago.
I posted this in case anyone runs into the same error. Closing because the workaround is stated above.
For the latest version of Ray, the temp_dir argument is passed as _temp_dir, like the following:
ray.init(_temp_dir='/tmp/something_else')
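On a shared machine, one way to stay clear of another user's /tmp/ray is to give each user a directory of their own and pass it as _temp_dir. A minimal POSIX-only sketch, assuming a ray_<uid> directory under the system temp dir is acceptable; user_ray_temp_dir is a hypothetical helper, not part of Ray:

```python
import os
import tempfile


def user_ray_temp_dir(base=None):
    """Return an absolute, per-user temp directory path, creating it if needed.

    Hypothetical helper (not a Ray API); the ray_<uid> naming is an assumption.
    """
    base = base or tempfile.gettempdir()
    # Use the numeric user ID so no password-database lookup is needed.
    path = os.path.abspath(os.path.join(base, f"ray_{os.getuid()}"))
    # Request owner-only access (0o700). The umask can only clear more bits,
    # and an already existing directory keeps whatever mode it has.
    os.makedirs(path, mode=0o700, exist_ok=True)
    return path


if __name__ == "__main__":
    temp_dir = user_ray_temp_dir()
    print(temp_dir)
    # Then, for example: ray.init(_temp_dir=temp_dir)
```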
I get the same error when using:
ray.init(_temp_dir='./tmp/ray')
More specifically, this was executed on the head node:
ray start --head --port=6379
Then this on the worker node:
ray.init(address='128.55.68.181:6379', _redis_password='5241590000000000',_temp_dir='~tmp/ray')
Same error when using the CLI instead of Python:
ray start --address='128.55.68.181:6379' --redis-password='5241590000000000' --temp-dir='~tmp/ray'
Stack trace below:
jwillard@perlmutter:nid001629:~> ray start --address='128.55.68.181:6379' --redis-password='5241590000000000' --temp-dir='~tmp/ray'
Local node IP: 128.55.69.198
Traceback (most recent call last):
File "/global/homes/j/jwillard/miniconda3/envs/stml/bin/ray", line 8, in <module>
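A side note on the paths used in these commands: Python does not perform tilde expansion inside string literals, and the single quotes in the shell command stop the shell from expanding it too, so '~tmp/ray' reaches Ray as a literal string. Even where expansion does happen, ~tmp means the home directory of a user named tmp, not ~/tmp. Relative paths such as './tmp/ray' resolve against the current working directory of each process. Whether Ray itself expands or rejects such paths is not established in this thread; the sketch below just normalizes a path before handing it over (normalize_temp_dir is a hypothetical helper):

```python
import os


def normalize_temp_dir(path):
    """Expand a leading ~ or ~user and make the path absolute.

    Hypothetical helper. "~tmp/ray" means "home of user tmp", not "~/tmp/ray";
    if no user named tmp exists, expanduser leaves it unchanged and abspath
    then treats it as a relative directory literally named "~tmp".
    """
    return os.path.abspath(os.path.expanduser(path))


if __name__ == "__main__":
    for p in ["./tmp/ray", "~/tmp/ray", "/tmp/ray"]:
        print(p, "->", normalize_temp_dir(p))
```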
ray.init(address="auto", _temp_dir='/tmp/s1')
This also errors:
$ /opt/python38/bin/python3 ray-demo1.py
Traceback (most recent call last):
File "ray-demo1.py", line 12, in
This issue should be reopened; I am also getting this issue with Ray 1.13.0.
The parameter _temp_dir is ignored when calling ray.init() from an Amazon EC2 machine.
Setup:
ray start --head --port 3306
ray.init(address='auto', ignore_reinit_error=True, _temp_dir='/home/adivekar/tmp/')
Error: PermissionError: [Errno 13] Permission denied: '/tmp/ray/session_2022-08-20_09-00-51_347113_37451/ports_by_node.json.lock'
Clearly, this is ignoring _temp_dir.
I can get around this by running chmod 777 -R /tmp/ray/*, but I feel like I really should not have to.
Pinging @robertnishihara; hopefully this issue can be reopened, since users are still experiencing it.
I got the same issue. The same script raises a permission error saying the files under the ray folder do not have the required permissions, while other deployed scripts work just fine.
@ARDivekar can you try with Ray 2.0? I just tried locally (on a MacBook) but wasn't able to reproduce the issue.
I got the same issue with Ray 2.0.0.
Code: ray.init(address='auto')
Error:
File /opt/tljh/user/lib/python3.9/site-packages/ray/_private/client_mode_hook.py:105, in client_mode_hook.

File /opt/tljh/user/lib/python3.9/site-packages/ray/_private/worker.py:1475, in init(address, num_cpus, num_gpus, resources, object_store_memory, local_mode, ignore_reinit_error, include_dashboard, dashboard_host, dashboard_port, job_config, configure_logging, logging_level, logging_format, log_to_driver, namespace, runtime_env, storage, **kwargs)
   1462 ray_params = ray._private.parameter.RayParams(
   1463     node_ip_address=node_ip_address,
   1464     raylet_ip_address=raylet_ip_address,
   (...)
   1472     metrics_export_port=_metrics_export_port,
   1473 )
   1474 try:
-> 1475     _global_node = ray._private.node.Node(
   1476         ray_params,
   1477         head=False,
   1478         shutdown_at_exit=False,
   1479         spawn_reaper=False,
   1480         connect_only=True,
   1481     )
   1482 except ConnectionError:
   1483     if gcs_address == ray._private.utils.read_ray_address(_temp_dir):

File /opt/tljh/user/lib/python3.9/site-packages/ray/_private/node.py:244, in Node.__init__(self, ray_params, head, shutdown_at_exit, spawn_reaper, connect_only)
   237 self._plasma_store_socket_name = self._prepare_socket_file(
   238     self._ray_params.plasma_store_socket_name, default_prefix="plasma_store"
   239 )
   240 self._raylet_socket_name = self._prepare_socket_file(
   241     self._ray_params.raylet_socket_name, default_prefix="raylet"
   242 )
--> 244 self.metrics_agent_port = self._get_cached_port(
   245     "metrics_agent_port", default_port=ray_params.metrics_agent_port
   246 )
   247 self._metrics_export_port = self._get_cached_port(
   248     "metrics_export_port", default_port=ray_params.metrics_export_port
   249 )
   251 ray_params.update_if_absent(
   252     metrics_agent_port=self.metrics_agent_port,
   253     metrics_export_port=self._metrics_export_port,
   254 )

File /opt/tljh/user/lib/python3.9/site-packages/ray/_private/node.py:801, in Node._get_cached_port(self, port_name, default_port)
   798 # Maps a Node.unique_id to a dict that maps port names to port numbers.
   799 ports_by_node: Dict[str, Dict[str, int]] = defaultdict(dict)
--> 801 with FileLock(file_path + ".lock"):
   802     if not os.path.exists(file_path):
   803         with open(file_path, "w") as f:

File /opt/tljh/user/lib/python3.9/site-packages/filelock/_api.py:220, in BaseFileLock.__enter__(self)
   214 def __enter__(self) -> BaseFileLock:
   215     """
   216     Acquire the lock.
   217
   218     :return: the lock object
   219     """
--> 220     self.acquire()
   221     return self

File /opt/tljh/user/lib/python3.9/site-packages/filelock/_api.py:173, in BaseFileLock.acquire(self, timeout, poll_interval, poll_intervall, blocking)
   171 if not self.is_locked:
   172     _LOGGER.debug("Attempting to acquire lock %s on %s", lock_id, lock_filename)
--> 173     self._acquire()
   175 if self.is_locked:
   176     _LOGGER.debug("Lock %s acquired on %s", lock_id, lock_filename)

File /opt/tljh/user/lib/python3.9/site-packages/filelock/_unix.py:35, in UnixFileLock._acquire(self)
   33 def _acquire(self) -> None:
   34     open_mode = os.O_RDWR | os.O_CREAT | os.O_TRUNC
---> 35     fd = os.open(self._lock_file, open_mode)
   36     try:
   37         fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
PermissionError: [Errno 13] Permission denied: '/tmp/ray/session_2022-10-17_23-56-04_168517_260622/ports_by_node.json.lock'
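The last frame shows filelock calling os.open() with os.O_RDWR | os.O_CREAT | os.O_TRUNC on a lock file inside the /tmp/ray/session_... directory, and that call is what raises Errno 13. A small diagnostic sketch for testing a directory before starting Ray; can_create_lock_file is a hypothetical helper that mimics those open flags but does not take the flock, and the default probe file name is an assumption (pass name="ports_by_node.json.lock" to probe that exact file):

```python
import os
import tempfile


def can_create_lock_file(directory, name=".ray_write_probe.lock"):
    """Probe a directory with the os.open() flags seen in the traceback.

    Hypothetical diagnostic, not a Ray or filelock API. Opens the file with
    O_RDWR | O_CREAT | O_TRUNC (as filelock's UnixFileLock._acquire does) but
    does not lock it. O_TRUNC empties an existing file of that name.
    Raises FileNotFoundError if the directory does not exist.
    """
    path = os.path.join(directory, name)
    existed = os.path.exists(path)
    try:
        fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_TRUNC)
    except PermissionError:  # EACCES or EPERM; Errno 13 is EACCES
        return False
    os.close(fd)
    if not existed:
        os.unlink(path)  # remove the probe file this call created
    return True


if __name__ == "__main__":
    # Try e.g. the session directory from the error message, or a candidate _temp_dir.
    print(can_create_lock_file(tempfile.gettempdir()))
```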
@robertnishihara Hi, is there any way to specify the temp dir for a worker node via the CLI? The docs say:
--temp-dir
manually specify the root temporary dir of the Ray process, only works when --head is specified
I already specify it for the head node; however, the worker node ignores it:
ray start --head --node-ip-address="$head_node_ip" \
--port=$port \
--include-dashboard=true --dashboard-port=$dashboard_port --dashboard-host=0.0.0.0 \
--ray-client-server-port=$client_server_port \
--temp-dir="/tmp/ray_tmp"
ray start --address "$ip_head" \
--temp-dir="/tmp/ray_tmp" # OMITTED!!! I got `PermissionError: [Errno 13] Permission denied: '/tmp/ray/ray_current_cluster'`
By default, the permissions on ray_current_cluster make it private to the user who created it.
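In terms of mode bits, "private to the creator" means the file grants nothing to group or others, so a different user on the same node cannot read it. A sketch that checks this; is_private_to_owner is a hypothetical helper, and the 0o600 mode in the demo is an illustrative assumption, not necessarily the mode Ray uses for ray_current_cluster:

```python
import os
import stat
import tempfile


def is_private_to_owner(path):
    """Return True if the file grants no permissions to group or others."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "ray_current_cluster")
        # Create a file with owner-only read/write, purely as an illustration.
        fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
        os.close(fd)
        print(stat.filemode(os.stat(path).st_mode), is_private_to_owner(path))
```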
Someone ran into the following error today when calling ray.init(). The issue was that Ray was attempting to log to /tmp/ray, and the user was on a shared machine on which a different user owned /tmp/ray. The solution was to call ray.init() with the temp_dir argument to force Ray to log somewhere else (to a directory that the user could create).
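On a shared machine, a quick way to confirm this diagnosis is to compare the owner of /tmp/ray with the current user. A minimal POSIX-only sketch; owned_by_someone_else is a hypothetical helper, and the optional uid parameter exists only to make it easy to test:

```python
import os


def owned_by_someone_else(path="/tmp/ray", uid=None):
    """Return True if path exists and its owner differs from uid.

    Hypothetical helper (POSIX only). uid defaults to the effective user ID,
    which is the ID that file permission checks use.
    """
    if uid is None:
        uid = os.geteuid()
    try:
        owner = os.stat(path).st_uid
    except FileNotFoundError:
        return False  # nothing there yet, so no other user owns it
    return owner != uid


if __name__ == "__main__":
    if owned_by_someone_else():
        print("/tmp/ray is owned by another user; pass a different temp dir to ray.init()")
```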