ray-project / ray

Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0
34.04k stars 5.78k forks

[runtime] worker_path doesn't work in container runtime env #24924

Open SongGuyang opened 2 years ago

SongGuyang commented 2 years ago

What happened + What you expected to happen

The worker_path parameter of the container runtime env currently has no effect. I will fix this.

Versions / Dependencies

1.12.0

Reproduction script

    runtime_env = {
        "container": {
            "image": "localhost/raytest/container:v1",
            "worker_path": "/home/ray/anaconda3/lib/python3.7/site-packages/ray/workers/default_worker.py",
        }
    }
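
For context, here is a minimal sketch of how a runtime_env like the one above might be attached to a task. The helper names `make_runtime_env` and `run_probe` are hypothetical and not part of Ray; the sketch assumes a Ray installation whose `ray.remote` accepts a `runtime_env` option and a host that can actually launch the container image. It is not a verified reproduction.

```python
def make_runtime_env(image, worker_path):
    """Build a container runtime_env dict in the shape used in this report."""
    return {
        "container": {
            "image": image,
            # Path to default_worker.py as seen from INSIDE the container.
            "worker_path": worker_path,
        }
    }


def run_probe(runtime_env):
    """Start one task in the container and return the interpreter it ran on.

    Hypothetical helper: ray is imported lazily so the dict-building part
    above can be used without Ray installed.
    """
    import ray

    @ray.remote(runtime_env=runtime_env)
    def probe():
        import sys
        return sys.executable

    return ray.get(probe.remote())


env = make_runtime_env(
    "localhost/raytest/container:v1",
    "/home/ray/anaconda3/lib/python3.7/site-packages/ray/workers/default_worker.py",
)
# run_probe(env)  # uncomment on a machine with Ray and the image available
```

If worker_path were honored, the probe task would be expected to start from the worker script at that path inside the container; the behavior reported here is that it does not.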

Issue Severity

No response

stale[bot] commented 2 years ago

Hi, I'm a bot from the Ray team :)

To help human contributors to focus on more relevant issues, I will automatically add the stale label to issues that have had no activity for more than 4 months.

If there is no further activity in the next 14 days, the issue will be closed!

You can always ask for help on our discussion forum or Ray's public slack channel.

stale[bot] commented 2 years ago

Hi again! The issue will be closed because there has been no more activity in the 14 days since the last message.

Please feel free to reopen or open a new issue if you'd still like it to be addressed.

Again, you can always ask for help on our discussion forum or Ray's public slack channel.

Thanks again for opening the issue!

0x2b3bfa0 commented 1 year ago

Affected by this issue when trying to use rayproject/ray-ml:2.0.0-gpu; the worker_path option is being ignored.[^1]

(raylet) python: can't open file '/home/ubuntu/anaconda3/envs/tensorflow2_latest_p37/lib/python3.7/site-packages/ray/_private/workers/default_worker.py': [Errno 2] No such file or directory
(raylet) [2023-01-17 18:58:53,372 E 3012 3012] (raylet) worker_pool.cc:526: Some workers of the worker process(13876) have not registered within the timeout. The process is dead, probably it crashed during start.

[^1]: I tried using the value /root/anaconda3/lib/python3.7/site-packages/ray/workers/default_worker.py (valid inside the container)
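
One way to confirm from the logs that the option is ignored is to compare the path raylet tried to open with the configured worker_path. The sketch below does this with the standard library only; the function names are hypothetical, and the regular expression assumes the error line has the same shape as the one quoted above.

```python
import re

# Matches the path inside: python: can't open file '<path>': [Errno 2] ...
_OPEN_ERROR = re.compile(r"can't open file '([^']+)'")


def attempted_worker_path(log_line):
    """Return the path the interpreter failed to open, or None if absent."""
    match = _OPEN_ERROR.search(log_line)
    return match.group(1) if match else None


def worker_path_ignored(configured_path, log_line):
    """True if the log shows a failed open of a path other than the configured one."""
    attempted = attempted_worker_path(log_line)
    return attempted is not None and attempted != configured_path


LOG_LINE = (
    "(raylet) python: can't open file "
    "'/home/ubuntu/anaconda3/envs/tensorflow2_latest_p37/lib/python3.7/"
    "site-packages/ray/_private/workers/default_worker.py': "
    "[Errno 2] No such file or directory"
)
CONFIGURED = "/root/anaconda3/lib/python3.7/site-packages/ray/workers/default_worker.py"

print(attempted_worker_path(LOG_LINE))
print(worker_path_ignored(CONFIGURED, LOG_LINE))
```

For the log line above, the attempted path is the host-side default_worker.py rather than the configured container path, which is consistent with worker_path being ignored.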

psydok commented 11 months ago

Same problem with ray==2.8.0, even though I manually verified that the module /usr/local/lib/python3.10/site-packages/ray/_private/workers/default_worker.py exists everywhere.