Closed: 1tac11 closed this issue 1 year ago
That is very strange. We make the lock file directory in line 35. Then when creating the lock file in line 36, we get an error about the file not existing.
Here's the relevant code: https://github.com/ray-project/ray/blob/467b248c4fff885b478a87da8d64c19c2c363049/python/ray/autoscaler/_private/local/node_provider.py#L35-L36
I haven't seen this happen when I've used local node provider, so I suspect we may have trouble reproducing the issue.
@1seck I would recommend looking into debugging this yourself -- try dropping a couple of breakpoints in the relevant Ray and FileLock code and see what's up.
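As a starting point for that debugging, here is a minimal sketch that mirrors the two steps (create the lock file's parent directory, then create the lock file) using only the standard library. The helper name `check_lock_path` is hypothetical, and `os.open` is used here as a stand-in for what FileLock does; it is not the Ray or FileLock code itself.

```python
import os


def check_lock_path(lock_path):
    """Report which step fails for a given lock path (hypothetical helper).

    Step 1 mirrors the os.makedirs call on the lock file's parent directory.
    Step 2 stands in for lock file creation by opening the file with O_CREAT.
    """
    lock_dir = os.path.dirname(lock_path)
    result = {"lock_dir": lock_dir, "failed_step": None, "error": None}

    try:
        os.makedirs(lock_dir, exist_ok=True)
    except OSError as exc:
        result.update(failed_step="makedirs", error=repr(exc))
        return result

    try:
        fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o644)
    except OSError as exc:
        result.update(failed_step="open", error=repr(exc))
        return result

    os.close(fd)
    return result


if __name__ == "__main__":
    # Assumption: replace this with the lock path from your own traceback.
    print(check_lock_path("/tmp/ray-debug/cluster.lock"))
```

Running it against the lock path from the traceback should show whether the directory step or the file step is the one raising. A path with no directory component, for example, fails at the `makedirs` step because its parent directory is the empty string.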
re: cluster launcher testing, cc @scv119, we should try to cover LocalNodeProvider (the on-prem node provider) eventually.
It was just some wrong parameters, such as commenting out a field instead of setting it to an empty array, or setting the same IP address for the head and worker. I don't quite recall the details, but the error description could perhaps be documented better. I'm closing this for now.
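For anyone hitting the same thing, a rough pre-flight check along these lines might catch the two mistakes mentioned above before launching. This is only a sketch: the function name `check_local_provider` is hypothetical, the `provider.head_ip` and `provider.worker_ips` keys are assumed from the local example config, and whether Ray itself rejects either case is not confirmed here.

```python
def check_local_provider(config):
    """Return a list of likely problems in an already-parsed cluster config dict.

    Hypothetical helper; assumes the config has a `provider` mapping with
    `head_ip` and `worker_ips` keys.
    """
    problems = []
    provider = config.get("provider") or {}
    head_ip = provider.get("head_ip")
    worker_ips = provider.get("worker_ips")

    if not head_ip:
        problems.append("provider.head_ip is missing or empty")

    if worker_ips is None:
        # A commented-out key or a bare `worker_ips:` both end up here.
        problems.append(
            "provider.worker_ips is missing or null; use an explicit list such as []"
        )
    elif not isinstance(worker_ips, list):
        problems.append("provider.worker_ips should be a list")
    elif head_ip and head_ip in worker_ips:
        problems.append("head_ip also appears in worker_ips")

    return problems


if __name__ == "__main__":
    # Example values only; 10.0.0.1 is a placeholder address.
    config = {"provider": {"type": "local", "head_ip": "10.0.0.1",
                           "worker_ips": ["10.0.0.1"]}}
    for problem in check_local_provider(config):
        print(problem)
```

The function takes a plain dict so it works with whatever YAML loader you already use to read the cluster file.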
What happened + What you expected to happen
crash with error:
Versions / Dependencies
ray 1.13.0 python 3.8.13 conda 22.9.0 pip 22.1.2 Ubuntu 18.04
Reproduction script
I want to connect with Ray to an instance started on lambdalabs.com.
Take the vanilla example-full.yaml.
Comment out the docker section.
Set ssh_user to ubuntu.
Fill in the head IP.
Use the same IP for the workers, since only one IP is set so far.
Issue Severity
Low: It annoys or frustrates me.