[X] I searched the issues and found no similar issues.
Ray Component
Ray Clusters
Issue Severity
Medium: It contributes to significant difficulty in completing my task, but I can work around it and get it resolved.
What happened + What you expected to happen
ray up failed while launching a head node on GCP, during file_mounts processing.
This bug was discussed in https://github.com/ray-project/ray/issues/16539, but it seems it has not been fixed yet.
The error can be reproduced when file_mounts contains a large number of files (such as a checkout of the Ray repository).
A possible cause is that the SSH connection to the VM is unstable during the first few minutes after the VM is launched.
Because the Ray Autoscaler does not handle this kind of SSH connection failure, the whole ray up process fails. Would it be possible for Ray to handle such errors, for example by retrying?
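To make the request concrete, here is a minimal sketch of the kind of error handling I have in mind: retrying a failed sync with exponential backoff. This is not Ray's actual code path. The names run_with_retries and rsync_up, the retry counts, and the delays are all hypothetical and only for illustration.

```python
import subprocess
import time


def run_with_retries(fn, max_attempts=5, base_delay=5.0,
                     retry_on=(Exception,), sleep=time.sleep):
    """Call fn() until it succeeds or max_attempts is reached.

    After the i-th failed attempt (0-based) the helper sleeps for
    base_delay * 2**i seconds. The last failure is re-raised.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)


def rsync_up(source, target, ssh_user, host):
    # Hypothetical stand-in for a single file_mounts sync; the autoscaler's
    # real implementation may look quite different.
    subprocess.run(
        ["rsync", "-az", "-e", "ssh -o ConnectTimeout=10",
         source, f"{ssh_user}@{host}:{target}"],
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
```

A caller could then wrap each sync, for example run_with_retries(lambda: rsync_up("./ray", "~/ray", "ubuntu", head_ip), retry_on=(subprocess.CalledProcessError,)), where head_ip is a placeholder for the head node's address. With a wrapper like this, a dropped SSH connection in the first minutes after boot would delay ray up instead of aborting it.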
Versions / Dependencies
ray==1.9.2, python==3.8.11 on ubuntu-18.04.
Reproduction script
My YAML file used for ray up.
Anything else
The problem happens very frequently to me.
Are you willing to submit a PR?