run-house / runhouse

Dispatch and distribute your ML training to "serverless" clusters in Python, like PyTorch for ML infra. Iterable, debuggable, multi-cloud/on-prem, identical across research and production.
https://run.house
Apache License 2.0
977 stars 37 forks source link

add num ports to try for ssh tunnel as constant #1229

Closed jlewitt1 closed 2 months ago

sentry-io[bot] commented 2 months ago

🔍 Existing Issues For Review

Your pull request is modifying functions with the following pre-existing issues:

📄 File: runhouse/resources/hardware/cluster.py

Function Unhandled Issue
connect_tunnel Exception: Failed to create find open port after -1 attempts runhouse.resources.hardware.sky_ssh_runner i...
Event Count: 6

Did you find this useful? React with a 👍 or 👎

jlewitt1 commented 2 months ago

This stack of pull requests is managed by Graphite. Learn more about stacking.

Join @jlewitt1 and the rest of your teammates on Graphite Graphite

dongreenberg commented 2 months ago

I actually don't want this to be a cluster setting, it's not a property of the cluster itself and we mostly don't need to add another setting to the cluster for it. I think we can leave everything as is but just make NUM_PORTS_TO_TRY a global instead of hardcoded as 10, so if a user needs to change it they can just do runhouse.constants.NUM_PORTS_TO_TRY = 10 as a stopgap. We can figure out how to make it a proper setting in the future but I don't think it's really a property of the cluster itself.