run-house / runhouse

Dispatch and distribute your ML training to "serverless" clusters in Python, like PyTorch for ML infra. Iterable, debuggable, multi-cloud/on-prem, identical across research and production.
https://run.house
Apache License 2.0
977 stars, 37 forks

load cluster token from Den instead of generating client side #1228

Closed: jlewitt1 closed this pull request 2 months ago

jlewitt1 commented 2 months ago
sentry-io[bot] commented 2 months ago

🔍 Existing Issues For Review

Your pull request is modifying functions with the following pre-existing issues:

📄 File: runhouse/resources/hardware/cluster.py

Function: restart_server
Unhandled issue: ValueError: Error installing runhouse on cluster node <3.145.45.24> ...
Event count: 1

Function: restart_server
Unhandled issue: ValueError: Error installing runhouse on cluster node <34.229.108.213> ...
Event count: 1

Function: restart_server
Unhandled issue: ValueError: Failed to restart server llama-cluster main in <modul...
Event count: 1

jlewitt1 commented 2 months ago

This stack of pull requests is managed by Graphite.