run-house / runhouse

Dispatch and distribute your ML training to "serverless" clusters in Python, like PyTorch for ML infra. Iterable, debuggable, multi-cloud/on-prem, identical across research and production.
https://run.house
Apache License 2.0
965 stars, 37 forks

remove SageMaker cluster #1222

Closed jlewitt1 closed 1 month ago

jlewitt1 commented 1 month ago

This stack of pull requests is managed by Graphite.


sentry-io[bot] commented 1 month ago

🔍 Existing Issues For Review

Your pull request is modifying functions with the following pre-existing issues:

📄 File: runhouse/resources/hardware/cluster.py

| Function | Unhandled Issue |
| :------- | :----- |
| **`connect_server_client`** | **Exception: Failed to create find open port after -1 attempts** runhouse.resources.hardware.sky_ssh_runner i... `Event Count:` **6** |

📄 File: runhouse/resources/hardware/cluster_factory.py

| Function | Unhandled Issue |
| :------- | :----- |
| **`cluster`** | [**ValueError: Resource cpu-cluster not found.**](https://runhouse.sentry.io/issues/5764451933/?referrer=github-open-pr-bot) airfl... `Event Count:` **2** |
| **`ondemand_cluster`** | [**ValueError: Sky's cluster status does not have the necessary information to connect to the cluster. Please ch...**](https://runhouse.sentry.io/issues/5693196273/?referrer=github-open-pr-bot) ... `Event Count:` **1** |
| **`cluster`** | [**ValueError: Resource cpu-cluster not found.**](https://runhouse.sentry.io/issues/5764485419/?referrer=github-open-pr-bot) airfl... `Event Count:` **1** |
