run-house / runhouse

Dispatch and distribute your ML training to "serverless" clusters in Python, like PyTorch for ML infra. Iterable, debuggable, multi-cloud/on-prem, identical across research and production.
https://run.house
Apache License 2.0
977 stars 37 forks source link

add local telemetry agent for clusters #1210

Open jlewitt1 opened 2 months ago

jlewitt1 commented 2 months ago

Install a local telemetry agent receiver process on clusters, which is responsible for exporting various telemetry data (ex: spans, metrics) to the Runhouse backend collector

jlewitt1 commented 2 months ago

This stack of pull requests is managed by Graphite. Learn more about stacking.