ray-project / ray

Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0
31.93k stars 5.44k forks source link

[core][experimental] Timeout on hanging DAG #46155

Closed stephanie-wang closed 21 hours ago

stephanie-wang commented 1 week ago

Description

A DAG can hang from application code and/or system-level bugs. Add a timeout for ongoing executions to be safe.

Use case

No response

anyscalesam commented 5 days ago

Tests running right now > waiting on that to then merge.