This PR introduces edge-based data flow support in DAGs, allowing users to specify data transfer paths and sizes between tasks. This provides a more explicit and flexible way to define data dependencies between tasks.
Changes
- Added `TaskEdge` dataclass to represent edges between tasks:
  - Stores source task, target task, data path, and data size
  - Provides a `with_data()` method for a fluent API
- Enhanced DAG implementation:
  - Stores edge metadata in the networkx graph
  - Adds methods to get and manipulate edges and their properties
- Updated YAML format:
  - Replaced `downstream` with a new `edges` field for explicit edge definition
  - Added support for data transfer specifications on edges
Closes #4254
Example Usage
Python API:
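The original snippet did not survive here, so the following is only a minimal, self-contained sketch of the idea described under Changes. It does not import SkyPilot: `TaskEdge` is a hypothetical stand-in, the field names `path` and `size_gb`, the `Dag` helper class, and its `add_edge`/`get_edge`/`edges` methods are all assumptions for illustration, and a plain dict replaces the networkx graph the PR uses.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class TaskEdge:
    """Hypothetical stand-in for the PR's TaskEdge; field names are assumed."""
    source: str
    target: str
    path: Optional[str] = None        # location of the data passed along the edge
    size_gb: Optional[float] = None   # estimated amount of data transferred

    def with_data(self, path: str, size_gb: float) -> 'TaskEdge':
        """Attach transfer details and return self so calls can be chained."""
        self.path = path
        self.size_gb = size_gb
        return self


class Dag:
    """Toy DAG keyed by (source, target); a dict stands in for networkx."""

    def __init__(self) -> None:
        self._edges: Dict[Tuple[str, str], TaskEdge] = {}

    def add_edge(self, source: str, target: str) -> TaskEdge:
        """Create an edge, store it, and return it for fluent configuration."""
        edge = TaskEdge(source, target)
        self._edges[(source, target)] = edge
        return edge

    def get_edge(self, source: str, target: str) -> TaskEdge:
        return self._edges[(source, target)]

    def edges(self) -> List[TaskEdge]:
        return list(self._edges.values())


dag = Dag()
dag.add_edge('preprocess', 'train').with_data('/data/processed', 50.0)
dag.add_edge('train', 'evaluate').with_data('/checkpoints', 2.5)

# Edge metadata can then be read back, e.g. to estimate total transfer volume.
total_gb = sum(e.size_gb or 0.0 for e in dag.edges())
print(total_gb)
```

Returning the edge from `add_edge` is what makes the chained `with_data()` call possible; whether the PR exposes it in exactly this way is not confirmed by the description.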
YAML format:
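The YAML example is also missing from this copy. As a rough illustration of what an `edges` field with data transfer specifications could look like, here is a hypothetical fragment; every key name (`source`, `target`, `path`, `size_gb`) is an assumption and may differ from the schema the PR actually implements.

```yaml
# Hypothetical schema: key names are assumptions, not taken from the PR.
name: example-pipeline
edges:
  - source: preprocess
    target: train
    path: /data/processed   # data handed from source to target
    size_gb: 50             # estimated transfer size
  - source: train
    target: evaluate
    path: /checkpoints
    size_gb: 2.5
```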
Tested (run the relevant ones):
bash format.sh
pytest tests/test_smoke.py
pytest tests/test_smoke.py::test_fill_in_the_name
conda deactivate; bash -i tests/backward_compatibility_tests.sh