skypilot-org / skypilot

SkyPilot: Run AI and batch jobs on any infra (Kubernetes or 12+ clouds). Get unified execution, cost savings, and high GPU availability via a simple interface.
https://skypilot.readthedocs.io
Apache License 2.0

[DAG] Add Edge-Based Data Flow Support #4280

Closed by andylizf 2 weeks ago

andylizf commented 2 weeks ago

Closes #4254

Description

This PR introduces edge-based data flow support in DAGs, allowing users to specify data transfer paths and sizes between tasks. This provides a more explicit and flexible way to define data dependencies between tasks.

Changes

  1. Added TaskEdge dataclass to represent edges between tasks:

    • Stores source task, target task, data path, and data size
    • Provides with_data() method for fluent API
  2. Enhanced DAG implementation:

    • Stores edge metadata in the networkx graph
    • Added methods to get/manipulate edges and their properties
  3. Updated YAML format:

    • Replaced downstream with new edges field for explicit edge definition
    • Added support for data transfer specifications on edges
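A minimal sketch of what such an edge record might look like, using only the standard library. The field names (`source`, `target`, `source_path`, `target_path`, `size_gb`) are assumptions inferred from the examples in this PR, not the actual implementation, and tasks are represented by plain name strings for brevity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TaskEdge:
    """Hypothetical edge record; the real class in the PR may differ."""
    source: str
    target: str
    source_path: Optional[str] = None
    target_path: Optional[str] = None
    size_gb: Optional[float] = None

    def with_data(self, source_path: str, target_path: str,
                  size_gb: Optional[float] = None) -> 'TaskEdge':
        # Set the transfer metadata in place and return self so that
        # the call can be chained onto the expression creating the edge.
        self.source_path = source_path
        self.target_path = target_path
        self.size_gb = size_gb
        return self


edge = TaskEdge('preprocess', 'train_a').with_data(
    '/data/model_a', '/train', size_gb=2.0)
```

Returning `self` from `with_data()` is what makes the API fluent: the edge can be created and annotated in a single expression.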

Example Usage

Python API:

import sky

with sky.Dag() as dag:
    preprocess = sky.Task(name='preprocess', run='python preprocess.py')
    train_a = sky.Task(name='train_a', run='python train.py')
    train_b = sky.Task(name='train_b', run='python train.py')

    (preprocess >> train_a).with_data('/data/model_a', '/train', size_gb=2.0)
    (preprocess >> train_b).with_data('/data/model_b', '/train', size_gb=2.0)
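For readers unfamiliar with the chaining above, here is a rough sketch of how `>>` could yield an object that accepts `with_data()`. The `Task` and `Edge` classes below are simplified stand-ins written for illustration, not SkyPilot's classes, and the sketch omits registering the edge with the enclosing DAG.

```python
class Edge:
    """Hypothetical edge object returned by `>>`; not SkyPilot's actual class."""

    def __init__(self, source, target):
        self.source = source
        self.target = target
        self.data = None

    def with_data(self, source_path, target_path, size_gb=None):
        # Attach transfer metadata and return self for chaining.
        self.data = {
            'source_path': source_path,
            'target_path': target_path,
            'size_gb': size_gb,
        }
        return self


class Task:
    """Stand-in for sky.Task with only the operator under discussion."""

    def __init__(self, name):
        self.name = name

    def __rshift__(self, other):
        # `a >> b` evaluates to a.__rshift__(b).
        return Edge(self, other)


preprocess, train_a = Task('preprocess'), Task('train_a')
edge = (preprocess >> train_a).with_data('/data/model_a', '/train', size_gb=2.0)
```

The parentheses around `preprocess >> train_a` matter: attribute access binds tighter than `>>`, so without them `with_data()` would be looked up on `train_a` rather than on the edge.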

YAML format:

name: example-pipeline
edges:
  - source: preprocess
    target: train_a
    data:
      source_path: /data/raw/model_a
      target_path: /data/processed/model_a
      size_gb: 2.0
  - source: preprocess
    target: train_b
    data:
      source_path: /data/raw/model_b
      target_path: /data/processed/model_b
      size_gb: 2.0

---
name: preprocess
run: python preprocess.py
---
name: train_a
run: python train.py
---
name: train_b
run: python train.py
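One consequence of naming tasks in `edges` is that each `source` and `target` has to match a task document further down. A hedged sketch of such a check, assuming the YAML documents have already been parsed into plain dicts; `validate_edges` is a hypothetical helper and not part of this PR:

```python
from typing import Any, Dict, List


def validate_edges(header: Dict[str, Any],
                   tasks: List[Dict[str, Any]]) -> List[str]:
    """Return a list of problems found in header['edges'] (empty if none)."""
    names = {task['name'] for task in tasks}
    errors = []
    for i, edge in enumerate(header.get('edges', [])):
        # Both endpoints must refer to a defined task.
        for key in ('source', 'target'):
            if edge.get(key) not in names:
                errors.append(f'edges[{i}].{key}: unknown task {edge.get(key)!r}')
        # A data size, when given, should not be negative.
        size = edge.get('data', {}).get('size_gb')
        if size is not None and size < 0:
            errors.append(f'edges[{i}].data.size_gb: negative size {size}')
    return errors


header = {
    'name': 'example-pipeline',
    'edges': [
        {'source': 'preprocess', 'target': 'train_a',
         'data': {'source_path': '/data/raw/model_a',
                  'target_path': '/data/processed/model_a',
                  'size_gb': 2.0}},
        {'source': 'preprocess', 'target': 'train_b',
         'data': {'source_path': '/data/raw/model_b',
                  'target_path': '/data/processed/model_b',
                  'size_gb': 2.0}},
    ],
}
tasks = [{'name': 'preprocess'}, {'name': 'train_a'}, {'name': 'train_b'}]
problems = validate_edges(header, tasks)
```

For the pipeline above `problems` is empty; an edge pointing at an undefined task name would produce one entry per bad endpoint.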


andylizf commented 2 weeks ago

@cblmemo PTAL, thanks!

andylizf commented 2 weeks ago

Thanks! I don’t have merge permissions for this repo. 🥲 Could you handle it for me?