skypilot-org / skypilot

SkyPilot: Run AI and batch jobs on any infra (Kubernetes or 12+ clouds). Get unified execution, cost savings, and high GPU availability via a simple interface.
https://skypilot.readthedocs.io
Apache License 2.0
6.82k stars 513 forks

Implement `with_data` API for Edge-Based Data Flow in Task DAGs #4254

Closed andylizf closed 6 days ago

andylizf commented 2 weeks ago
  1. Implement with_data API

    • Add a with_data(path, size_gb=...) method on task edges so that each downstream task can be given its own data path.
    • Example usage:
      (preprocess >> train_a).with_data('/data/model_a', size_gb=2.0)
      (preprocess >> train_b).with_data('/data/model_b', size_gb=2.0)
  2. Update YAML Configuration

    • Extend the YAML format so that data paths can be defined per edge between tasks.
  3. Add Documentation and Examples

    • Document usage of with_data in both code and YAML, with example configurations.
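
To make the proposal concrete, here is a minimal, self-contained sketch of how an edge-level with_data call could behave. It does not use SkyPilot's actual classes: Task, Edge, out_edges, and to_config are hypothetical stand-ins, and the dict layout returned by to_config is only one guess at what an edge-based YAML entry might contain, not the format the project adopted.

```python
from typing import Dict, List, Optional


class Task:
    """Hypothetical stand-in for a DAG task (not SkyPilot's real Task)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.out_edges: List["Edge"] = []

    def __rshift__(self, downstream: "Task") -> "Edge":
        # `a >> b` returns an Edge object so data can be attached to it.
        edge = Edge(self, downstream)
        self.out_edges.append(edge)
        return edge


class Edge:
    """Hypothetical edge that carries the data path for one downstream task."""

    def __init__(self, source: Task, target: Task) -> None:
        self.source = source
        self.target = target
        self.path: Optional[str] = None
        self.size_gb: Optional[float] = None

    def with_data(self, path: str, size_gb: Optional[float] = None) -> "Edge":
        # Reject obviously invalid sizes; the real validation rules are unknown.
        if size_gb is not None and size_gb < 0:
            raise ValueError("size_gb must be non-negative")
        self.path = path
        self.size_gb = size_gb
        return self  # returning self allows further chaining

    def to_config(self) -> Dict[str, object]:
        # Assumed layout for an edge entry; purely illustrative.
        return {
            "source": self.source.name,
            "target": self.target.name,
            "path": self.path,
            "size_gb": self.size_gb,
        }


preprocess = Task("preprocess")
train_a = Task("train_a")
train_b = Task("train_b")

# Each edge gets its own data path, as in the example usage above.
edge_a = (preprocess >> train_a).with_data("/data/model_a", size_gb=2.0)
edge_b = (preprocess >> train_b).with_data("/data/model_b", size_gb=2.0)

for edge in preprocess.out_edges:
    print(edge.to_config())
```

Keeping the path and size on the edge rather than on the upstream task is what lets one task fan out to several downstream tasks with different data locations.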