skypilot-org / skypilot

SkyPilot: Run AI and batch jobs on any infra (Kubernetes or 12+ clouds). Get unified execution, cost savings, and high GPU availability via a simple interface.
https://skypilot.readthedocs.io
Apache License 2.0

[Core] Environment variables should be parsed at task execution, not `sky.Task` instantiation #4363

Open romilbhardwaj opened 1 week ago

romilbhardwaj commented 1 week ago

I have a YAML like this:

envs:
  DATA_BUCKET_NAME: sky-demo-data-test
  DATA_BUCKET_STORE_TYPE: s3

file_mounts:
  /data:
    name: $DATA_BUCKET_NAME
    store: $DATA_BUCKET_STORE_TYPE

And a task like so:

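import sky

bucket_name = 'mybucket'
bucket_store_type = 's3'
# yaml_path points to the YAML shown above.
task = sky.Task.from_yaml(yaml_path)
# Override the env vars defined in the YAML before launching.
task.update_envs({"DATA_BUCKET_NAME": bucket_name, "DATA_BUCKET_STORE_TYPE": bucket_store_type})
sky.launch(task, down=True)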

In this case, SkyPilot sets the bucket name to sky-demo-data-test instead of overriding it with mybucket from the envs that I updated in my Python script.

This happens because we substitute env vars into storage and file_mounts inside from_yaml(), so a later update_envs() call no longer affects them. We should probably do the substitution at task execution instead.
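
To illustrate the difference, here is a minimal standalone sketch. It is not SkyPilot's actual code: `EagerTask`, `LazyTask`, `substitute`, and `CONFIG` are hypothetical names, and `string.Template` is only a stand-in for whatever substitution SkyPilot performs. The eager variant resolves `file_mounts` when the task is built, the lazy variant keeps the raw template and resolves it only when asked.

```python
from string import Template

# Hypothetical stand-in for the YAML in this issue, already parsed into a dict.
CONFIG = {
    'envs': {
        'DATA_BUCKET_NAME': 'sky-demo-data-test',
        'DATA_BUCKET_STORE_TYPE': 's3',
    },
    'file_mounts': {
        '/data': {
            'name': '$DATA_BUCKET_NAME',
            'store': '$DATA_BUCKET_STORE_TYPE',
        },
    },
}


def substitute(value, envs):
    """Recursively replace $VAR references in strings, dicts, and lists."""
    if isinstance(value, str):
        return Template(value).safe_substitute(envs)
    if isinstance(value, dict):
        return {k: substitute(v, envs) for k, v in value.items()}
    if isinstance(value, list):
        return [substitute(v, envs) for v in value]
    return value


class EagerTask:
    """Resolves file_mounts at construction, like the behavior reported here."""

    def __init__(self, config):
        self.envs = dict(config['envs'])
        self.file_mounts = substitute(config['file_mounts'], self.envs)

    def update_envs(self, envs):
        # Too late to matter: file_mounts was already resolved in __init__.
        self.envs.update(envs)

    def resolved_file_mounts(self):
        return self.file_mounts


class LazyTask:
    """Keeps the raw template and resolves it only when asked (at "launch")."""

    def __init__(self, config):
        self.envs = dict(config['envs'])
        self._raw_file_mounts = config['file_mounts']

    def update_envs(self, envs):
        self.envs.update(envs)

    def resolved_file_mounts(self):
        # Substitution happens here, so it sees the latest envs.
        return substitute(self._raw_file_mounts, self.envs)


overrides = {'DATA_BUCKET_NAME': 'mybucket', 'DATA_BUCKET_STORE_TYPE': 's3'}

eager = EagerTask(CONFIG)
eager.update_envs(overrides)
print(eager.resolved_file_mounts()['/data']['name'])  # sky-demo-data-test

lazy = LazyTask(CONFIG)
lazy.update_envs(overrides)
print(lazy.resolved_file_mounts()['/data']['name'])  # mybucket
```

With deferred resolution, the order of `from_yaml()` and `update_envs()` stops mattering, which is the behavior I would expect from the script above.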