skypilot-org / skypilot

SkyPilot: Run AI and batch jobs on any infra (Kubernetes or 12+ clouds). Get unified execution, cost savings, and high GPU availability via a simple interface.
https://skypilot.readthedocs.io
Apache License 2.0

Use environment variable to define file mount #2093

Closed. QLutz closed this issue 1 year ago

QLutz commented 1 year ago

Hi, and thanks for this great project!

I am in a situation where I would like to mount some directory in a bucket which changes based on the value of an environment variable. I am unsure how to do this with SkyPilot as defining a file mount would typically require that I specify the corresponding URL in my task YAML file beforehand.

Put simply, I'd like to do:

file_mounts:
  /some/folder:
    source: $URL_IN_MY_ENVVAR
    mode: MOUNT

and be able to set URL_IN_MY_ENVVAR at runtime. Am I missing something, or would this be a feature of interest?

romilbhardwaj commented 1 year ago

Thanks for filing this report @QLutz and welcome to SkyPilot!

One solution could be to use envsubst to generate a temporary YAML file that you can launch. Would that work?

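export URL_IN_MY_ENVVAR=s3://mybucket/
# Substitute only URL_IN_MY_ENVVAR so that any other $VARS in the YAML
# (for example, in the run section) are left untouched.
envsubst '$URL_IN_MY_ENVVAR' < template.yaml > run.yaml
sky launch run.yaml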
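Where envsubst is not available, the same templating step can be sketched with Python's standard library alone. This is only an illustration, not part of SkyPilot: render_task_yaml and TEMPLATE are hypothetical names, and the template is assumed to contain only plain $VAR or ${VAR} placeholders. Shell-style expansions such as ${ENV1:-default} would need their dollar sign escaped as $$ for string.Template to accept them.

```python
import os
from string import Template


def render_task_yaml(template_text, env=None):
    """Fill $VAR / ${VAR} placeholders in a task YAML template.

    Hypothetical helper for illustration; not a SkyPilot function.
    """
    values = os.environ if env is None else env
    # substitute() raises KeyError if a placeholder has no value,
    # so a forgotten variable fails loudly before anything is launched.
    return Template(template_text).substitute(values)


# Assumed template, mirroring the file_mounts snippet from the question.
TEMPLATE = """\
file_mounts:
  /some/folder:
    source: $URL_IN_MY_ENVVAR
    mode: MOUNT
"""

if __name__ == "__main__":
    rendered = render_task_yaml(TEMPLATE, {"URL_IN_MY_ENVVAR": "s3://mybucket/"})
    print(rendered)
```

The rendered text can then be written to run.yaml and passed to sky launch as in the commands above.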

Alternatively, you can also use the SkyPilot Python API if you find that easier to integrate into your workflow.

We would love to hear a bit more about your use case for this to see if we can add this as a feature. Can you share a little more about your workload and how you are using SkyPilot?

QLutz commented 1 year ago

Thanks for your answer, @romilbhardwaj.

My current workflow consists of setting some parameters for my deployment at launch, e.g.:

sky launch -c my_deployment --env ENV1=val1 --env ENV2=val2 --gpus=T4:2 my_task.yaml

with all parameters having default values (that I may or may not want to override) in my_task.yaml, which would resemble:

resources:
  accelerators: T4:1
  disk_size: 250

run: |
  ENV1=${ENV1:-default_val1} \
  ENV2=${ENV2:-default_val2} \
  ENV3=${ENV3:-default_val3} \
  sh ./scripts/run.sh

I find this very convenient, as it allows for seamless customization of my deployment. Mounts, however, cannot be set as easily from the CLI. I understand that many solutions exist (and the one you offer is probably the simplest at the moment), but I find the current simplicity of task YAMLs very elegant, and it allows for highly maintainable workflows (the Python API, while more versatile, is quite a bit more cumbersome). It's definitely a matter of convenience, and I get that you may have more pressing issues to handle. Nevertheless, if allowing env vars in file mounts turns out not to be that hard, I think it has some value.
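To illustrate the behaviour being asked for, here is a rough Python sketch of the same default-with-override pattern applied to a mount source. It mimics the shell's ${VAR:-default} semantics (the default applies when the variable is unset or empty). The helper names and the default bucket URL are hypothetical, and the returned dictionary simply mirrors the file_mounts YAML from the question; it is not presented as SkyPilot's API.

```python
import os


def env_or_default(name, default, env=None):
    """Mimic shell ${NAME:-default}: fall back when NAME is unset or empty."""
    source = os.environ if env is None else env
    value = source.get(name, "")
    return value if value else default


def build_file_mounts(env=None):
    """Return a file_mounts-shaped dict with an overridable source URL."""
    # "s3://default-bucket/" is a hypothetical default, not taken from the thread.
    source_url = env_or_default("URL_IN_MY_ENVVAR", "s3://default-bucket/", env)
    return {
        "/some/folder": {
            "source": source_url,
            "mode": "MOUNT",
        }
    }


if __name__ == "__main__":
    # Uses the real environment when no mapping is passed in.
    print(build_file_mounts())
```

With URL_IN_MY_ENVVAR unset, the mount points at the default; setting it before running overrides the source without touching the code.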

concretevitamin commented 1 year ago

Thanks @QLutz! This certainly makes sense. We do want to get to it; until we have the bandwidth, contributions from the community are very welcome.

concretevitamin commented 1 year ago

@QLutz We just merged this support. Here's an example of how you can use env vars: https://github.com/skypilot-org/skypilot/blob/master/examples/using_file_mounts_with_env_vars.yaml

QLutz commented 1 year ago

Thanks a lot!