skypilot-org / skypilot

SkyPilot: Run AI and batch jobs on any infra (Kubernetes or 12+ clouds). Get unified execution, cost savings, and high GPU availability via a simple interface.
https://skypilot.readthedocs.io
Apache License 2.0
6.8k stars 510 forks

Python version mismatch when migrating from AWS to GCP #1604

Closed pschafhalter closed 1 year ago

pschafhalter commented 1 year ago

Description of Problem

I recently migrated my workload from AWS to GCP and ran into issues with Python versions. AWS's Deep Learning AMI is based on LTS versions of Ubuntu with Python 3.8+, whereas the default SkyPilot GCP image uses Debian 10 with Python 3.7. This caused errors, as some libraries in my application rely on features from Python 3.8+.

Note that the default SkyPilot image for AWS is based on Ubuntu 22.04 and uses Python 3.9.

I believe this is a SkyPilot issue because running workloads on any cloud requires a consistent environment. This can be addressed by launching SkyPilot tasks in images that are uniform across clouds with tools such as docker or the cloud's image marketplace.
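
For reference, a uniform-image spec could take the shape of SkyPilot's `image_id: docker:<image>` syntax under `resources`, which runs the task inside a container so the same userland (OS, Python) is used on every cloud. This is a sketch; whether that syntax was available at the time of this issue is unclear, and the image tag is only an example:

```yaml
# Sketch: pin the runtime environment with a container image so the task
# sees the same OS and Python regardless of cloud. Tag is an example.
resources:
  image_id: docker:ubuntu:22.04
```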

Current workaround

I've currently made some changes to my YAML files to set up and activate conda environments for the specific Python version I need:

setup: |
  # Create an environment with Python 3.8.
  conda create -y -n my_task python=3.8
  source activate my_task
  # Install required Python packages.

run: |
  # Activate the same environment created in setup.
  source activate my_task
  # Run the workload script.

Note that this workaround only addresses Python version mismatches. Version mismatches in other libraries and programs (e.g. gcc) will require a different solution.
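
A cheap complement to the workaround above is to fail fast: assert the interpreter version at the top of the setup section so a mismatched VM image errors out immediately instead of partway through the workload. A sketch, where the 3.8 floor mirrors this issue's example and should be adjusted to the workload's actual requirement:

```shell
# Fail fast in setup if the VM's default Python is older than the workload needs.
# The (3, 8) floor is this issue's example requirement, not a SkyPilot default.
python3 -c 'import sys; assert sys.version_info >= (3, 8), sys.version'
```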

Posted at the request of @romilbhardwaj.

lhqing commented 1 year ago

I have a similar problem with Python versions, so I am following up here. Several tools I use have bumped their minimum Python version to 3.8, while the default Python on the GCP sky-VM is still 3.7. I can apply a workaround similar to what @pschafhalter wrote above.

I wonder whether SkyPilot has a built-in solution for this, or whether it is the user's job to handle Python versions?

Besides, another very common change I make to the default VM is using mamba instead of conda to install packages (especially many bioinformatics-specific ones). conda takes a ridiculously long time to resolve dependencies (hours), which makes it impossible for me to use. Could SkyPilot consider adding mamba as an option?

My current workaround in the setup section looks like this: I first download the latest Mambaforge (similar to a Miniconda installer) and add mamba to `$PATH`. I then need to reinstall skypilot, otherwise there is an error saying `sky` is not found (it was installed in the default conda env). After that I can install my heavy-duty packages, and everything seems to work fine for me so far.

setup: |
  if [ -f $HOME/mambaforge/bin/mamba ]
  then
    echo "Skip setup"
  else
    # Install Mambaforge (a Miniconda-like installer that ships mamba).
    wget https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-Linux-x86_64.sh -O $HOME/Mambaforge-Linux-x86_64.sh
    bash $HOME/Mambaforge-Linux-x86_64.sh -b -p $HOME/mambaforge
    # Persist the PATH change for future logins, and also export it directly:
    # .bashrc typically exits early in non-interactive shells, so sourcing it
    # here would not update PATH.
    echo 'export PATH=$HOME/mambaforge/bin:$PATH' >> $HOME/.bashrc
    export PATH=$HOME/mambaforge/bin:$PATH

    # Reinstall skypilot into the new base env so `sky` is found again.
    pip install --upgrade pip
    pip install "skypilot[gcp]"

    # Build the heavy-duty environment from a spec stored in GCS.
    gsutil cp gs://ecker-hanqing-wmb-us-west1/wmb/sky/env.yaml ./
    mamba env update -f env.yaml
  fi

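
Because the `run` section executes in its own shell (which does not necessarily source `.bashrc`), the Mambaforge tree installed above has to be put on `PATH` there as well before the environment can be activated. A sketch of a matching `run` section; the env name `my_env` and the entry point are hypothetical stand-ins, since the real env name is defined inside `env.yaml`:

```yaml
run: |
  # The run shell may not source .bashrc, so extend PATH here too.
  export PATH=$HOME/mambaforge/bin:$PATH
  # 'my_env' is a hypothetical name for the environment defined in env.yaml.
  source activate my_env
  python my_script.py  # hypothetical workload entry point
```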
Michaelvll commented 1 year ago

Another user has asked for Ubuntu as the OS. We should prioritize this a bit.

concretevitamin commented 1 year ago

Ran into this too. HF Transformers requires Python >= 3.8 (ref). Had to create a conda env manually rather than use the system Python (3.7.x) when on the GCP image.