Different conda environments - Githubissues

stfc-sciml / sciml-bench

SciML Benchmarking Suite for AI for Science

MIT License

Different conda environments #9

Open juripapay opened 1 year ago

juripapay commented 1 year ago

We need to think about how the framework can install isolated, application-specific conda environments. This issue came up with the Hydronet benchmark, which is very sensitive to library versions: it will not work unless specific versions of its libraries are installed. The problem is that these dependencies might conflict with previously installed libraries, so it would be better to create a dedicated environment just for running Hydronet.

The Hydronet dependencies can be installed by the following commands:

1) Create the conda environment: conda create --name hydronet2 python=3.8

2) Activate the conda environment: conda activate hydronet2

3) Install PyTorch: conda install pytorch==1.12.0 cudatoolkit=11.3 -c pytorch -c conda-forge

4) conda install pyg -c pyg

5) conda install -c conda-forge tensorboard ase fair-research-login h5py tqdm

6) conda install -c conda-forge gdown

7) pip install torch-scatter torch-sparse torch-cluster torch-spline-conv torch-geometric -f https://data.pyg.org/whl/torch-1.12.0+cu113.html
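
The steps above can be collected into a single non-interactive script. This is only a sketch: it swaps the activation step for `conda install -n` and `conda run -n`, which target the environment by name, and the `run` wrapper and `DRY_RUN` switch are hypothetical helpers added here so the commands can be previewed without executing them.

```shell
#!/bin/bash
# Sketch: the Hydronet install steps as one function.
# DRY_RUN is a hypothetical switch for this sketch: when set to 1,
# commands are printed instead of executed.
ENV_NAME=hydronet2

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

install_hydronet_env() {
    run conda create --name "$ENV_NAME" python=3.8 -y || return
    # -n targets the environment by name, so no activation is needed
    run conda install -n "$ENV_NAME" -y pytorch==1.12.0 cudatoolkit=11.3 -c pytorch -c conda-forge || return
    run conda install -n "$ENV_NAME" -y pyg -c pyg || return
    run conda install -n "$ENV_NAME" -y -c conda-forge tensorboard ase fair-research-login h5py tqdm || return
    run conda install -n "$ENV_NAME" -y -c conda-forge gdown || return
    # Run pip through the environment's own Python interpreter
    run conda run -n "$ENV_NAME" python -m pip install \
        torch-scatter torch-sparse torch-cluster torch-spline-conv torch-geometric \
        -f https://data.pyg.org/whl/torch-1.12.0+cu113.html
}
```

Calling `DRY_RUN=1 install_hydronet_env` prints the six commands; calling `install_hydronet_env` on its own executes them and stops at the first failure.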

samueljackson92 commented 1 year ago

Yes, I agree this is an issue. As discussed, I think there are two options:

1) Add conda as a requirement, then use conda to create a separate environment for each benchmark.
2) Do it with containers (e.g. Singularity), so that each benchmark gets its own container. However, this can be a headache when it comes to multi-GPU, multi-node runs.
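
A rough sketch of what the conda option could look like. All function names are hypothetical and not part of sciml-bench, the `sciml-bench-<benchmark>` naming scheme is an assumption, and so is the default Python version.

```shell
#!/bin/bash
# Sketch of the conda option: one environment per benchmark.
# Every function name below is hypothetical, not part of sciml-bench.

# Map a benchmark name to an environment name (assumed naming scheme)
env_name_for() {
    echo "sciml-bench-$1"
}

# Succeed if environment name $1 appears in `conda env list` output on stdin
env_in_listing() {
    awk '!/^#/ && NF { print $1 }' | grep -qxF -- "$1"
}

# Create the benchmark's environment only if it does not exist yet.
# The default Python version (3.9) is an assumption.
ensure_env() {
    local name python_version
    name=$(env_name_for "$1")
    python_version=${2:-3.9}
    if ! conda env list | env_in_listing "$name"; then
        conda create -n "$name" "python=$python_version" -y --quiet
    fi
}

# Run a command inside the benchmark's environment without activating it
run_in_env() {
    local name
    name=$(env_name_for "$1")
    shift
    conda run -n "$name" "$@"
}
```

For example, `ensure_env hydronet 3.8` followed by `run_in_env hydronet python --version` would create the environment on first use and then run inside it, leaving the caller's active environment untouched.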

samueljackson92 commented 1 year ago

Spotted an issue with this on stemdl as well. I think the latest version of pytorch was causing an issue and dropping back to <2.0 fixed it.

I did some investigation with micromamba yesterday evening, and I think we could install our own copy in the sciml-bench folder, then create a new environment for each benchmark at install time. The tricky part is not letting conda take over the user's bash environment, as they might have their own conda install and environments.
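
To make the idea concrete, here is a minimal sketch under some assumptions: `SCIML_BENCH_HOME`, the `.mamba` directory layout, and the function names are all hypothetical, and the micromamba binary is assumed to have already been downloaded to `MAMBA_EXE`. The point is that everything is addressed by explicit prefix and `micromamba shell init` is never called, so the user's `~/.bashrc` and their own conda setup are left alone.

```shell
#!/bin/bash
# Sketch: a private micromamba living under the sciml-bench folder.
# SCIML_BENCH_HOME and the directory layout are assumptions for illustration.
SCIML_BENCH_HOME=${SCIML_BENCH_HOME:-$HOME/sciml-bench}
MAMBA_EXE=$SCIML_BENCH_HOME/.mamba/bin/micromamba

# micromamba reads its root prefix from this variable; it is exported only
# inside this script, so the user's interactive shell is not affected
export MAMBA_ROOT_PREFIX=$SCIML_BENCH_HOME/.mamba/root

# Full path of the environment for a given benchmark
env_prefix_for() {
    echo "$MAMBA_ROOT_PREFIX/envs/sciml-bench-$1"
}

# Create an environment at an explicit prefix (default Python is an assumption)
create_benchmark_env() {
    "$MAMBA_EXE" create -y -p "$(env_prefix_for "$1")" -c conda-forge "python=${2:-3.9}"
}

# Run a command inside a benchmark's environment without activating it
run_in_benchmark_env() {
    local prefix
    prefix=$(env_prefix_for "$1")
    shift
    "$MAMBA_EXE" run -p "$prefix" "$@"
}
```

With this layout, `create_benchmark_env stemdl_classification` followed by `run_in_benchmark_env stemdl_classification python -m pip install .` would never require the user to activate anything.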

samueljackson92 commented 1 year ago

Keeping track of benchmark requirements: here is a script for correctly installing tensorflow & requirements for the mnist benchmark. Other tensorflow benchmarks (optics, cloud, etc.) will be similar and will mostly differ in the last line.

#!/bin/bash
set -x

# Create new environment
ENV_NAME=sciml-bench-mnist_tf_keras
conda remove -n $ENV_NAME --all -y --quiet
conda create -n $ENV_NAME python=3.9 -y --quiet
# Locate the environment relative to the active conda installation (no hard-coded home directory)
ENV_PATH=$(conda info --base)/envs/$ENV_NAME

# Install conda requirements
conda install -n $ENV_NAME -c conda-forge cudatoolkit=11.2.2 cudnn=8.1.0 -y --quiet
conda install -n $ENV_NAME -c nvidia cuda-nvcc=11.3.58 -y --quiet

# Configure environment variables
mkdir -p $ENV_PATH/etc/conda/activate.d
# Escape LD_LIBRARY_PATH so it is expanded at activation time, not at install time
echo "export LD_LIBRARY_PATH=\${LD_LIBRARY_PATH}:${ENV_PATH}/lib/" >> $ENV_PATH/etc/conda/activate.d/env_vars.sh
echo "export XLA_FLAGS='--xla_gpu_cuda_data_dir=${ENV_PATH}/lib'" >> $ENV_PATH/etc/conda/activate.d/env_vars.sh

# Work around for Ubuntu 22.04. See: https://www.tensorflow.org/install/pip
mkdir -p $ENV_PATH/lib/nvvm/libdevice
cp $ENV_PATH/lib/libdevice.10.bc $ENV_PATH/lib/nvvm/libdevice/

# Install pip requirements
conda run -n $ENV_NAME LD_LIBRARY_PATH=$ENV_PATH/lib/ python -m pip install --upgrade pip -q
conda run -n $ENV_NAME LD_LIBRARY_PATH=$ENV_PATH/lib/ python -m pip install . -q
conda run -n $ENV_NAME LD_LIBRARY_PATH=$ENV_PATH/lib/ python -m pip install "tensorflow==2.11.*" scikit-image -q
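
Once the script above has finished, a quick smoke test can confirm that TensorFlow imports and can see a GPU. This is just a sketch: `check_tf_env` and `TF_CHECK` are hypothetical names, and an empty device list would suggest that the CUDA libraries are not being found at runtime.

```shell
#!/bin/bash
# Sketch: smoke test for the TensorFlow environment built by the install script.
ENV_NAME=sciml-bench-mnist_tf_keras

# Python one-liner: print the TensorFlow version and the GPUs it can see
TF_CHECK='import tensorflow as tf; print(tf.__version__); print(tf.config.list_physical_devices("GPU"))'

# Run the check inside the given environment (defaults to ENV_NAME)
check_tf_env() {
    conda run -n "${1:-$ENV_NAME}" python -c "$TF_CHECK"
}
```

Running `check_tf_env` should print a 2.11.x version string followed by the list of visible GPU devices.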

samueljackson92 commented 1 year ago

And here's the script for stemdl (and PyTorch). It also pins pytorch_lightning to the correct version.

#!/bin/bash
set -e
set -x

# Create new environment
ENV_NAME=sciml-bench-stemdl_classification
# Removal fails if the environment does not exist yet; do not let set -e abort on that
conda remove -n $ENV_NAME --all -y --quiet || true
conda create -n $ENV_NAME python=3.9 -y --quiet
# Locate the environment relative to the active conda installation (no hard-coded home directory)
ENV_PATH=$(conda info --base)/envs/$ENV_NAME

# Install conda requirements
conda install -n $ENV_NAME -y --quiet pytorch==1.13.1 torchvision==0.14.1 pytorch-cuda=11.6 -c pytorch -c nvidia

# Install pip requirements
conda run -n $ENV_NAME python -m pip install -q --upgrade pip
conda run -n $ENV_NAME python -m pip install -q "pytorch_lightning==1.9.*" scikit-learn tensorboard
conda run -n $ENV_NAME python -m pip install -q .

samueljackson92 commented 1 year ago

I started capturing install scripts for each environment in dev/install_scripts/*.sh. They are quite useful for testing, and will serve as documentation of the dependencies for whatever refactoring solution we design in future.
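
For testing, something like the following could run every script in that directory and report which ones fail. It is a sketch only: `run_install_scripts` is a hypothetical helper, and writing a `.log` file next to each script is an arbitrary choice.

```shell
#!/bin/bash
# Sketch: run every install script in a directory and summarise the results.
# run_install_scripts is a hypothetical helper, not part of sciml-bench.
run_install_scripts() {
    local dir=${1:-dev/install_scripts}
    local script failed=0
    for script in "$dir"/*.sh; do
        [ -e "$script" ] || continue    # the directory has no *.sh files
        # Keep each script's output in a log file next to it
        if bash "$script" > "${script%.sh}.log" 2>&1; then
            echo "PASS $(basename "$script")"
        else
            echo "FAIL $(basename "$script")"
            failed=$((failed + 1))
        fi
    done
    # Succeed only if every script succeeded
    [ "$failed" -eq 0 ]
}
```

Since the install scripts call `pip install .`, this should be run from the repository root, e.g. `run_install_scripts dev/install_scripts`.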