njkrichardson / cdvae

Fork of crystal diffusion variational autoencoders, with added build tooling for repeating/reproducing their results.
MIT License

Makefile build Error #1

Open Kiencan opened 9 months ago

Kiencan commented 9 months ago

When running the command "make", I encounter the following error:

ERROR: failed to solve: process "/bin/sh -c conda init zsh \ && . ~/.zshrc \ && conda create --name cdvae python=3.9 \ && conda activate cdvae \ && conda install -y pytorch==1.13.0 torchvision==0.14.0 torchaudio==0.13.0 pytorch-cuda=11.6 -c pytorch -c nvidia \ && conda install -c conda-forge pytorch-lightning \ && conda install -c conda-forge ase autopep8 seaborn tqdm nglview \ && pip install torch_geometric pyg_lib torch_scatter torch_sparse torch_cluster torch_spline_conv -f https://data.pyg.org/whl/torch-1.13.0+cu116.html \ && pip install higher hydra-core==1.1.0 hydra-joblib-launcher==1.1.5 p-tqdm==1.3.3 pytest python-dotenv smact==2.2.1 streamlit==0.79.0 torchdiffeq wandb \ && pip install matminer==0.7.3 \ && pip install "protobuf==3.20.*"" did not complete successfully: exit code: 1
make: *** [Makefile:2: build] Error 1

Can you help me solve this problem?

njkrichardson commented 9 months ago

I'd need to see the full error, can you post it?

Kiencan commented 9 months ago

docker build --tag cdvae:latest .
[+] Building 80.1s (10/13) docker:default
[+] Building 1772.3s (11/13) docker:default
 => [internal] load build definition from Dockerfile 0.1s
 => => transferring dockerfile: 2.49kB 0.0s
 => [internal] load .dockerignore 0.0s
 => => transferring context: 2B 0.0s
 => resolve image config for docker.io/docker/dockerfile:1 7.9s
 => [auth] docker/dockerfile:pull token for registry-1.docker.io 0.0s
 => CACHED docker-image://docker.io/docker/dockerfile:1@sha256:ac85f380a63b13dfcefa89046420e1781752bab202122f8f50 0.0s
 => [internal] load metadata for docker.io/nvidia/cuda:11.6.1-cudnn8-devel-ubuntu20.04 2.1s
 => [auth] nvidia/cuda:pull token for registry-1.docker.io 0.0s
 => [stage-0 1/6] FROM docker.io/nvidia/cuda:11.6.1-cudnn8-devel-ubuntu20.04@sha256:17a3a7eeafcef99f327949782f9ec 0.0s
 => CACHED [stage-0 2/6] RUN --mount=type=cache,target=/var/cache/apt apt-get update && DEBIAN_FRONTEND=nonin 0.0s
 => CACHED [stage-0 3/6] RUN wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh & 0.0s
 => ERROR [stage-0 4/6] RUN conda init zsh && . ~/.zshrc && conda create --name cdvae python=3.9 & 1760.8s

[stage-0 4/6] RUN conda init zsh && . ~/.zshrc && conda create --name cdvae python=3.9 && conda activate cdvae && conda install -y pytorch==1.13.0 torchvision==0.14.0 torchaudio==0.13.0 pytorch-cuda=11.6 -c pytorch -c nvidia && conda install -c conda-forge pytorch-lightning && conda install -c conda-forge ase autopep8 seaborn tqdm nglview && pip install torch_geometric pyg_lib torch_scatter torch_sparse torch_cluster torch_spline_conv -f https://data.pyg.org/whl/torch-1.13.0+cu116.html && pip install higher hydra-core==1.1.0 hydra-joblib-launcher==1.1.5 p-tqdm==1.3.3 pytest python-dotenv smact==2.2.1 streamlit==0.79.0 torchdiffeq wandb && pip install matminer==0.7.3 && pip install "protobuf==3.20.*": 3.466 no change /root/miniconda3/condabin/conda 3.466 no change /root/miniconda3/bin/conda 3.466 no change /root/miniconda3/bin/conda-env 3.466 no change /root/miniconda3/bin/activate 3.466 no change /root/miniconda3/bin/deactivate 3.466 no change /root/miniconda3/etc/profile.d/conda.sh 3.466 no change /root/miniconda3/etc/fish/conf.d/conda.fish 3.466 no change /root/miniconda3/shell/condabin/Conda.psm1 3.466 no change /root/miniconda3/shell/condabin/conda-hook.ps1 3.466 no change /root/miniconda3/lib/python3.11/site-packages/xontrib/conda.xsh 3.466 no change /root/miniconda3/etc/profile.d/conda.csh 3.466 modified /root/.zshrc 3.466 3.466 ==> For changes to take effect, close and re-open your current shell. <== 3.466 8.304 Channels: 8.304 - defaults 8.304 Platform: linux-64 8.304 Collecting package metadata (repodata.json): ...working... done 47.07 Solving environment: ...working... 
done 48.40 48.40 ## Package Plan ## 48.40 48.40 environment location: /root/miniconda3/envs/cdvae 48.40 48.40 added / updated specs: 48.40 - python=3.9 48.40 48.40 48.40 The following packages will be downloaded: 48.40 48.40 package | build 48.40 ---------------------------|----------------- 48.40 pip-23.3.1 | py39h06a4308_0 2.6 MB 48.40 python-3.9.18 | h955ad1f_0 25.1 MB 48.40 setuptools-68.2.2 | py39h06a4308_0 948 KB 48.40 tzdata-2023d | h04d1e81_0 117 KB 48.40 wheel-0.41.2 | py39h06a4308_0 108 KB 48.40 ------------------------------------------------------------ 48.40 Total: 28.8 MB 48.40 48.40 The following NEW packages will be INSTALLED: 48.40 48.40 _libgcc_mutex pkgs/main/linux-64::_libgcc_mutex-0.1-main 48.40 _openmp_mutex pkgs/main/linux-64::_openmp_mutex-5.1-1_gnu 48.40 ca-certificates pkgs/main/linux-64::ca-certificates-2023.12.12-h06a4308_0 48.40 ld_impl_linux-64 pkgs/main/linux-64::ld_impl_linux-64-2.38-h1181459_1 48.40 libffi pkgs/main/linux-64::libffi-3.4.4-h6a678d5_0 48.40 libgcc-ng pkgs/main/linux-64::libgcc-ng-11.2.0-h1234567_1 48.40 libgomp pkgs/main/linux-64::libgomp-11.2.0-h1234567_1 48.40 libstdcxx-ng pkgs/main/linux-64::libstdcxx-ng-11.2.0-h1234567_1 48.40 ncurses pkgs/main/linux-64::ncurses-6.4-h6a678d5_0 48.40 openssl pkgs/main/linux-64::openssl-3.0.12-h7f8727e_0 48.40 pip pkgs/main/linux-64::pip-23.3.1-py39h06a4308_0 48.40 python pkgs/main/linux-64::python-3.9.18-h955ad1f_0 48.40 readline pkgs/main/linux-64::readline-8.2-h5eee18b_0 48.40 setuptools pkgs/main/linux-64::setuptools-68.2.2-py39h06a4308_0 48.40 sqlite pkgs/main/linux-64::sqlite-3.41.2-h5eee18b_0 48.40 tk pkgs/main/linux-64::tk-8.6.12-h1ccaba5_0 48.40 tzdata pkgs/main/noarch::tzdata-2023d-h04d1e81_0 48.40 wheel pkgs/main/linux-64::wheel-0.41.2-py39h06a4308_0 48.40 xz pkgs/main/linux-64::xz-5.4.5-h5eee18b_0 48.40 zlib pkgs/main/linux-64::zlib-1.2.13-h5eee18b_0 48.40 48.40 48.40 Proceed ([y]/n)? 72.65 72.65 Downloading and Extracting Packages: ...working... 
done 72.65 Preparing transaction: ...working... done 75.40 Verifying transaction: ...working... done 86.34 Executing transaction: ...working... done 118.0 # 118.0 # To activate this environment, use 118.0 # 118.0 # $ conda activate cdvae 118.0 # 118.0 # To deactivate an active environment, use 118.0 # 118.0 # $ conda deactivate 118.0 123.6 Channels: 123.6 - pytorch 123.6 - nvidia 123.6 - defaults 123.6 Platform: linux-64 123.6 Collecting package metadata (repodata.json): ...working... done 137.6 Solving environment: ...working... done 140.2 140.2 ## Package Plan ## 140.2 140.2 environment location: /root/miniconda3/envs/cdvae 140.2 140.2 added / updated specs: 140.2 - pytorch-cuda=11.6 140.2 - pytorch==1.13.0 140.2 - torchaudio==0.13.0 140.2 - torchvision==0.14.0 140.2 140.2 140.2 The following packages will be downloaded: 140.2 140.2 package | build 140.2 ---------------------------|----------------- 140.2 blas-1.0 | mkl 6 KB 140.2 brotli-python-1.0.9 | py39h6a678d5_7 330 KB 140.2 certifi-2023.11.17 | py39h06a4308_0 158 KB 140.2 cffi-1.16.0 | py39h5eee18b_0 251 KB 140.2 cryptography-41.0.7 | py39hdda0065_0 2.0 MB 140.2 cuda-11.6.1 | 0 1 KB nvidia 140.2 cuda-cccl-11.6.55 | hf6102b2_0 1.2 MB nvidia 140.2 cuda-command-line-tools-11.6.2| 0 1 KB nvidia 140.2 cuda-compiler-11.6.2 | 0 1 KB nvidia 140.2 cuda-cudart-11.6.55 | he381448_0 194 KB nvidia 140.2 cuda-cudart-dev-11.6.55 | h42ad0f4_0 1.0 MB nvidia 140.2 cuda-cuobjdump-11.6.124 | h2eeebcb_0 134 KB nvidia 140.2 cuda-cupti-11.6.124 | h86345e5_0 22.1 MB nvidia 140.2 cuda-cuxxfilt-11.6.124 | hecbf4f6_0 283 KB nvidia 140.2 cuda-driver-dev-11.6.55 | 0 16 KB nvidia 140.2 cuda-gdb-12.3.101 | 0 5.3 MB nvidia 140.2 cuda-libraries-11.6.1 | 0 1 KB nvidia 140.2 cuda-libraries-dev-11.6.1 | 0 2 KB nvidia 140.2 cuda-memcheck-11.8.86 | 0 168 KB nvidia 140.2 cuda-nsight-12.3.101 | 0 113.7 MB nvidia 140.2 cuda-nsight-compute-12.3.2 | 0 2 KB nvidia 140.2 cuda-nvcc-11.6.124 | hbba6d2d_0 42.2 MB nvidia 140.2 cuda-nvdisasm-12.3.101 | 0 
47.9 MB nvidia 140.2 cuda-nvml-dev-11.6.55 | haa9ef22_0 65 KB nvidia 140.2 cuda-nvprof-12.3.101 | 0 4.8 MB nvidia 140.2 cuda-nvprune-11.6.124 | he22ec0a_0 65 KB nvidia 140.2 cuda-nvrtc-11.6.124 | h020bade_0 17.1 MB nvidia 140.2 cuda-nvrtc-dev-11.6.124 | h249d397_0 16.8 MB nvidia 140.2 cuda-nvtx-11.6.124 | h0630a44_0 58 KB nvidia 140.2 cuda-nvvp-12.3.101 | 0 114.5 MB nvidia 140.2 cuda-runtime-11.6.1 | 0 1 KB nvidia 140.2 cuda-samples-11.6.101 | h8efea70_0 5 KB nvidia 140.2 cuda-sanitizer-api-12.3.101| 0 17.1 MB nvidia 140.2 cuda-toolkit-11.6.1 | 0 1 KB nvidia 140.2 cuda-tools-11.6.1 | 0 1 KB nvidia 140.2 cuda-visual-tools-11.6.1 | 0 1 KB nvidia 140.2 ffmpeg-4.3 | hf484d3e_0 9.9 MB pytorch 140.2 freetype-2.12.1 | h4a9f257_0 626 KB 140.2 gds-tools-1.8.1.2 | 0 40.8 MB nvidia 140.2 giflib-5.2.1 | h5eee18b_3 80 KB 140.2 gmp-6.2.1 | h295c915_3 544 KB 140.2 gnutls-3.6.15 | he1e5248_0 1.0 MB 140.2 idna-3.4 | py39h06a4308_0 93 KB 140.2 intel-openmp-2023.1.0 | hdb19cb5_46306 17.2 MB 140.2 jpeg-9e | h5eee18b_1 262 KB 140.2 lame-3.100 | h7b6447c_0 323 KB 140.2 lcms2-2.12 | h3be6417_0 312 KB 140.2 lerc-3.0 | h295c915_0 196 KB 140.2 libcublas-11.9.2.110 | h5e84587_0 300.8 MB nvidia 140.2 libcublas-dev-11.9.2.110 | h5c901ab_0 310.9 MB nvidia 140.2 libcufft-10.7.1.112 | hf425ae0_0 93.6 MB nvidia 140.2 libcufft-dev-10.7.1.112 | ha5ce4c0_0 197.2 MB nvidia 140.2 libcufile-1.8.1.2 | 0 1.0 MB nvidia 140.2 libcufile-dev-1.8.1.2 | 0 15 KB nvidia 140.2 libcurand-10.3.4.107 | 0 51.8 MB nvidia 140.2 libcurand-dev-10.3.4.107 | 0 449 KB nvidia 140.2 libcusolver-11.3.4.124 | h33c3c4e_0 87.0 MB nvidia 140.2 libcusparse-11.7.2.124 | h7538f96_0 160.9 MB nvidia 140.2 libcusparse-dev-11.7.2.124 | hbbe9722_0 328.9 MB nvidia 140.2 libdeflate-1.17 | h5eee18b_1 64 KB 140.2 libiconv-1.16 | h7f8727e_2 736 KB 140.2 libidn2-2.3.4 | h5eee18b_0 146 KB 140.2 libnpp-11.6.3.124 | hd2722f0_0 118.4 MB nvidia 140.2 libnpp-dev-11.6.3.124 | h3c42840_0 115.6 MB nvidia 140.2 libnvjpeg-11.6.2.124 | hd473ad6_0 2.3 MB 
nvidia 140.2 libnvjpeg-dev-11.6.2.124 | hb5906b9_0 2.0 MB nvidia 140.2 libpng-1.6.39 | h5eee18b_0 304 KB 140.2 libtasn1-4.19.0 | h5eee18b_0 63 KB 140.2 libtiff-4.5.1 | h6a678d5_0 533 KB 140.2 libunistring-0.9.10 | h27cfd23_0 536 KB 140.2 libwebp-1.3.2 | h11a3e52_0 87 KB 140.2 libwebp-base-1.3.2 | h5eee18b_0 387 KB 140.2 mkl-2023.1.0 | h213fc3f_46344 171.5 MB 140.2 mkl-service-2.4.0 | py39h5eee18b_1 54 KB 140.2 mkl_fft-1.3.8 | py39h5eee18b_0 216 KB 140.2 mkl_random-1.2.4 | py39hdb19cb5_0 313 KB 140.2 nettle-3.7.3 | hbbd107a_1 809 KB 140.2 nsight-compute-2023.3.1.1 | 0 808.1 MB nvidia 140.2 numpy-1.26.3 | py39h5f9d8c6_0 10 KB 140.2 numpy-base-1.26.3 | py39hb5e798b_0 7.2 MB 140.2 openh264-2.1.1 | h4ff587b_0 711 KB 140.2 openjpeg-2.4.0 | h3ad879b_0 331 KB 140.2 pillow-10.0.1 | py39ha6cbd5a_0 745 KB 140.2 pyopenssl-23.2.0 | py39h06a4308_0 96 KB 140.2 pysocks-1.7.1 | py39h06a4308_0 31 KB 140.2 pytorch-1.13.0 |py3.9_cuda11.6_cudnn8.3.2_0 1.27 GB pytorch 140.2 pytorch-cuda-11.6 | h867d48c_1 3 KB pytorch 140.2 pytorch-mutex-1.0 | cuda 3 KB pytorch 140.2 requests-2.31.0 | py39h06a4308_0 96 KB 140.2 tbb-2021.8.0 | hdb19cb5_0 1.6 MB 140.2 torchaudio-0.13.0 | py39_cu116 6.5 MB pytorch 140.2 torchvision-0.14.0 | py39_cu116 7.6 MB pytorch 140.2 typing_extensions-4.9.0 | py39h06a4308_0 53 KB 140.2 urllib3-1.26.18 | py39h06a4308_0 198 KB 140.2 ------------------------------------------------------------ 140.2 Total: 4.46 GB 140.2 140.2 The following NEW packages will be INSTALLED: 140.2 140.2 blas pkgs/main/linux-64::blas-1.0-mkl 140.2 brotli-python pkgs/main/linux-64::brotli-python-1.0.9-py39h6a678d5_7 140.2 bzip2 pkgs/main/linux-64::bzip2-1.0.8-h7b6447c_0 140.2 certifi pkgs/main/linux-64::certifi-2023.11.17-py39h06a4308_0 140.2 cffi pkgs/main/linux-64::cffi-1.16.0-py39h5eee18b_0 140.2 charset-normalizer pkgs/main/noarch::charset-normalizer-2.0.4-pyhd3eb1b0_0 140.2 cryptography pkgs/main/linux-64::cryptography-41.0.7-py39hdda0065_0 140.2 cuda nvidia/linux-64::cuda-11.6.1-0 140.2 
cuda-cccl nvidia/linux-64::cuda-cccl-11.6.55-hf6102b2_0 140.2 cuda-command-line~ nvidia/linux-64::cuda-command-line-tools-11.6.2-0 140.2 cuda-compiler nvidia/linux-64::cuda-compiler-11.6.2-0 140.2 cuda-cudart nvidia/linux-64::cuda-cudart-11.6.55-he381448_0 140.2 cuda-cudart-dev nvidia/linux-64::cuda-cudart-dev-11.6.55-h42ad0f4_0 140.2 cuda-cuobjdump nvidia/linux-64::cuda-cuobjdump-11.6.124-h2eeebcb_0 140.2 cuda-cupti nvidia/linux-64::cuda-cupti-11.6.124-h86345e5_0 140.2 cuda-cuxxfilt nvidia/linux-64::cuda-cuxxfilt-11.6.124-hecbf4f6_0 140.2 cuda-driver-dev nvidia/linux-64::cuda-driver-dev-11.6.55-0 140.2 cuda-gdb nvidia/linux-64::cuda-gdb-12.3.101-0 140.2 cuda-libraries nvidia/linux-64::cuda-libraries-11.6.1-0 140.2 cuda-libraries-dev nvidia/linux-64::cuda-libraries-dev-11.6.1-0 140.2 cuda-memcheck nvidia/linux-64::cuda-memcheck-11.8.86-0 140.2 cuda-nsight nvidia/linux-64::cuda-nsight-12.3.101-0 140.2 cuda-nsight-compu~ nvidia/linux-64::cuda-nsight-compute-12.3.2-0 140.2 cuda-nvcc nvidia/linux-64::cuda-nvcc-11.6.124-hbba6d2d_0 140.2 cuda-nvdisasm nvidia/linux-64::cuda-nvdisasm-12.3.101-0 140.2 cuda-nvml-dev nvidia/linux-64::cuda-nvml-dev-11.6.55-haa9ef22_0 140.2 cuda-nvprof nvidia/linux-64::cuda-nvprof-12.3.101-0 140.2 cuda-nvprune nvidia/linux-64::cuda-nvprune-11.6.124-he22ec0a_0 140.2 cuda-nvrtc nvidia/linux-64::cuda-nvrtc-11.6.124-h020bade_0 140.2 cuda-nvrtc-dev nvidia/linux-64::cuda-nvrtc-dev-11.6.124-h249d397_0 140.2 cuda-nvtx nvidia/linux-64::cuda-nvtx-11.6.124-h0630a44_0 140.2 cuda-nvvp nvidia/linux-64::cuda-nvvp-12.3.101-0 140.2 cuda-runtime nvidia/linux-64::cuda-runtime-11.6.1-0 140.2 cuda-samples nvidia/linux-64::cuda-samples-11.6.101-h8efea70_0 140.2 cuda-sanitizer-api nvidia/linux-64::cuda-sanitizer-api-12.3.101-0 140.2 cuda-toolkit nvidia/linux-64::cuda-toolkit-11.6.1-0 140.2 cuda-tools nvidia/linux-64::cuda-tools-11.6.1-0 140.2 cuda-visual-tools nvidia/linux-64::cuda-visual-tools-11.6.1-0 140.2 ffmpeg pytorch/linux-64::ffmpeg-4.3-hf484d3e_0 140.2 
freetype pkgs/main/linux-64::freetype-2.12.1-h4a9f257_0 140.2 gds-tools nvidia/linux-64::gds-tools-1.8.1.2-0 140.2 giflib pkgs/main/linux-64::giflib-5.2.1-h5eee18b_3 140.2 gmp pkgs/main/linux-64::gmp-6.2.1-h295c915_3 140.2 gnutls pkgs/main/linux-64::gnutls-3.6.15-he1e5248_0 140.2 idna pkgs/main/linux-64::idna-3.4-py39h06a4308_0 140.2 intel-openmp pkgs/main/linux-64::intel-openmp-2023.1.0-hdb19cb5_46306 140.2 jpeg pkgs/main/linux-64::jpeg-9e-h5eee18b_1 140.2 lame pkgs/main/linux-64::lame-3.100-h7b6447c_0 140.2 lcms2 pkgs/main/linux-64::lcms2-2.12-h3be6417_0 140.2 lerc pkgs/main/linux-64::lerc-3.0-h295c915_0 140.2 libcublas nvidia/linux-64::libcublas-11.9.2.110-h5e84587_0 140.2 libcublas-dev nvidia/linux-64::libcublas-dev-11.9.2.110-h5c901ab_0 140.2 libcufft nvidia/linux-64::libcufft-10.7.1.112-hf425ae0_0 140.2 libcufft-dev nvidia/linux-64::libcufft-dev-10.7.1.112-ha5ce4c0_0 140.2 libcufile nvidia/linux-64::libcufile-1.8.1.2-0 140.2 libcufile-dev nvidia/linux-64::libcufile-dev-1.8.1.2-0 140.2 libcurand nvidia/linux-64::libcurand-10.3.4.107-0 140.2 libcurand-dev nvidia/linux-64::libcurand-dev-10.3.4.107-0 140.2 libcusolver nvidia/linux-64::libcusolver-11.3.4.124-h33c3c4e_0 140.2 libcusparse nvidia/linux-64::libcusparse-11.7.2.124-h7538f96_0 140.2 libcusparse-dev nvidia/linux-64::libcusparse-dev-11.7.2.124-hbbe9722_0 140.2 libdeflate pkgs/main/linux-64::libdeflate-1.17-h5eee18b_1 140.2 libiconv pkgs/main/linux-64::libiconv-1.16-h7f8727e_2 140.2 libidn2 pkgs/main/linux-64::libidn2-2.3.4-h5eee18b_0 140.2 libnpp nvidia/linux-64::libnpp-11.6.3.124-hd2722f0_0 140.2 libnpp-dev nvidia/linux-64::libnpp-dev-11.6.3.124-h3c42840_0 140.2 libnvjpeg nvidia/linux-64::libnvjpeg-11.6.2.124-hd473ad6_0 140.2 libnvjpeg-dev nvidia/linux-64::libnvjpeg-dev-11.6.2.124-hb5906b9_0 140.2 libpng pkgs/main/linux-64::libpng-1.6.39-h5eee18b_0 140.2 libtasn1 pkgs/main/linux-64::libtasn1-4.19.0-h5eee18b_0 140.2 libtiff pkgs/main/linux-64::libtiff-4.5.1-h6a678d5_0 140.2 libunistring 
pkgs/main/linux-64::libunistring-0.9.10-h27cfd23_0 140.2 libwebp pkgs/main/linux-64::libwebp-1.3.2-h11a3e52_0 140.2 libwebp-base pkgs/main/linux-64::libwebp-base-1.3.2-h5eee18b_0 140.2 lz4-c pkgs/main/linux-64::lz4-c-1.9.4-h6a678d5_0 140.2 mkl pkgs/main/linux-64::mkl-2023.1.0-h213fc3f_46344 140.2 mkl-service pkgs/main/linux-64::mkl-service-2.4.0-py39h5eee18b_1 140.2 mkl_fft pkgs/main/linux-64::mkl_fft-1.3.8-py39h5eee18b_0 140.2 mkl_random pkgs/main/linux-64::mkl_random-1.2.4-py39hdb19cb5_0 140.2 nettle pkgs/main/linux-64::nettle-3.7.3-hbbd107a_1 140.2 nsight-compute nvidia/linux-64::nsight-compute-2023.3.1.1-0 140.2 numpy pkgs/main/linux-64::numpy-1.26.3-py39h5f9d8c6_0 140.2 numpy-base pkgs/main/linux-64::numpy-base-1.26.3-py39hb5e798b_0 140.2 openh264 pkgs/main/linux-64::openh264-2.1.1-h4ff587b_0 140.2 openjpeg pkgs/main/linux-64::openjpeg-2.4.0-h3ad879b_0 140.2 pillow pkgs/main/linux-64::pillow-10.0.1-py39ha6cbd5a_0 140.2 pycparser pkgs/main/noarch::pycparser-2.21-pyhd3eb1b0_0 140.2 pyopenssl pkgs/main/linux-64::pyopenssl-23.2.0-py39h06a4308_0 140.2 pysocks pkgs/main/linux-64::pysocks-1.7.1-py39h06a4308_0 140.2 pytorch pytorch/linux-64::pytorch-1.13.0-py3.9_cuda11.6_cudnn8.3.2_0 140.2 pytorch-cuda pytorch/noarch::pytorch-cuda-11.6-h867d48c_1 140.2 pytorch-mutex pytorch/noarch::pytorch-mutex-1.0-cuda 140.2 requests pkgs/main/linux-64::requests-2.31.0-py39h06a4308_0 140.2 tbb pkgs/main/linux-64::tbb-2021.8.0-hdb19cb5_0 140.2 torchaudio pytorch/linux-64::torchaudio-0.13.0-py39_cu116 140.2 torchvision pytorch/linux-64::torchvision-0.14.0-py39_cu116 140.2 typing_extensions pkgs/main/linux-64::typing_extensions-4.9.0-py39h06a4308_0 140.2 urllib3 pkgs/main/linux-64::urllib3-1.26.18-py39h06a4308_0 140.2 zstd pkgs/main/linux-64::zstd-1.5.5-hc292b87_0 140.2 1754.4 1754.4 CondaError: Downloaded bytes did not match Content-Length 1754.4 url: https://conda.anaconda.org/pytorch/linux-64/pytorch-1.13.0-py3.9_cuda11.6_cudnn8.3.2_0.tar.bz2 1754.4 target_path: 
/root/miniconda3/pkgs/pytorch-1.13.0-py3.9_cuda11.6_cudnn8.3.2_0.tar.bz2 1754.4 Content-Length: 1367850928 1754.4 downloaded bytes: 1114384805 1754.4 1754.4 CondaError: Downloaded bytes did not match Content-Length 1754.4 url: https://conda.anaconda.org/pytorch/linux-64/pytorch-1.13.0-py3.9_cuda11.6_cudnn8.3.2_0.tar.bz2 1754.4 target_path: /root/miniconda3/pkgs/pytorch-1.13.0-py3.9_cuda11.6_cudnn8.3.2_0.tar.bz2 1754.4 Content-Length: 1367850928 1754.4 downloaded bytes: 1114384805 1754.4 1754.4 1754.4 1754.4 1754.4 Downloading and Extracting Packages: ...working... done

Dockerfile:37

  36 |         && rm -f Miniconda3-latest-Linux-${CONDA_ARCH}.sh
  37 | >>> RUN conda init zsh \
  38 | >>>     && . ~/.zshrc \
  39 | >>>     && conda create --name cdvae python=3.9 \
  40 | >>>     && conda activate cdvae \
  41 | >>>     && conda install -y pytorch==1.13.0 torchvision==0.14.0 torchaudio==0.13.0 pytorch-cuda=11.6 -c pytorch -c nvidia \
  42 | >>>     && conda install -c conda-forge pytorch-lightning \
  43 | >>>     && conda install -c conda-forge ase autopep8 seaborn tqdm nglview \
  44 | >>>     && pip install torch_geometric pyg_lib torch_scatter torch_sparse torch_cluster torch_spline_conv -f https://data.pyg.org/whl/torch-1.13.0+cu116.html \
  45 | >>>     && pip install higher hydra-core==1.1.0 hydra-joblib-launcher==1.1.5 p-tqdm==1.3.3 pytest python-dotenv smact==2.2.1 streamlit==0.79.0 torchdiffeq wandb \
  46 | >>>     && pip install matminer==0.7.3 \
  47 | >>>     && pip install "protobuf==3.20.*"
  48 |

ERROR: failed to solve: process "/bin/sh -c conda init zsh && . ~/.zshrc && conda create --name cdvae python=3.9 && conda activate cdvae && conda install -y pytorch==1.13.0 torchvision==0.14.0 torchaudio==0.13.0 pytorch-cuda=11.6 -c pytorch -c nvidia && conda install -c conda-forge pytorch-lightning && conda install -c conda-forge ase autopep8 seaborn tqdm nglview && pip install torch_geometric pyg_lib torch_scatter torch_sparse torch_cluster torch_spline_conv -f https://data.pyg.org/whl/torch-1.13.0+cu116.html && pip install higher hydra-core==1.1.0 hydra-joblib-launcher==1.1.5 p-tqdm==1.3.3 pytest python-dotenv smact==2.2.1 streamlit==0.79.0 torchdiffeq wandb && pip install matminer==0.7.3 && pip install \"protobuf==3.20.*\"" did not complete successfully: exit code: 1 make: *** [Makefile:2: build] Error 1

njkrichardson commented 9 months ago

The build failed because conda could not finish downloading PyTorch:

1754.4 CondaError: Downloaded bytes did not match Content-Length
1754.4 url: https://conda.anaconda.org/pytorch/linux-64/pytorch-1.13.0-py3.9_cuda11.6_cudnn8.3.2_0.tar.bz2

This is a conda error that I've sometimes seen when the network connection is interrupted mid-download; in your log only 1114384805 of the expected 1367850928 bytes arrived. Can you try to build again?
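
Since a truncated download like this is usually transient, one option is to retry the build automatically a few times. The following is a minimal sketch, assuming the build is started with `make` as in this thread. The function name `run_with_retries`, the attempt count, and the delay are illustrative choices and not part of the repository.

```python
import subprocess
import time
from typing import Callable, Sequence


def _run(command: Sequence[str]) -> int:
    """Run a command and return its exit code."""
    return subprocess.run(list(command)).returncode


def run_with_retries(
    command: Sequence[str],
    attempts: int = 3,
    delay_seconds: float = 30.0,
    runner: Callable[[Sequence[str]], int] = _run,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Re-run `command` until it exits with 0.

    Returns the number of the attempt that succeeded. The defaults for
    `attempts` and `delay_seconds` are hypothetical, not repository values.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        if runner(command) == 0:
            return attempt
        if attempt < attempts:
            # Wait a little longer after each failure before trying again.
            sleep(delay_seconds * attempt)
    raise RuntimeError(f"{' '.join(command)} failed {attempts} times in a row")
```

Calling `run_with_retries(["make"])` from the repository root would rerun the build up to three times. Docker's layer cache keeps the steps that already succeeded (the `CACHED` lines in the log above), although the failing `RUN` step itself starts over on each attempt.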

Kiencan commented 9 months ago

Thank you, I think this build was successful. However, I hit another problem when I run the next command, "make run". This is the error:

docker run -dt --gpus all -v "/home/kien/cdvae":"/cdvae" --name cdvae cdvae:latest /bin/zsh
a5951997815fab30ce670133372d35a0716cd275f57a009f71342aecfec61041
docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy' nvidia-container-cli: initialization error: WSL environment detected but no adapters were found: unknown.
make: *** [Makefile:8: run] Error 125

njkrichardson commented 9 months ago

This is an issue with running the NVIDIA container runtime under WSL. I've not tested on Windows. Do you have access to a Linux box? If not, what GPU are you using?
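
Before digging further, it may help to confirm from inside the WSL shell whether a GPU is exposed to Linux at all. This is a rough sketch under two assumptions that are not stated in this thread: that a WSL kernel identifies itself with "microsoft" in `/proc/version`, and that WSL2 exposes the GPU through the device node `/dev/dxg`. The function names are hypothetical.

```python
import os
from pathlib import Path


def is_wsl(kernel_version: str) -> bool:
    """Return True if a /proc/version string looks like a WSL kernel."""
    return "microsoft" in kernel_version.lower()


def wsl_gpu_report(
    version_file: str = "/proc/version",
    gpu_device: str = "/dev/dxg",  # assumption: WSL2 GPU device node
) -> str:
    """Summarize whether this looks like WSL and whether a GPU device exists."""
    path = Path(version_file)
    kernel = path.read_text() if path.exists() else ""
    if not is_wsl(kernel):
        return "not WSL"
    if os.path.exists(gpu_device):
        return "WSL with GPU device"
    return "WSL without GPU device"
```

If `wsl_gpu_report()` returns "WSL without GPU device", that would be consistent with the "no adapters were found" message, and the problem would lie with the Windows driver or WSL setup rather than with this repository's Dockerfile.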

You can also look at some other Windows users having similar issues in threads like these:

Kiencan commented 9 months ago

I'm using an NVIDIA GeForce GTX 960M and Windows 10. Can you tell me which OS and GPU you use? Do you think a virtual machine running, for example, Ubuntu 20.04 would work?

njkrichardson commented 9 months ago

The 960M is a fairly old GPU, so you may want to check that it's compatible with CUDA 12.2 (for example, check this table). Otherwise you'll want to swap out the base image layer with a CUDA version that you have verified can run on your system.

I've built this image on macOS and multiple Linux systems, but not on Windows. I've also built for NVIDIA RTX 2080Ti and RTX 3080Ti. I would think a virtual machine would work, but I'm not very familiar with WSL, so I can't help you with that unfortunately.
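
One way to check compatibility from Python is to compare the GPU's compute capability with the architectures the installed PyTorch binary was built for. This is a sketch, not a definitive test: the helper names are hypothetical, and requiring an exact `sm_XY` match is deliberately conservative. To my knowledge NVIDIA lists the GTX 960M as compute capability 5.0, which would correspond to `sm_50`.

```python
from typing import Iterable, Tuple


def arch_tag(capability: Tuple[int, int]) -> str:
    """Convert a (major, minor) compute capability into an 'sm_XY' tag."""
    major, minor = capability
    return f"sm_{major}{minor}"


def is_capability_built_in(capability: Tuple[int, int], arch_list: Iterable[str]) -> bool:
    """Return True if the exact architecture tag appears in the build's arch list."""
    return arch_tag(capability) in set(arch_list)


def check_local_gpu() -> str:
    """Report whether the first visible GPU matches the installed PyTorch build."""
    import torch  # imported lazily so the helpers above work without PyTorch

    if not torch.cuda.is_available():
        return "PyTorch cannot see a CUDA device"
    capability = torch.cuda.get_device_capability(0)
    built_for = torch.cuda.get_arch_list()
    if is_capability_built_in(capability, built_for):
        return f"{arch_tag(capability)} is included in this PyTorch build"
    return f"{arch_tag(capability)} is not listed in {built_for}"
```

Running `check_local_gpu()` inside the `cdvae` conda environment would show whether the card and the PyTorch build line up. If they do not, swapping the base image and the PyTorch/CUDA pins is the next thing to look at.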

Kiencan commented 9 months ago

I tried in another window and hit an error when running this command: "python3 cdvae/run.py data=perov expname=perov". I also tried "HYDRA_FULL_ERROR=1 python3 cdvae/run.py data=perov expname=perov" to get the full traceback, and it looks like there may be a problem with CUDA. The error:

Sanity Checking: | | 0/? [00:00<?, ?it/s]
/root/miniconda3/envs/cdvae/lib/python3.9/site-packages/torch_geometric/deprecation.py:22: UserWarning: 'data.DataLoader' is deprecated, use 'loader.DataLoader' instead
  warnings.warn(out)
/root/miniconda3/envs/cdvae/lib/python3.9/site-packages/pytorch_lightning/trainer/connectors/data_connector.py:441: The 'val_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the num_workers argument` to num_workers=11 in the DataLoader to improve performance.
Sanity Checking DataLoader 0: 0%| | 0/2 [00:00<?, ?it/s]
Error executing job with overrides: ['data=perov', 'expname=perov']
Traceback (most recent call last):
  File "/cdvae/cdvae/run.py", line 186, in <module>
    main()
  File "/root/miniconda3/envs/cdvae/lib/python3.9/site-packages/hydra/main.py", line 49, in decorated_main
    _run_hydra(
  File "/root/miniconda3/envs/cdvae/lib/python3.9/site-packages/hydra/_internal/utils.py", line 367, in _run_hydra
    run_and_report(
  File "/root/miniconda3/envs/cdvae/lib/python3.9/site-packages/hydra/_internal/utils.py", line 214, in run_and_report
    raise ex
  File "/root/miniconda3/envs/cdvae/lib/python3.9/site-packages/hydra/_internal/utils.py", line 211, in run_and_report
    return func()
  File "/root/miniconda3/envs/cdvae/lib/python3.9/site-packages/hydra/_internal/utils.py", line 368, in <lambda>
    lambda: hydra.run(
  File "/root/miniconda3/envs/cdvae/lib/python3.9/site-packages/hydra/_internal/hydra.py", line 110, in run
    _ = ret.return_value
  File "/root/miniconda3/envs/cdvae/lib/python3.9/site-packages/hydra/core/utils.py", line 233, in return_value
    raise self._return_value
  File "/root/miniconda3/envs/cdvae/lib/python3.9/site-packages/hydra/core/utils.py", line 160, in run_job
    ret.return_value = task_function(task_cfg)
  File "/cdvae/cdvae/run.py", line 182, in main
    run(cfg)
  File "/cdvae/cdvae/run.py", line 170, in run
    trainer.fit(model=model, datamodule=datamodule, ckpt_path=ckpt)
  File "/root/miniconda3/envs/cdvae/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 544, in fit
    call._call_and_handle_interrupt(
  File "/root/miniconda3/envs/cdvae/lib/python3.9/site-packages/pytorch_lightning/trainer/call.py", line 44, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/root/miniconda3/envs/cdvae/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 580, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "/root/miniconda3/envs/cdvae/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 989, in _run
    results = self._run_stage()
  File "/root/miniconda3/envs/cdvae/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1033, in _run_stage
    self._run_sanity_check()
  File "/root/miniconda3/envs/cdvae/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1062, in _run_sanity_check
    val_loop.run()
  File "/root/miniconda3/envs/cdvae/lib/python3.9/site-packages/pytorch_lightning/loops/utilities.py", line 182, in _decorator
    return loop_run(self, *args, **kwargs)
  File "/root/miniconda3/envs/cdvae/lib/python3.9/site-packages/pytorch_lightning/loops/evaluation_loop.py", line 134, in run
    self._evaluation_step(batch, batch_idx, dataloader_idx, dataloader_iter)
  File "/root/miniconda3/envs/cdvae/lib/python3.9/site-packages/pytorch_lightning/loops/evaluation_loop.py", line 391, in _evaluation_step
    output = call._call_strategy_hook(trainer, hook_name, *step_args)
  File "/root/miniconda3/envs/cdvae/lib/python3.9/site-packages/pytorch_lightning/trainer/call.py", line 309, in _call_strategy_hook
    output = fn(*args, **kwargs)
  File "/root/miniconda3/envs/cdvae/lib/python3.9/site-packages/pytorch_lightning/strategies/strategy.py", line 403, in validation_step
    return self.lightning_module.validation_step(*args, **kwargs)
  File "/cdvae/cdvae/pl_modules/model.py", line 625, in validation_step
    outputs = self(batch, teacher_forcing=False, training=False)
  File "/root/miniconda3/envs/cdvae/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1208, in _call_impl
    result = forward_call(*input, **kwargs)
  File "/cdvae/cdvae/pl_modules/model.py", line 369, in forward
    mu, log_var, z = self.encode(batch)
  File "/cdvae/cdvae/pl_modules/model.py", line 236, in encode
    hidden = self.encoder(batch)
  File "/root/miniconda3/envs/cdvae/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1208, in _call_impl
    result = forward_call(*input, **kwargs)
  File "/cdvae/cdvae/pl_modules/gnn.py", line 373, in forward
    _, _, idx_i, idx_j, idx_k, idx_kj, idx_ji = self.triplets(
  File "/cdvae/cdvae/pl_modules/gnn.py", line 277, in triplets
    adj_t_row = adj_t[row]
  File "/root/miniconda3/envs/cdvae/lib/python3.9/site-packages/torch_sparse/tensor.py", line 662, in __getitem__
    out = out.index_select(dim, item)
  File "/root/miniconda3/envs/cdvae/lib/python3.9/site-packages/torch_sparse/index_select.py", line 98, in <lambda>
    SparseTensor.index_select = lambda self, dim, idx: index_select(self, dim, idx)
  File "/root/miniconda3/envs/cdvae/lib/python3.9/site-packages/torch_sparse/index_select.py", line 15, in index_select
    old_rowptr, col, value = src.csr()
  File "/root/miniconda3/envs/cdvae/lib/python3.9/site-packages/torch_sparse/tensor.py", line 237, in csr
    return self.storage.rowptr(), self.storage.col(), self.storage.value()
  File "/root/miniconda3/envs/cdvae/lib/python3.9/site-packages/torch_sparse/storage.py", line 209, in rowptr
    rowptr = torch.ops.torch_sparse.ind2ptr(row, self._sparse_sizes[0])
  File "/root/miniconda3/envs/cdvae/lib/python3.9/site-packages/torch/_ops.py", line 442, in __call__
    return self._op(*args, **kwargs or {})
RuntimeError: Not compiled with CUDA support

njkrichardson commented 9 months ago

Did you start the container with --nv so that CUDA libraries are visible within the container runtime?
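
For context, `--nv` is, to my knowledge, the Singularity/Apptainer flag; with Docker the equivalent is `--gpus all`, which the `docker run` line earlier in this thread already passes. The error is also raised from inside `torch_sparse`, so a CPU-only `torch_sparse` build is another possibility worth ruling out. Below is a small diagnostic sketch to run inside the `cdvae` environment. The helper names are hypothetical, and it assumes `torch_sparse` registers a `cuda_version` op that returns -1 for CPU-only builds.

```python
from typing import Optional, Tuple


def diagnose(torch_cuda: Optional[str], gpu_visible: bool, sparse_cuda: int) -> str:
    """Turn three facts about the environment into a one-line diagnosis."""
    if torch_cuda is None:
        return "PyTorch itself is a CPU-only build"
    if sparse_cuda == -1:
        return "torch_sparse is a CPU-only build"
    if not gpu_visible:
        return "CUDA builds are installed, but no GPU is visible in the container"
    return "PyTorch and torch_sparse both report CUDA support"


def collect() -> Tuple[Optional[str], bool, int]:
    """Gather the inputs for diagnose(); requires torch and torch_sparse."""
    import torch
    import torch_sparse  # noqa: F401  (importing registers torch.ops.torch_sparse)

    # Assumption: the cuda_version op exists and returns -1 without CUDA support.
    sparse_cuda = torch.ops.torch_sparse.cuda_version()
    return torch.version.cuda, torch.cuda.is_available(), sparse_cuda
```

Running `print(diagnose(*collect()))` would narrow the problem down to the container's GPU access, the PyTorch build, or the `torch_sparse` wheel.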

Kiencan commented 9 months ago

Sorry, I do not understand. I followed the commands in your instructions: make -> make run -> docker exec -it cdvae /bin/zsh -> conda init zsh -> zsh -> direnv allow -> cd /cdvae -> conda activate cdvae -> python3 cdvae/run.py data=perov expname=perov. Which step did I get wrong?