jade-hpc-gpu / jade-hpc-gpu.github.io

Joint Academic Data Science Endeavour (JADE) is the largest GPU facility in the UK supporting world-leading research in machine learning (and this is the repo that powers its website)
http://www.jade.ac.uk/

Issue in pip-installing neural renderer and MPI-IS Mesh Processing Library #139

Open simofoti opened 4 years ago

simofoti commented 4 years ago

Virtual Environment Setup

Hi, I am trying to set up a virtual environment while logged in to the login node. To do that, I wrote a very simple bash script. The part that reproduces the errors is the following:

#!/bin/bash
module load python3/3.6.3
module load cuda/10.1
git config --global https.proxy $https_proxy
git config --global http.proxy $http_proxy
virtualenv -p python3 ./ui2m_env
source ./ui2m_env/bin/activate
export CUDA=cu101
pip install torch==1.5.0+${CUDA} torchvision==0.6.0+${CUDA} -f https://download.pytorch.org/whl/torch_stable.html
pip install tb-nightly tqdm ray matplotlib imageio psutil moviepy ninja
# install neural renderer 
pip install git+https://github.com/daniilidis-group/neural_renderer.git
# install MPI-IS Mesh Processing Library
cd ./ui2m_env
git clone https://github.com/MPI-IS/mesh.git
cd mesh
make all
cd ../..

Two parts fail: the installation of the neural renderer and that of the MPI-IS Mesh Processing Library. The former fails with "Found no NVIDIA driver on your system", the latter with a permission denied error thrown by shutil.copytree.

While trying to understand how to fix the neural renderer problem, I noticed a few things that might be part of it:

- nvidia-smi does not seem to be installed.
- PyTorch does not detect any GPU when checking whether CUDA is available (even though the GPU version seems to have installed successfully).
- nvcc --version reports that nvcc is not installed.

I assume this is normal on a login node, but I think it might also be the cause of the problem. Is there any workaround? Do you think that including this code in the run script might work?
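For anyone reproducing this, the three checks above can be collected into one small script. This is only a minimal sketch: the helper name check_tool is made up for illustration, and the PyTorch line assumes torch is installed in the currently active environment.

```shell
#!/bin/bash
# Report whether a command is available on PATH.
# check_tool is a hypothetical helper name, not part of any cluster tooling.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found at $(command -v "$1")"
    else
        echo "$1: not found"
    fi
}

check_tool nvidia-smi
check_tool nvcc

# Ask PyTorch whether it can see a CUDA device.
# Falls back to a message if torch cannot be imported in this environment.
if command -v python3 >/dev/null 2>&1; then
    python3 -c 'import torch; print("torch", torch.__version__, "cuda available:", torch.cuda.is_available())' 2>/dev/null \
        || echo "torch: not importable in this environment"
fi
```

Running it once on the login node and once on a GPU node makes it easy to see which of the tools are missing only because of where the script runs.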

In an attempt to fix the MPI-IS Mesh Processing problem, I set the default path for temporary directories to my home directory, but for some reason I still have the same issue. I also tried changing the Makefile so that the package is installed with --user (line 7 of the Makefile), but this doesn't work either, because user site-packages are not visible in the virtualenv.
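Redirecting temporary files as described above could look roughly like the sketch below. It assumes the TMPDIR environment variable is what gets changed (Python's tempfile module and pip both honour it); the helper name use_tmpdir and the $HOME/tmp path are hypothetical and not taken from the original script.

```shell
#!/bin/bash
# Create a scratch directory and make it the default location for temp files.
# use_tmpdir is a hypothetical helper name used only for this sketch.
use_tmpdir() {
    mkdir -p "$1" && export TMPDIR="$1"
}
```

It would be called as use_tmpdir "$HOME/tmp" before make all, and python3 -c 'import tempfile; print(tempfile.gettempdir())' then shows which directory Python will actually use.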

Any ideas on how to successfully install these libraries?

PS. I also tried to load the anaconda module and create a conda environment, but I have the exact same issues.

Thanks in advance :slightly_smiling_face:

LiamATOS commented 4 years ago

Hi,

Can you try running an interactive session on the devel node? The login nodes do not have GPUs or the NVIDIA stack installed, as they are for job submission only.

Thanks

Liam

simofoti commented 4 years ago

Hi @LiamATOS, thanks for getting back to me.

Just to be sure, is salloc --gres=gpu:1 --partition=devel the right way of accessing a devel node?
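One thing that may be worth checking: on some Slurm configurations, salloc only reserves the resources and leaves the shell on the login node, so commands still run there unless they are launched with srun. The sketch below is an assumption about how an interactive shell could be requested, not a confirmed JADE recipe. It reuses the --gres and --partition values from the command above, and devel_shell_cmd is a hypothetical helper that only prints the command line.

```shell
#!/bin/bash
# Print an srun command line that requests an interactive shell on the devel partition.
# devel_shell_cmd is a hypothetical helper; the first argument is the number of GPUs.
devel_shell_cmd() {
    local gpus="${1:-1}"
    echo "srun --gres=gpu:${gpus} --partition=devel --pty bash"
}

# Show the command for a single GPU.
devel_shell_cmd 1
```

After running the printed command on the cluster, hostname shows which node the shell is actually on.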

I tried to install the libraries from a devel node and the MPI-IS Mesh Processing problem is now solved. However, I still have errors when installing the neural renderer.

No CUDA runtime is found, using CUDA_HOME='/jmain01/apps/cuda/10.1'
.......
.......
The NVIDIA driver on your system is too old (found version 9010).
Please update your GPU driver by downloading and installing a new
version from the URL: http://www.nvidia.com/Download/index.aspx
Alternatively, go to: https://pytorch.org to install
a PyTorch version that has been compiled with your version
of the CUDA driver.

Interestingly, the installation of PyTorch itself seems to work. Is the neural renderer maybe finding the wrong drivers? I tried to check the driver version, but nvidia-smi still doesn't work. Is that normal? Any idea how to solve the problem?
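As a possible way to read the "found version 9010" message without nvidia-smi, here is a minimal sketch. It assumes the number follows the usual CUDA encoding of major*1000 + minor*10, and that the NVIDIA kernel driver exposes /proc/driver/nvidia/version when it is loaded. decode_cuda_version is a hypothetical helper name.

```shell
#!/bin/bash
# Convert a CUDA version integer (major*1000 + minor*10) into major.minor form.
# decode_cuda_version is a hypothetical helper used only for this sketch.
decode_cuda_version() {
    local v="$1"
    echo "$(( v / 1000 )).$(( (v % 1000) / 10 ))"
}

# Version reported in the error message above.
decode_cuda_version 9010
# Version that the cu101 wheels target.
decode_cuda_version 10010

# Show the kernel driver version if the driver is loaded on this node.
if [ -r /proc/driver/nvidia/version ]; then
    cat /proc/driver/nvidia/version
else
    echo "no NVIDIA kernel driver visible on this node"
fi
```

Under that assumption, 9010 would correspond to a driver that supports CUDA 9.1, which is older than what the cu101 build expects.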

The following script should be enough to reproduce the error:

#!/bin/bash
module load python3/3.6.3
module load cuda/10.1
git config --global https.proxy $https_proxy
git config --global http.proxy $http_proxy
virtualenv -p python3 ./ui2m_env
source ./ui2m_env/bin/activate
export CUDA=cu101
pip install torch==1.5.0+${CUDA} torchvision==0.6.0+${CUDA} -f https://download.pytorch.org/whl/torch_stable.html
pip install ninja
# install neural renderer 
pip install git+https://github.com/daniilidis-group/neural_renderer.git

Thanks in advance.