microsoft / MeshTransformer

Research code for CVPR 2021 paper "End-to-End Human Pose and Mesh Reconstruction with Transformers"
https://arxiv.org/abs/2012.09760
MIT License

There is an nvcc error when I install apex #50

Open JuntingLee opened 2 years ago

JuntingLee commented 2 years ago

I ran the command python setup.py install --cuda_ext --cpp_ext and got:


torch.__version__  = 1.4.0

setup.py:107: UserWarning: Option --pyprof not specified. Not installing PyProf dependencies!
  warnings.warn("Option --pyprof not specified. Not installing PyProf dependencies!")

Compiling cuda extensions with
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243
from /usr/local/cuda-10.1/bin

running install
running bdist_egg
running egg_info
writing apex.egg-info/PKG-INFO
writing dependency_links to apex.egg-info/dependency_links.txt
writing top-level names to apex.egg-info/top_level.txt
reading manifest file 'apex.egg-info/SOURCES.txt'
adding license file 'LICENSE'
writing manifest file 'apex.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_py
running build_ext
building 'fused_layer_norm_cuda' extension
gcc -pthread -B /home/bob/anaconda3/envs/metro/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/bob/anaconda3/envs/metro/lib/python3.7/site-packages/torch/include -I/home/bob/anaconda3/envs/metro/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/home/bob/anaconda3/envs/metro/lib/python3.7/site-packages/torch/include/TH -I/home/bob/anaconda3/envs/metro/lib/python3.7/site-packages/torch/include/THC -I/usr/local/cuda-10.1/include -I/home/bob/anaconda3/envs/metro/include/python3.7m -c csrc/layer_norm_cuda.cpp -o build/temp.linux-x86_64-3.7/csrc/layer_norm_cuda.o -O3 -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=fused_layer_norm_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++11
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
/usr/local/cuda-10.1/bin/nvcc -I/home/bob/anaconda3/envs/metro/lib/python3.7/site-packages/torch/include -I/home/bob/anaconda3/envs/metro/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/home/bob/anaconda3/envs/metro/lib/python3.7/site-packages/torch/include/TH -I/home/bob/anaconda3/envs/metro/lib/python3.7/site-packages/torch/include/THC -I/usr/local/cuda-10.1/include -I/home/bob/anaconda3/envs/metro/include/python3.7m -c csrc/layer_norm_cuda_kernel.cu -o build/temp.linux-x86_64-3.7/csrc/layer_norm_cuda_kernel.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -maxrregcount=50 -O3 --use_fast_math -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=fused_layer_norm_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_61,code=sm_61 -std=c++11
csrc/layer_norm_cuda_kernel.cu:4:10: fatal error: ATen/cuda/DeviceUtils.cuh: No such file or directory
 #include "ATen/cuda/DeviceUtils.cuh"
          ^~~~~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.
error: command '/usr/local/cuda-10.1/bin/nvcc' failed with exit status 1

I have installed CUDA 10.1. Running which nvcc returns /usr/local/cuda-10.1/bin/nvcc

kevinlin311tw commented 2 years ago

I am not sure whether NVIDIA Apex has been updated recently.

One suggestion is to skip the Apex installation. For reasons we have not pinned down, we observed that mixed-precision training is slow. We think there are probably issues when running PyTorch 1.4 with Apex.

amogh112 commented 2 years ago

Hey, you need to check out an earlier version of Apex before installing, because of changes in PyTorch, as described here: https://issueexplorer.com/issue/NVIDIA/apex/1200

git checkout f3a960f80244cf9e80558ab30f7f7e8cbf03c0a0
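
The build log above fails on #include "ATen/cuda/DeviceUtils.cuh", which suggests that the header is missing from the installed PyTorch rather than that nvcc itself is at fault. A minimal sketch for checking this from the same conda environment follows. The helper names find_header and torch_include_dirs are hypothetical and introduced only for illustration. The sketch assumes that torch.utils.cpp_extension.include_paths() reports the same include directories that the Apex build passes to nvcc.

```python
import os
from typing import Iterable, List, Optional


def find_header(include_dirs: Iterable[str], relative_path: str) -> Optional[str]:
    """Return the first existing file named relative_path under include_dirs, else None."""
    for base in include_dirs:
        candidate = os.path.join(base, relative_path)
        if os.path.isfile(candidate):
            return candidate
    return None


def torch_include_dirs() -> List[str]:
    """Include directories of the installed PyTorch, or [] if torch cannot be imported."""
    try:
        from torch.utils.cpp_extension import include_paths
    except ImportError:
        return []
    return include_paths()


if __name__ == "__main__":
    # Header named in the nvcc error message from the build log.
    header = "ATen/cuda/DeviceUtils.cuh"
    found = find_header(torch_include_dirs(), header)
    print(found if found else f"{header} not found in the PyTorch include directories")
```

If the header is reported as not found, the Apex source being built expects a newer PyTorch than the one installed, which is consistent with the advice to check out an older Apex commit.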