pytorch / pytorch

Tensors and Dynamic neural networks in Python with strong GPU acceleration
https://pytorch.org

[caffe2]build problem, can not find caffe2_pybind11_state_hip #9028

Open qimw opened 6 years ago

qimw commented 6 years ago

Issue description

I built Caffe2 with Anaconda following the page. The server has a single Titan X with cuDNN 7 and CUDA 9 but no NCCL, so I downloaded NCCL 2 from NVIDIA, extracted it to path/to/local/nccl2, and edited line 42 of ./pytorch/conda/integrated/build.sh to read: "export NCCL_ROOT_DIR=path/to/local/nccl2". I need to use Caffe2 with Python 2, so I also added "conda_args+=(" --python 2.7") " to ./pytorch/scripts/build_anaconda.sh to use Python 2.7. The build succeeded, but when I run python2 test.py with the line

from caffe2.python import core

it tells me:

WARNING:root:This caffe2 python run does not have GPU support. Will run in CPU only mode.
WARNING:root:Debug message: No module named caffe2_pybind11_state_hip
Segmentation fault (core dumped)

My questions are:
a. Why does the conda build not support the GPU?
b. If I am using a single GPU, is NCCL necessary for building?
c. How do I fix "No module named caffe2_pybind11_state_hip"?

Thank you very much!

pjh5 commented 6 years ago

This warning

WARNING:root:This caffe2 python run does not have GPU support. Will run in CPU only mode.
WARNING:root:Debug message: No module named caffe2_pybind11_state_hip

does not actually indicate a problem. It also appears on a fully functional CPU-only build.
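One plausible way a message like this arises is a fallback-import chain: try each optional backend module in order, and if none can be imported, report the last ImportError and continue on the CPU. The sketch below is a generic illustration of that pattern and is not taken from the Caffe2 source. The module names my_ext_gpu and my_ext_hip and the helper load_first_available are hypothetical.

```python
import importlib
import logging


def load_first_available(candidates):
    """Try each module name in order.

    Returns (module, None) for the first one that imports, or
    (None, last_error) if every import fails.
    """
    last_error = None
    for name in candidates:
        try:
            return importlib.import_module(name), None
        except ImportError as exc:
            last_error = exc
    return None, last_error


# Hypothetical optional backends; neither is expected to be installed.
backend, error = load_first_available(["my_ext_gpu", "my_ext_hip"])
if backend is None:
    # The debug message names the last module tried, even on a CPU-only setup
    # where that backend was never expected to exist.
    logging.warning("No GPU backend found. Will run in CPU only mode.")
    logging.warning("Debug message: %s", error)
```

Under this pattern the module named in the debug message is simply the last candidate tried, which is why it says little about what actually went wrong.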

This is probably just a problem with the build system finding your CUDA installation. Where is CUDA installed on your machine?

Do you have a /usr/local/cuda that is a symlink to either /usr/local/cuda-8.0 or /usr/local/cuda-9.0 or something similar?
What is the output of env | grep CUDA?
What is the output of nvcc --version and nvidia-smi?
Did you pass the flags "--install-locally --cuda 9.0 --cudnn 7" (or your version equivalents) to the scripts/build_anaconda.sh command?
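The checks requested above can be gathered in one go. The following is a minimal Python 3 sketch, not part of the Caffe2 tooling. The helper names cuda_env, describe_cuda_symlink and run_tool are made up for illustration, and /usr/local/cuda is the default location mentioned in the comment.

```python
import os
import shutil
import subprocess


def cuda_env(environ=None):
    """Return variables whose NAME=VALUE line contains "CUDA" (like `env | grep CUDA`)."""
    environ = os.environ if environ is None else environ
    return {k: v for k, v in environ.items() if "CUDA" in f"{k}={v}"}


def describe_cuda_symlink(path="/usr/local/cuda"):
    """Say whether `path` exists and, if it is a symlink, where it resolves to."""
    if not os.path.lexists(path):
        return f"{path}: missing"
    if os.path.islink(path):
        return f"{path} -> {os.path.realpath(path)}"
    return f"{path}: exists, not a symlink"


def run_tool(argv):
    """Run a command and return its combined output, or a note if it is not on PATH."""
    if shutil.which(argv[0]) is None:
        return f"{argv[0]}: not found on PATH"
    result = subprocess.run(argv, capture_output=True, text=True)
    return (result.stdout + result.stderr).strip()


if __name__ == "__main__":
    print(describe_cuda_symlink())
    print(cuda_env())
    print(run_tool(["nvcc", "--version"]))
    print(run_tool(["nvidia-smi"]))
```

Pasting the printed output into the issue answers the first three questions; the build flags still have to be reported by hand.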

lihao056 commented 6 years ago

I also have this problem. My CUDA setup is shown in this screenshot: [image]. How can I solve this problem?

How you installed PyTorch (conda, pip, source): source
PyTorch or Caffe2: caffe2
OS: ubuntu16.04
CUDA/cuDNN version: 9.0/7.0.3

Thank you.

pjh5 commented 6 years ago

@celticssssss can you give the outputs that I asked the original author for? Could you also post your cmake output and all the commands you used to install Caffe2?

@qimw I just noticed that you're using conda. Note that the instructions say that to use scripts/build_anaconda.sh for GPU builds you have to pass "--cuda your-version --cudnn your-version" to the script.

qimw commented 6 years ago

@pjh5 yes, I have passed this option

pjh5 commented 6 years ago

@qimw
Do you have a /usr/local/cuda that is a symlink to either /usr/local/cuda-8.0 or /usr/local/cuda-9.0 or something similar?
What is the output of env | grep CUDA?
What is the output of nvcc --version and nvidia-smi?
Did you pass the flags "--install-locally --cuda 9.0 --cudnn 7" (or your version equivalents) to the scripts/build_anaconda.sh command?
Can you post your cmake output?

lihao056 commented 6 years ago

@pjh5 I reinstalled Caffe2 and hit the same problem: [image]. My installation followed https://caffe2.ai/docs/getting-started.html?platform=ubuntu&configuration=compile, but I did not install with GPU support: [image]. I don't know whether that is what causes my situation.

How you installed PyTorch (conda, pip, source): source
PyTorch or Caffe2: caffe2
OS: ubuntu16.04
CUDA/cuDNN version: 9.0/7.0.3

My output is: [image]

How can I solve the problem? Thank you.

pjh5 commented 6 years ago

Can you post your cmake output (see https://caffe2.ai/docs/faq.html#what-is-the-cmake-output)? Can you try the points in https://caffe2.ai/docs/faq.html#why-do-i-get-import-errors-in-python-when-i-try-to-use-caffe2 to fix your import error? Make sure you are not running python in your PyTorch root directory.

lihao056 commented 6 years ago

Thanks for your help. My mistake was the point about not running Python in the root directory, LOL. Sorry, @pjh5, my English is bad and may have caused a misunderstanding. What I mean is that I could not imagine such a small thing causing the problem. Thank you very much for your help.

pjh5 commented 6 years ago

@celticssssss I have never once seen this error caused by a bug on our end, so I have a very strong prior that something is wrong with your Python setup. Python setups can actually be quite tricky, and imo are very unintuitive and easy to mess up. Running python from the pytorch root directory is AFAICT the most common cause of errors like this.
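A quick way to see whether the working directory is interfering is to ask Python where it would load a package from before importing it. When python is started inside a source checkout, a directory with the package's name in the current directory can be found ahead of the installed copy. The Python 3 sketch below uses only the standard library; the helpers module_origin, is_under and shadowed_by_cwd are illustrative names, not part of Caffe2 or PyTorch.

```python
import importlib.util
import os


def module_origin(name):
    """Return the file or directory a top-level module would load from.

    Returns None if the module is not found or is not backed by a file
    (for example a built-in module).
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    if spec.origin not in (None, "built-in", "frozen"):
        return spec.origin
    # Namespace packages (a directory without __init__.py) have no origin file.
    locations = list(spec.submodule_search_locations or [])
    return locations[0] if locations else None


def is_under(path, directory):
    """True if `path` lies inside `directory` after resolving symlinks."""
    path = os.path.realpath(path)
    directory = os.path.realpath(directory)
    return os.path.commonpath([path, directory]) == directory


def shadowed_by_cwd(name):
    """True if importing `name` would pick up something from the current directory."""
    origin = module_origin(name)
    return origin is not None and is_under(origin, os.getcwd())


if __name__ == "__main__":
    print("caffe2 would load from:", module_origin("caffe2"))
    print("shadowed by current directory:", shadowed_by_cwd("caffe2"))
```

If the second line prints True while you are inside the PyTorch checkout, change to another directory and run the import again.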