Closed Leeying1235 closed 6 years ago
Hi @Leeying1235 -- this looks like the bifrost code is having trouble finding your CUDA installation. Do you have CUDA and an NVIDIA GPU installed? If so, I think it's probably an issue with libraries.
You may need to set some environment variables in your ~/.bashrc
(or equivalent). Here's a snippet from mine:
# BIFROST
export LD_LIBRARY_PATH="/usr/local/cuda/lib64:/usr/local/lib:$LD_LIBRARY_PATH"
export PATH=/usr/local/cuda-8.0/bin:$PATH
export BIFROST_INCLUDE_PATH=/usr/local/include/bifrost
# CUDA
export CUDA_ROOT="/usr/local/cuda-8.0"
export CUDA_INC_DIR="/usr/local/cuda-8.0/include"
There are also some common problems listed here: http://ledatelescope.github.io/bifrost/Common-installation-and-execution-problems.html
Please let me know if you can figure it out, and I'll add some text to the common problems page.
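If the variables are set but bifrost still can't find CUDA, one quick sanity check is whether every directory listed in LD_LIBRARY_PATH actually exists on disk. Here's a minimal sketch (the helper name `check_ld_library_path` and the example paths are made up for illustration):

```python
import os

def check_ld_library_path(env=None):
    """Split LD_LIBRARY_PATH and report which entries exist on disk."""
    env = os.environ if env is None else env
    entries = [p for p in env.get("LD_LIBRARY_PATH", "").split(":") if p]
    existing = [p for p in entries if os.path.isdir(p)]
    missing = [p for p in entries if not os.path.isdir(p)]
    return existing, missing

# Example with a fabricated environment (paths are illustrative):
ok, bad = check_ld_library_path(
    {"LD_LIBRARY_PATH": "/usr/local/cuda/lib64:/no/such/dir"})
print("found:", ok)
print("missing:", bad)
```

Any entry that shows up as missing (a typo, or a CUDA version that was uninstalled) is a likely culprit for library-loading errors at import time.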
Hi @telegraphic, I have already installed CUDA and there are NVIDIA GPUs in my computer. After running `nvcc -V` and `lspci | grep -i vga.*nvidia`, I got this:
Command: nvcc -V
Output:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Tue_Jan_10_13:22:03_CST_2017
Cuda compilation tools, release 8.0, V8.0.61

Command: lspci | grep -i vga.*nvidia
Output:
01:00.0 VGA compatible controller: NVIDIA Corporation GT218 [GeForce 210] (rev a2)
02:00.0 VGA compatible controller: NVIDIA Corporation GK110 [GeForce GTX 780] (rev a1)
03:00.0 VGA compatible controller: NVIDIA Corporation GK110 [GeForce GTX 780] (rev a1)
83:00.0 VGA compatible controller: NVIDIA Corporation GK110 [GeForce GTX 780] (rev a1)
84:00.0 VGA compatible controller: NVIDIA Corporation GK110 [GeForce GTX 780] (rev a1)
The environment variables PATH and LD_LIBRARY_PATH were already set on my computer, the same as yours, but it doesn't work. Following your suggestion I also tried adding these variables to ~/.bashrc, but that doesn't work either. Below are the values of PATH and LD_LIBRARY_PATH on my computer:

PATH=/usr/local/lib:/usr/local/cuda-8.0/bin:/usr/local/cuda-8.0/bin:/home/ly/local/jdk1.8.0_45/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/usr/local/cuda/bin:/usr/lib/jvm/java-8-oracle/bin:/usr/lib/jvm/java-8-oracle/db/bin:/usr/lib/jvm/java-8-oracle/jre/bin
LD_LIBRARY_PATH=/usr/local/cuda/lib64:/usr/local/lib:/usr/local/cuda-8.0/lib64:$LD_LIBRARY_PATH:/usr/local/cuda/lib:/usr/local/cuda/lib64

Looking forward to your reply! Thanks
What does `nvidia-smi` yield?
@Leeying1235 OK, my next guess is architecture -- I have not tested the code on anything older than a Maxwell card (980). Could you recompile with line 12 of user.mk uncommented, commenting out line 14 instead?
https://github.com/ledatelescope/bifrost/blob/master/user.mk#L12
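For context on why the architecture line matters: nvcc has to be told the compute capability of the target card, and the cards in this thread span several generations. A small illustrative lookup (the values come from NVIDIA's published specs; the dict and `sm_flag` helper are just for illustration, not part of bifrost):

```python
# Compute capability of the GPUs mentioned in this thread (per NVIDIA specs).
# CUDA 8.0 compiles for sm_20 through sm_62, so the GT218-based GeForce 210
# (compute capability 1.2) cannot be targeted by CUDA 8.0 at all.
COMPUTE_CAPABILITY = {
    "GeForce 210": (1, 2),      # Tesla-class GT218 (the display card here)
    "GeForce GTX 780": (3, 5),  # Kepler GK110
    "GeForce GTX 980": (5, 2),  # Maxwell GM204 (the card telegraphic tested on)
}

def sm_flag(name):
    """Build the nvcc '-arch=sm_XY' architecture suffix for a known card."""
    major, minor = COMPUTE_CAPABILITY[name]
    return "sm_%d%d" % (major, minor)

print(sm_flag("GeForce GTX 780"))  # -> sm_35
```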
@telegraphic problem still exists.
@jaycedowell The output is below:
Wed Oct 10 09:01:00 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.66                 Driver Version: 375.66                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 780     Off  | 0000:02:00.0     N/A |                  N/A |
| 28%   49C    P0    N/A /  N/A |      0MiB /  3020MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 780     Off  | 0000:03:00.0     N/A |                  N/A |
| 26%   43C    P0    N/A /  N/A |      0MiB /  3020MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+
|   2  GeForce GTX 780     Off  | 0000:83:00.0     N/A |                  N/A |
| 26%   43C    P0    N/A /  N/A |      0MiB /  3020MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+
|   3  GeForce GTX 780     Off  | 0000:84:00.0     N/A |                  N/A |
|  0%   43C    P0    N/A /  N/A |      0MiB /  3020MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0                  Not Supported                                         |
|    1                  Not Supported                                         |
|    2                  Not Supported                                         |
|    3                  Not Supported                                         |
+-----------------------------------------------------------------------------+
Can you run other CUDA codes, like something from the CUDA code samples library, without error?
@jaycedowell It installed fine though. I went into the samples and found 1_Utilities/deviceQuery to make a test. It compiled fine. Then, this happened:

$ ./deviceQuery
./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

cudaGetDeviceCount returned 30
-> CUDA driver version is insufficient for CUDA runtime version
Result = FAIL
It looks like there is some mismatch between the kernel driver version and what CUDA expects. If you can get this resolved and deviceQuery to run without failing, I suspect that bifrost will stop throwing BF_STATUS_DEVICE_ERROR on import.
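Error 30 from cudaGetDeviceCount is cudaErrorInsufficientDriver: each CUDA runtime release requires at least a certain kernel driver version, and the check above can be sketched as a simple lookup (the two entries below are taken from NVIDIA's release notes as I recall them, so treat the exact numbers as an assumption to verify against the notes for your toolkit):

```python
# Minimum Linux driver version required by each CUDA runtime release
# (illustrative entries; check NVIDIA's release notes for your version).
MIN_DRIVER = {
    "8.0": 375.26,  # CUDA 8.0 GA2
    "9.0": 384.81,
}

def driver_supports_runtime(driver, runtime):
    """True if the installed kernel driver is new enough for the runtime."""
    try:
        return driver >= MIN_DRIVER[runtime]
    except KeyError:
        raise ValueError("no entry for CUDA runtime %s" % runtime)

# A 375.66 driver satisfies CUDA 8.0 but not CUDA 9.0:
print(driver_supports_runtime(375.66, "8.0"))  # -> True
print(driver_supports_runtime(375.66, "9.0"))  # -> False
```

A typical way to hit this error is installing a newer toolkit (and hence runtime) without upgrading the kernel driver to match.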
@jaycedowell I reinstalled CUDA and got deviceQuery to run without failing on another computer. But when I recompile bifrost to install it, some other problems occur:
lw@lw-HP-ENVY-Notebook:~/desktop/bifrost-master$ sudo make -j
make -C src all
make[1]: Entering directory '/home/lw/desktop/bifrost-master/src'
/bin/sh: 1: nvcc: not found
/bin/sh: 1: cuobjdump: not found
/bin/sh: 1: nvcc: not found
linalg_kernels.cu
Makefile:163: recipe for target '_cuda_device_link.o' failed
make[1]: *** [_cuda_device_link.o] Error 127
make[1]: *** Waiting for unfinished jobs....
/bin/sh: 1: nvcc: not found
autodep.mk:56: recipe for target 'guantize.o' failed
make[1]: *** [guantize.o] Error 127
/bin/sh: 1: nvcc: not found
autodep.mk:56: recipe for target 'gunpack.o' failed
make[1]: *** [gunpack.o] Error 127
/bin/sh: 1: nvcc: not found
autodep.mk:56: recipe for target 'transpose.o' failed
make[1]: *** [transpose.o] Error 127
/bin/sh: 1: nvcc: not found
autodep.mk:56: recipe for target 'reduce.o' failed
make[1]: *** [reduce.o] Error 127
/bin/sh: 1: nvcc: not found
autodep.mk:56: recipe for target 'fdmt.o' failed
make[1]: *** [fdmt.o] Error 127
/bin/sh: 1: nvcc: not found
autodep.mk:56: recipe for target 'linalg.o' failed
make[1]: *** [linalg.o] Error 127
/bin/sh: 1: nvcc: not found
autodep.mk:56: recipe for target 'linalg_kernels.o' failed
make[1]: *** [linalg_kernels.o] Error 127
/bin/sh: 1: nvcc: not found
autodep.mk:56: recipe for target 'fir.o' failed
make[1]: *** [fir.o] Error 127
/bin/sh: 1: nvcc: not found
autodep.mk:56: recipe for target 'fft.o' failed
make[1]: *** [fft.o] Error 127
make[1]: Leaving directory '/home/lw/desktop/bifrost-master/src'
Makefile:15: recipe for target 'libbifrost' failed
make: *** [libbifrost] Error 2
This looks like a path problem. You probably need to set CUDA_HOME in user.mk to the installation path for CUDA, and check your CUDA_ROOT and CUDA_INC_DIR environment variables.
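Concretely, that means something like the following (CUDA_HOME in user.mk is mentioned above; the paths are examples for a /usr/local/cuda-8.0 install, so adjust to your machine):

```make
# user.mk -- point bifrost's build at the CUDA installation (example path)
CUDA_HOME ?= /usr/local/cuda-8.0
```

and in ~/.bashrc:

```sh
# Example paths for a CUDA 8.0 install; adjust to your system
export CUDA_ROOT=/usr/local/cuda-8.0
export CUDA_INC_DIR=/usr/local/cuda-8.0/include
export PATH=$CUDA_ROOT/bin:$PATH
```

The `nvcc: not found` lines in the log above are the symptom of nvcc's bin directory missing from PATH when make spawns its shell.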
@jaycedowell Yes, you are right. I figured it out by adding the CUDA_ROOT and CUDA_INC_DIR variables.
@jaycedowell @telegraphic I solved the issue I ran into before by reinstalling CUDA and the CUDA drivers. I think the main reason was a mismatch of the CUDA drivers, so @jaycedowell you were right. @telegraphic @jaycedowell Thank you for your help! I will close this issue.
Dear authors: I'm a beginner with your framework. I installed bifrost successfully, but when I try to import bifrost, I get BF_STATUS_DEVICE_ERROR. The details are below:

Python 2.7.6 (default, Nov 23 2017, 15:49:48)
[GCC 4.8.4] on linux2
Type "help", "copyright", "credits" or "license" for more information.