ralna / spral

Sparse Parallel Robust Algorithms Library
https://ralna.github.io/spral/

NVCC include path not found #5

Closed Poofee closed 1 year ago

Poofee commented 7 years ago

On my Ubuntu16.04 system, when I run

./autogen.sh
sudo ./configure NVCC=nvcc NVCC_INCLUDE_FLAGS --with-metis="-L/usr/local/lib -lmetis" --with-blas="-L/usr/OpenBLAS -lopenblas"

configure fails with "error: NVCC include path not found". I am sure I have set CUDA_HOME correctly. I don't know much about configure and makefiles, so how can I fix this? Thank you!

jhogg41 commented 7 years ago

Check that $CUDA_HOME/include/cuda_runtime_api.h exists and that you have permission to access it (e.g. head $CUDA_HOME/include/cuda_runtime_api.h). If it does, open config.log, search for "cuda_runtime_api.h", and see if there's some unexpected nvcc error.
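A sketch of the checks suggested above, as a small script. The default CUDA_HOME value is just the common install prefix and may differ on your system.

```shell
# Check that the CUDA runtime header exists and is readable.
CUDA_HOME="${CUDA_HOME:-/usr/local/cuda}"
hdr="$CUDA_HOME/include/cuda_runtime_api.h"
if [ -r "$hdr" ]; then
  echo "header readable: $hdr"
else
  echo "header missing or unreadable: $hdr"
fi
# When the header is fine, the real failure is usually recorded in config.log:
if [ -f config.log ]; then
  grep -n 'cuda_runtime_api\.h' config.log
fi
```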

If you aren't able to figure out what's going wrong from that, please attach config.log here so the devs can take a look.

Poofee commented 7 years ago

$CUDA_HOME/include/cuda_runtime_api.h does exist. I have set CUDA_HOME in both /etc/profile and ~/.bashrc as follows:

CUDA_HOME=/usr/local/cuda
export CUDA_HOME
PATH=$PATH:$CUDA_HOME/bin
export PATH

I can use nvcc in my terminal, and I always run configure with sudo.

I am sorry. Because I am in China, GitHub sometimes cannot be accessed, and I failed to upload the file. Would you mind taking a look here? Thank you! https://gist.github.com/poofee/54e53923df63aaafab6ffb621174e9be

Poofee commented 7 years ago

May I ask where the file "conftest.c" is, and what it is? I have now run out of every method I know. I log in as the root account and run this in the terminal:

CUDA_HOME=/usr/local/cuda
export CUDA_HOME
PATH=$PATH:$CUDA_HOME/bin
export PATH

But it still tells me it cannot find the NVCC include path. What's wrong? It's so weird.

Poofee commented 7 years ago

Finally, I have solved the problem. Since the configure process fails, I changed the configure file: I added the line spral_nvcc_inc_ok=yes so that configure would pass. But when I typed make, the error still happened, so I tried to edit the makefile. I found that the file src/hw_topology/guess_topology.cxx uses

#include <cuda_runtime_api.h>

I guess <> means the compiler will search for it in the default paths. So I ran the command:

echo | gcc -v -x c++ -E -

The CUDA include path wasn't there, so I guessed that was the key. In the command line I typed:

CPLUS_INCLUDE_PATH=/usr/local/cuda/include
export CPLUS_INCLUDE_PATH

And then, when I redid the make, no error was found. Oh my god! So my question is: is this a bug, or should everyone add the CUDA include path to the default C++ search path?
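The check described above can be sketched as a script: list gcc's default C++ include search directories, then extend them through the environment. The CUDA path below is the one from this thread; adjust it to your installation.

```shell
# Print gcc's built-in C++ header search list (only if gcc is available).
if command -v gcc >/dev/null 2>&1; then
  echo | gcc -v -x c++ -E - 2>&1 | sed -n '/search starts here/,/End of search list/p'
fi

# gcc also honours this environment variable when searching for C++ headers:
CPLUS_INCLUDE_PATH=/usr/local/cuda/include
export CPLUS_INCLUDE_PATH
```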

Poofee commented 7 years ago

Please recheck file spral/m4/spral_nvcc_lib.m4, line 23:

save_CPPFLAGS="$CCPFLAGS"; CPPFLAGS="$CPPFLAGS $NVCC_INCLUDE_FLAGS"

Although I don't know much about configure, I think there is a mistake there.
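The suspected typo matters because $CCPFLAGS is undefined: the save/restore around the NVCC include check ends up wiping out CPPFLAGS. A hedged sketch of the one-character fix, demonstrated on the quoted line (the real edit targets m4/spral_nvcc_lib.m4, after which ./autogen.sh must be re-run to regenerate configure):

```shell
# Apply the CCPFLAGS -> CPPFLAGS correction to the line quoted in the report.
line='save_CPPFLAGS="$CCPFLAGS"; CPPFLAGS="$CPPFLAGS $NVCC_INCLUDE_FLAGS"'
fixed=$(printf '%s\n' "$line" | sed 's/\$CCPFLAGS/$CPPFLAGS/')
printf '%s\n' "$fixed"
```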

venovako commented 7 years ago

Hi poofee,

Please, can you try configuring as described below (sudo shouldn't be needed) in a clean state of the local repository:

./autogen.sh
./configure --with-metis="-L/usr/local/lib -lmetis" --with-blas="-L/usr/OpenBLAS -lopenblas"

That is, without specifying NVCC_INCLUDE_FLAGS at all.

First, please try to unset CPLUS_INCLUDE_PATH and run the configure as above.

If you still get errors, can you please set CPLUS_INCLUDE_PATH as you have described (to /usr/local/cuda/include) and then re-run configure as above.

Please let us know which option (with or without CPLUS_INCLUDE_PATH set) works for you - if any. If there are errors in both cases, the corresponding config.log files would be much appreciated.

Poofee commented 7 years ago

When I first cloned the code, I built it as you said and everything went OK. The problems occur when I try to use CUDA. I mean, with

./autogen.sh
./configure --with-metis="-L/usr/local/lib -lmetis" --with-blas="-L/usr/OpenBLAS -lopenblas"

I can build the code successfully, but configure still cannot find nvcc, so this version doesn't use CUDA, even though I can use the nvcc command in my terminal. To be exact, CPLUS_INCLUDE_PATH doesn't work, with or without it. C_INCLUDE_PATH works.

venovako commented 7 years ago

Thank you for the check!

Can you try the following:

  1. Edit Makefile.am and uncomment line 8 (remove the # sign before NVCCFLAGS).

  2. Depending on the GPU you have in your machine, set the appropriate -arch=sm_... flag in that line (roughly, sm_20 for Fermis, sm_30 or sm_35 or sm_37 for general Keplers, K40s, and K80s, respectively, and so on).

  3. Re-run configure, this time with explicitly setting NVCC:

    ./autogen.sh
    NVCC=nvcc ./configure --with-metis="-L/usr/local/lib -lmetis" --with-blas="-L/usr/OpenBLAS -lopenblas"

Do you get CUDA recognized and working now?

Poofee commented 7 years ago

No. It still tells me "NVCC include path not found" unless I set C_INCLUDE_PATH first. By the way, as I mentioned, in file spral/m4/spral_nvcc_lib.m4, line 23:

save_CPPFLAGS="$CCPFLAGS"; CPPFLAGS="$CPPFLAGS $NVCC_INCLUDE_FLAGS"

Isn't CCPFLAGS a spelling mistake?

venovako commented 7 years ago

Yes, it looks like a typo, thank you! Maintainers, could you please fix this?

If you change CCPFLAGS to CPPFLAGS and reconfigure, do you get CUDA working?

If so, can you unset C_INCLUDE_PATH and still get everything configured and built?

Poofee commented 7 years ago

No. I guess there is something wrong with the configure file. Since I don't know much about it, and I have successfully built the library using C_INCLUDE_PATH, I can work around the problem for now. Thank you for your time! I will keep watching what's going on.

venovako commented 7 years ago

OK, I'll post here if I figure out what is happening. My environment has C_INCLUDE_PATH set by the modules system, so I'll try to unset it and configure without it.

venovako commented 7 years ago

I can now confirm that, even with the typo fixed, configure fails when C_INCLUDE_PATH is not set to a directory containing the CUDA includes:

configure: error: NVCC include path not found

Thank you again for reporting this!
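Until the configure check itself is fixed, the workaround reported in this thread can be summarized as: export the CUDA include directory through gcc's environment variables before running configure (paths and configure flags as used earlier in the thread; adjust to your system).

```shell
# gcc/g++ search these paths in addition to their built-in directories.
C_INCLUDE_PATH=/usr/local/cuda/include
CPLUS_INCLUDE_PATH=/usr/local/cuda/include
export C_INCLUDE_PATH CPLUS_INCLUDE_PATH
# then, in the spral source tree:
# ./autogen.sh
# NVCC=nvcc ./configure --with-metis="-L/usr/local/lib -lmetis" \
#     --with-blas="-L/usr/OpenBLAS -lopenblas"
```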

Poofee commented 7 years ago

It's good to have the problem confirmed. I have really learned a lot.

archenroot commented 7 years ago

Hi guys, I am trying to package spral for the Gentoo ebuild system, but I am hitting this CUDA-related issue.

It fails in the configure step, with the same message as discussed here. My Gentoo ebuild is quite simple:

src_prepare() {
    default
    WANT_AUTOCONF=2.5 eautoreconf
    WANT_AUTOMAKE=1.9 eautomake
}

src_configure() {
    local myeconfargs=(
        BLAS_LIBS=$(pkg-config --libs-only-l blas)
        LAPACK_LIBS=$(pkg-config --libs-only-l lapack)
        C_INCLUDE_FLAGS="/opt/cuda/include"
        NVCC_INCLUDE_FLAGS="/opt/cuda/include"
    )
    econf "${myeconfargs[@]}"
}

It fails in src_configure; any suggestion as to what I am doing wrong here? Thanks a lot.

archenroot commented 7 years ago

My generated configure command looks like:

/configure --prefix=/usr --build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu --mandir=/usr/share/man --infodir=/usr/share/info --datadir=/usr/share --sysconfdir=/etc --localstatedir=/var/lib --disable-dependency-tracking --disable-silent-rules --docdir=/usr/share/doc/spral-9999 --htmldir=/usr/share/doc/spral-9999/html --libdir=/usr/lib64 BLAS_LIBS=-lf77blas LAPACK_LIBS=-lreflapack -lf77blas C_INCLUDE_FLAGS=/opt/cuda/include NVCC_INCLUDE_FLAGS=/opt/cuda/include

So it is already too late; I understand that this should be set before ./configure is executed.

archenroot commented 7 years ago

Where exactly should I set this variable? In Makefile.in?

venovako commented 7 years ago

@archenroot:

Are you using the latest sources from master?

If so, can you try not setting C_INCLUDE_FLAGS and NVCC_INCLUDE_FLAGS in your ebuild (i.e., not put them in myeconfargs)?

What do you get?

archenroot commented 7 years ago

Thanks for quick response, so if I go just by this:

src_prepare() {
    default
    WANT_AUTOCONF=2.5 eautoreconf
    WANT_AUTOMAKE=1.9 eautomake
}

src_configure() {
    local myeconfargs=(
        BLAS_LIBS=$(pkg-config --libs-only-l blas)
        LAPACK_LIBS=$(pkg-config --libs-only-l lapack)
    )
    econf "${myeconfargs[@]}"
}

src_install() {
    emake
}

So this generates the following ./configure command:

./configure --prefix=/usr --build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu --mandir=/usr/share/man --infodir=/usr/share/info --datadir=/usr/share --sysconfdir=/etc --localstatedir=/var/lib --disable-dependency-tracking --disable-silent-rules --docdir=/usr/share/doc/spral-9999 --htmldir=/usr/share/doc/spral-9999/html --libdir=/usr/lib64 BLAS_LIBS=-lf77blas LAPACK_LIBS=-lreflapack -lf77blas

I still have an issue with NVCC. nvcc itself is found in the first lines, but at the end the include-detection mechanism doesn't work well:

checking for nvcc... nvcc
nvcc warning : The 'compute_20', 'sm_20', and 'sm_21' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
cuInit: 999
checking build system type... x86_64-pc-linux-gnu
checking host system type... x86_64-pc-linux-gnu
checking how to get verbose linking output from x86_64-pc-linux-gnu-gfortran... -v
checking for Fortran libraries of x86_64-pc-linux-gnu-gfortran...  -L/usr/lib/gcc/x86_64-pc-linux-gnu/5.4.0 -L/usr/lib/gcc/x86_64-pc-linux-gnu/5.4.0/../../../../lib64 -L/lib/../lib64 -L/usr/lib/../lib64 -L/usr/lib/gcc/x86_64-pc-linux-gnu/5.4.0/../../../../x86_64-pc-linux-gnu/lib -L/usr/lib/gcc/x86_64-pc-linux-gnu/5.4.0/../../.. -lgfortran -lm -lquadmath
checking flags to link C main with x86_64-pc-linux-gnu-gfortran... none
checking for std::align... no
checking for sched_getcpu()... yes
checking how to get verbose linking output from x86_64-pc-linux-gnu-gfortran... -v
checking for Fortran 77 libraries of x86_64-pc-linux-gnu-gfortran...  -L/usr/lib/gcc/x86_64-pc-linux-gnu/5.4.0 -L/usr/lib/gcc/x86_64-pc-linux-gnu/5.4.0/../../../../lib64 -L/lib/../lib64 -L/usr/lib/../lib64 -L/usr/lib/gcc/x86_64-pc-linux-gnu/5.4.0/../../../../x86_64-pc-linux-gnu/lib -L/usr/lib/gcc/x86_64-pc-linux-gnu/5.4.0/../../.. -lgfortran -lm -lquadmath
checking for dummy main to link with Fortran 77 libraries... none
checking for Fortran 77 name-mangling scheme... lower case, underscore, no extra underscore
checking for sgemm_ in -lf77blas ... yes
checking for cheev_ in -lreflapack -lf77blas ... yes
checking for METIS library... checking for metis_nodend_ in -L -lmetis... no
checking for metis_nodend_ in -lmetis... yes
checking version of METIS... "version 4"
checking for x86_64-pc-linux-gnu-pkg-config... /usr/bin/x86_64-pc-linux-gnu-pkg-config
checking pkg-config is at least version 0.9.0... yes
checking for HWLOC... no
configure: WARNING: hwloc not supplied: cannot detect NUMA regions
checking how to run the C preprocessor... x86_64-pc-linux-gnu-gcc -E
checking for grep that handles long lines and -e... /bin/grep
checking for egrep... /bin/grep -E
checking for ANSI C header files... yes
checking for sys/types.h... yes
checking for sys/stat.h... yes
checking for stdlib.h... yes
checking for string.h... yes
checking for memory.h... yes
checking for strings.h... yes
checking for inttypes.h... yes
checking for stdint.h... yes
checking for unistd.h... yes
checking cuda_runtime_api.h usability... no
checking cuda_runtime_api.h presence... no
checking for cuda_runtime_api.h... no
checking for cuda_runtime_api.h... (cached) no
configure: error: NVCC include path not found

venovako commented 7 years ago

You have a very interesting CUDA environment, it seems.

If you look at line 3 of configure's output, it says:

cuInit: 999

That output in turn must have been produced by the nvcc_arch_sm.c program. If you look there, and here: http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__INITIALIZE.html#group__CUDA__INITIALIZE, you will see that such a return value (999) from the cuInit function is at least undocumented, if not impossible.

So, something strange is going on here. Can you compile and run the CUDA samples on the machine where you're testing this procedure?

archenroot commented 7 years ago

:-) Yes, good catch. I am on the following versions:

*  x11-drivers/nvidia-drivers
      Latest version available: 381.22
      Latest version installed: 381.22
*  dev-util/nvidia-cuda-toolkit
      Latest version available: 8.0.61
      Latest version installed: 8.0.61
*  dev-util/nvidia-cuda-sdk
      Latest version available: 8.0.61
      Latest version installed: 8.0.61

But when I try to compile the following piece of code:

#include <stdio.h>
#include <dlfcn.h>

/* compile with e.g.: gcc thisfile.c -ldl */
int main() {
  void *cudalib = dlopen("libcuda.so", RTLD_NOW);
  int (*__cuInit)(unsigned int) = (int(*)(unsigned int)) dlsym( cudalib, "cuInit" );
  int retval = (*__cuInit)(0);
  printf("%d", retval);
  return 0;
}

I get 0, so it looks to me like CUDA works fine.

archenroot commented 7 years ago

But I will try to examine it. It is true that in Gentoo you can switch which GPU chip is used for OpenGL; it was switched to run on the Intel graphics, and when I switched it to Nvidia I got an additional message:

nvcc warning : The 'compute_20', 'sm_20', and 'sm_21' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
 * ACCESS DENIED:  open_wr:      /dev/nvidia-uvm
 * ACCESS DENIED:  open_wr:      /dev/nvidia-uvm

Normally OpenGL can be switched to the integrated Intel GPU while I still use CUDA separately...

I will re-install/compile those 3 packages and give it another try...

archenroot commented 7 years ago

OK, I shouldn't have done the OpenGL target switch, which somehow broke access to my Nvidia device (ACCESS DENIED on /dev/nvidia-uvm).

Anyway, I reinstalled the packages, and GCC itself for sure (I have a multi-GCC experimental machine); in this case I work with 5.4.0-r3.

Now I executed the following, more advanced "Hello world" for CUDA:

#include <stdio.h>
#include <stdlib.h> /* for EXIT_SUCCESS */

#define cudaCheckErrors(msg) \
    do { \
        cudaError_t __err = cudaGetLastError(); \
        if (__err != cudaSuccess) { \
            fprintf(stderr, "Fatal error: %s (%s at %s:%d)\n", \
                msg, cudaGetErrorString(__err), \
                __FILE__, __LINE__); \
            fprintf(stderr, "*** FAILED - ABORTING\n"); \
            exit(1); \
        } \
    } while (0)

const int N = 16;
const int blocksize = 16;

__global__
void hello(char *a, int *b)
{
  a[threadIdx.x] += b[threadIdx.x];
}

int main()
{
  char a[N] = "Hello \0\0\0\0\0\0";
  int b[N] = {15, 10, 6, 0, -11, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0};

  char *ad;
  int *bd;
  const int csize = N*sizeof(char);
  const int isize = N*sizeof(int);

  printf("%s", a);

  cudaMalloc( (void**)&ad, csize );
  cudaMalloc( (void**)&bd, isize );
  cudaCheckErrors("cudaMalloc fail");
  cudaMemcpy( ad, a, csize, cudaMemcpyHostToDevice );
  cudaMemcpy( bd, b, isize, cudaMemcpyHostToDevice );
  cudaCheckErrors("cudaMemcpy H2D fail");

  dim3 dimBlock( blocksize, 1 );
  dim3 dimGrid( 1, 1 );
  hello<<<dimGrid, dimBlock>>>(ad, bd);
  cudaCheckErrors("Kernel fail");
  cudaMemcpy( a, ad, csize, cudaMemcpyDeviceToHost );
  cudaCheckErrors("cudaMemcpy D2H/Kernel fail");
  cudaFree( ad );
  cudaFree( bd );

  printf("%s\n", a);
  return EXIT_SUCCESS;
}

I get following:

zangetsu@ares ~ $ ./hello_world_cuda 
Hello World!
zangetsu@ares ~ $ cuda-memcheck ./hello_world_cuda 
========= CUDA-MEMCHECK
Hello World!
========= ERROR SUMMARY: 0 errors

I additionally tried to query the device with toolkit query utility:

zangetsu@ares ~ $ /opt/cuda/sdk/bin/x86_64/linux/release/deviceQueryDrv
/opt/cuda/sdk/bin/x86_64/linux/release/deviceQueryDrv Starting...

CUDA Device Query (Driver API) statically linked version 
Detected 1 CUDA Capable device(s)

Device 0: "GeForce GTX 960M"
  CUDA Driver Version:                           8.0
  CUDA Capability Major/Minor version number:    5.0
  Total amount of global memory:                 2003 MBytes (2100232192 bytes)
  ( 5) Multiprocessors, (128) CUDA Cores/MP:     640 CUDA Cores
  GPU Max Clock rate:                            1098 MHz (1.10 GHz)
  Memory Clock rate:                             2505 Mhz
  Memory Bus Width:                              128-bit
  L2 Cache Size:                                 2097152 bytes
  Max Texture Dimension Sizes                    1D=(65536) 2D=(65536, 65536) 3D=(4096, 4096, 4096)
  Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size (x,y,z):    (2147483647, 65535, 65535)
  Texture alignment:                             512 bytes
  Maximum memory pitch:                          2147483647 bytes
  Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Concurrent kernel execution:                   Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 1 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
Result = PASS

So it seems CUDA works for me. Still, when I now try to rebuild the package, I get the same error. Also note that I have multiple other packages installed with CUDA support enabled, and they work...

venovako commented 7 years ago

I don't know what you are trying to achieve with this, but your code cannot work as you've described:

  char a[N] = "Hello \0\0\0\0\0\0";
  int b[N] = {15, 10, 6, 0, -11, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0};

In the kernel, you are adding the elements of b to the elements of a, in the same order. So you would add 15 to 'H'. How do you expect to have 'H' in the output?

I don't have to compile the code to see that.

archenroot commented 7 years ago

This is just a dummy CUDA test, and the output is:

zangetsu@ares ~ $ ./hello_world_cuda 
Hello World!
zangetsu@ares ~ $ cuda-memcheck ./hello_world_cuda 
========= CUDA-MEMCHECK
Hello World!
========= ERROR SUMMARY: 0 errors

The code works, you should compile it :dagger:
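For what it's worth, the kernel's arithmetic can be checked on the host without a GPU: adding b[i] to a[i] maps the character codes of "Hello " ('H'=72, 'e'=101, 'l'=108, 'l'=108, 'o'=111, ' '=32) to those of "World!", which is exactly the output shown above.

```shell
# Replay the per-index addition from the CUDA kernel: code + delta pairs
# are the characters of "Hello " and the values of b.
out=""
for pair in 72:15 101:10 108:6 108:0 111:-11 32:1; do
  code=${pair%:*}
  delta=${pair#*:}
  out="$out$(printf "\\$(printf '%03o' $((code + delta)))")"
done
echo "$out"   # prints: World!
```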

archenroot commented 7 years ago

Any idea? CUDA works on the machine... Thanks for any hint...

jfowkes commented 1 year ago

Closing as outdated, please open a new issue if you still have problems.