oneapi-src / SYCLomatic


No CUDA version found in CUDA PATH #1042

Closed leannmlindsey closed 1 year ago

leannmlindsey commented 1 year ago

Describe the bug

I am trying to install on an HPC Unix system with common CUDA install paths. When I provide the full path to the CUDA installation, it still can't find it and gives this error:

No CUDA version found in CUDA PATH: /opt/nvidia/hpc_sdk/Linux_x86_64/22.7/cuda/11.7

To Reproduce

During installation you state:

Note: Certain CUDA header files may need to be accessible to the tool. After build the SYCLomatic, you can run the list test by:

ninja check-clang-c2s

When I get to this point in the installation, I get this error:

No CUDA installation found in CUDA PATH: /usr/local/cuda Please set environment CUDA_PATH to correct path or make a symbolic link to "/usr/local/cuda"

I do not have permission to write into /usr/local on this system, so I can't use the symbolic link option. Instead I set CUDA_PATH with:

export CUDA_PATH=/opt/nvidia/hpc_sdk/Linux_x86_64/22.7/cuda/11.7

I got this path from my $PATH variable. I also tried:

export CUDA_PATH=/opt/nvidia/hpc_sdk/Linux_x86_64/22.7/cuda/11.7/bin

export CUDA_PATH=/opt/nvidia/hpc_sdk/Linux_x86_64/22.7/cuda

export CUDA_PATH=/opt/nvidia/hpc_sdk/Linux_x86_64/22.7/cuda/11.7/include

with the same error every time.

Can you tell me what files it is looking for in /usr/local/cuda? Maybe then I can find where those are on my system.
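One way to investigate the question above is to look at what each candidate directory actually contains. The following is a minimal sketch, not SYCLomatic's own lookup logic: the helper name `check_cuda_root` and the list of files it probes are assumptions, chosen only because they commonly appear in a CUDA toolkit root. The thread does not say which files the lit configuration reads.

```shell
# Hypothetical helper: report which commonly present CUDA toolkit files exist
# under a candidate root directory. The file list is an assumption and may
# differ from what the SYCLomatic lit configuration really checks.
check_cuda_root() {
    root="$1"
    for rel in include/cuda.h include/cuda_runtime.h bin/nvcc version.json version.txt; do
        if [ -e "$root/$rel" ]; then
            echo "found   $rel"
        else
            echo "missing $rel"
        fi
    done
}

# Example: inspect the current CUDA_PATH, falling back to /usr/local/cuda.
check_cuda_root "${CUDA_PATH:-/usr/local/cuda}"
```

Running it once per candidate path (the toolkit root, its `bin`, its `include`, and so on) shows at a glance which layout each directory has.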

Thank you, LeAnn

tomflinda commented 1 year ago

@leannmlindsey I don't fully follow your description. Do you mean you are building SYCLomatic from source by following the steps in https://github.com/oneapi-src/SYCLomatic/blob/SYCLomatic/README.md? If so, did you build the SYCLomatic binaries successfully with "ninja install-c2s", and then run into problems when running the lit tests with "ninja check-clang-c2s"?

leannmlindsey commented 1 year ago

Yes, that is correct. I didn't have any problems during the build, and when I check, it seems to be installed correctly:

(SYCLomatic) lindsey@nid001409:/pscratch/sd/l/lindsey/SYCLomatic/build> c2s --version

dpct version 17.0.0. Codebase:(452139e9685b829b2705cca62b71656eef515773)

Similarly if I type

(SYCLomatic) lindsey@nid001409:/pscratch/sd/l/lindsey/SYCLomatic/build> ninja --help
usage: ninja [options] [targets...]

if targets are unspecified, builds the 'default' target (see manual).

options:
  --version      print ninja version ("1.11.1.git.kitware.jobserver-1")
  -v, --verbose  show all command lines while building
  --quiet        don't show progress status, just command output

  -C DIR   change to DIR before doing anything else
  -f FILE  specify input build file [default=build.ninja]

  -j N     run N jobs in parallel (0 means infinity) [default=130 on this system]
  -k N     keep going until N jobs fail (0 means infinity) [default=1]
  -l N     do not start new jobs if the load average is greater than N
  -n       dry run (don't run commands but act like they succeeded)

  -d MODE  enable debugging (use '-d list' to list modes)
  -t TOOL  run a subtool (use '-t list' to list subtools)
    terminates toplevel options; further flags are passed to the tool
  -w FLAG  adjust warnings (use '-w list' to list warnings)

Ninja seems to be properly installed.

On our system, CUDA is installed by the IT department and is usually loaded with "module load cuda".

So I can see the path to the shared CUDA installation in the $PATH variable:

(SYCLomatic) lindsey@nid001409:/pscratch/sd/l/lindsey/SYCLomatic/build> echo $PATH | tr ':' '\n' | grep 'cuda'
/opt/nvidia/hpc_sdk/Linux_x86_64/22.7/cuda/11.7/compute-sanitizer
/opt/nvidia/hpc_sdk/Linux_x86_64/22.7/cuda/11.7/bin
/opt/nvidia/hpc_sdk/Linux_x86_64/22.7/cuda/11.7/libnvvp

I tried several values for export CUDA_PATH, but none of them seem to work. I always get the error message "No CUDA version found in CUDA PATH":

(SYCLomatic) lindsey@nid001409:/pscratch/sd/l/lindsey/SYCLomatic/build> ninja check-clang-c2s > error.out
llvm-lit: /pscratch/sd/l/lindsey/SYCLomatic/SYCLomatic/llvm/utils/lit/lit/llvm/config.py:484: note: using clang: /pscratch/sd/l/lindsey/SYCLomatic/build/bin/clang
No CUDA version found in CUDA PATH: /opt/nvidia/hpc_sdk/Linux_x86_64/22.7/cuda/11.7

I know that the CUDA installation works because I use it regularly, so I think whatever the tool is looking for in those folders must be missing. Maybe a particular version or something? Any ideas?

I am running on Perlmutter at Lawrence Berkeley National Laboratory. I noticed they do have a oneAPI module, but it doesn't seem to include SYCLomatic.

leannmlindsey commented 1 year ago

I tried the installation on two different HPC systems in addition to Perlmutter.

University of Utah CHPC

On this installation, after I set CUDA_PATH I no longer get the "No CUDA version found" error. I get a different error instead.

The new error is:

llvm-lit: /uufs/chpc.utah.edu/common/home/sundar-group2/LOC_ALIGN/workspace/SYCLomatic/llvm/utils/lit/lit/llvm/config.py:484: note: using clang: /uufs/chpc.utah.edu/common/home/sundar-group2/LOC_ALIGN/workspace/build/bin/clang
'nccl.h' header file not found in platform. Please make sure install the header file of NCCL and export nccl.h in CPATH.

I do not see anything in your instructions about NCCL being a dependency. I also searched your GitHub repository and nccl.h is not included there, yet several of the test files need it to run.

I cloned the NCCL GitHub repository, followed its installation instructions, and then added the path to nccl.h to CPATH.
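The CPATH step described above can be sketched as follows. This is an illustration only: the helper name `add_nccl_to_cpath` and the `NCCL_INCLUDE_DIR` placeholder are hypothetical, and the directory that holds nccl.h depends on where your own NCCL build put it.

```shell
# Hypothetical helper: prepend a directory to CPATH, but only if it really
# contains nccl.h. CPATH is the environment variable named in the error text.
add_nccl_to_cpath() {
    dir="$1"
    if [ -f "$dir/nccl.h" ]; then
        # Keep any existing CPATH entries after the new directory.
        export CPATH="$dir${CPATH:+:$CPATH}"
        echo "CPATH=$CPATH"
    else
        echo "nccl.h not found in $dir" >&2
        return 1
    fi
}

# NCCL_INCLUDE_DIR is a placeholder for the include directory of your NCCL build.
# add_nccl_to_cpath "$NCCL_INCLUDE_DIR"
```

Checking for the header before exporting avoids silently pointing CPATH at the wrong directory.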

The test run then finished, but there were some errors:


UNSUPPORTED: Clang :: dpct/check-apis-report-windows.cu (663 of 676) Test does not support the following features and/or targets: system-linux
UNSUPPORTED: Clang :: dpct/dh_constant_db_win.cpp (664 of 676) Test does not support the following features and/or targets: system-linux
UNSUPPORTED: Clang :: dpct/math_functions_test_win.cu (665 of 676) Test does not support the following features and/or targets: system-linux
UNSUPPORTED: Clang :: dpct/hd_constant_db_win.cpp (666 of 676) Test does not support the following features and/or targets: system-linux
UNSUPPORTED: Clang :: dpct/kernel-call-complex_windows.cu (667 of 676) Test does not support the following features and/or targets: system-linux
UNSUPPORTED: Clang :: dpct/language_note_db_win.cpp (668 of 676) Test does not support the following features and/or targets: system-linux
UNSUPPORTED: Clang :: dpct/c_feature_file_db_win.c (669 of 676) Test does not support the following features and/or targets: system-linux
UNSUPPORTED: Clang :: dpct/template-deduce_windows.cu (670 of 676) Test does not support the following features and/or targets: system-linux
UNSUPPORTED: Clang :: dpct/test_path_in_windows.cu (671 of 676) Test does not support the following features and/or targets: system-linux
UNSUPPORTED: Clang :: dpct/test_helpapi_stats_with_replacetext_windows.cu (672 of 676) Test does not support the following features and/or targets: system-linux
UNSUPPORTED: Clang :: dpct/dh_d_constant_db_win.cpp (673 of 676) Test does not support the following features and/or targets: system-linux
UNSUPPORTED: Clang :: dpct/cuda-math-extension2.cu (674 of 676) Test requires the following unavailable features: cuda-8.0, v8.0
UNSUPPORTED: Clang :: dpct/dh_h_constant_db_win.cpp (675 of 676) Test does not support the following features and/or targets: system-linux
UNSUPPORTED: Clang :: dpct/dhh_constant_db_win.cpp (676 of 676) Test does not support the following features and/or targets: system-linux



Failed Tests (43):
  Clang :: dpct/LibCU/libcu_array.cu
  Clang :: dpct/cooperative_groups_block_tile_memory.cu
  Clang :: dpct/cub/devicelevel/device_unique_by_key.cu
  Clang :: dpct/cudaPointerAttributes.cu
  Clang :: dpct/cudnn-get-error-string.cu
  Clang :: dpct/cusparse-usm.cu
  Clang :: dpct/cusparse.cu
  Clang :: dpct/dnn/activation.cu
  Clang :: dpct/dnn/binary.cu
  Clang :: dpct/dnn/bnback.cu
  Clang :: dpct/dnn/bnbackex.cu
  Clang :: dpct/dnn/bninfer.cu
  Clang :: dpct/dnn/bntrain.cu
  Clang :: dpct/dnn/bntrainex.cu
  Clang :: dpct/dnn/convolution.cu
  Clang :: dpct/dnn/convolution_p2.cu
  Clang :: dpct/dnn/convolution_v7.cu
  Clang :: dpct/dnn/convolutionbackbias.cu
  Clang :: dpct/dnn/convolutionbackdata.cu
  Clang :: dpct/dnn/convolutionbackweight.cu
  Clang :: dpct/dnn/convolutionex.cu
  Clang :: dpct/dnn/dropout.cu
  Clang :: dpct/dnn/fill.cu
  Clang :: dpct/dnn/lrn.cu
  Clang :: dpct/dnn/memory.cu
  Clang :: dpct/dnn/normback.cu
  Clang :: dpct/dnn/norminfer.cu
  Clang :: dpct/dnn/normtrain.cu
  Clang :: dpct/dnn/pooling.cu
  Clang :: dpct/dnn/reduction.cu
  Clang :: dpct/dnn/reorder.cu
  Clang :: dpct/dnn/rnn.cu
  Clang :: dpct/dnn/scale.cu
  Clang :: dpct/dnn/semicolon.cu
  Clang :: dpct/dnn/softmax.cu
  Clang :: dpct/dnn/sum.cu
  Clang :: dpct/dnn/version.cu
  Clang :: dpct/grid_constant.cu
  Clang :: dpct/header_order/test.cu
  Clang :: dpct/manual_migrate_inroot/foo/api_is_inroot.cu
  Clang :: dpct/manual_migrate_not_inroot/foo/api_is_not_inroot.cu
  Clang :: dpct/nccl.cu
  Clang :: dpct/user_define_rule_header_order2.cu

Testing Time: 483.79s
  Unsupported: 35
  Passed     : 598
  Failed     : 43
FAILED: tools/clang/test/CMakeFiles/check-clang-c2s
cd /uufs/chpc.utah.edu/common/home/u1323098/sundar-group-space2/LOC_ALIGN/workspace/build/tools/clang/test && /usr/bin/python3.9 /uufs/chpc.utah.edu/common/home/u1323098/sundar-group-space2/LOC_ALIGN/workspace/build/./bin/llvm-lit -vv -a --param USE_Z3_SOLVER=0 dpct
ninja: build stopped: subcommand failed.

PSC Bridges2

Still running. I'll update when it is complete.

leannmlindsey commented 1 year ago

When I compare the files in the two directories, the only difference I see is that the CHPC system (the one that worked) has Nsight Compute folders in that directory as well as the CUDA files.

CHPC

(base) [u1323098@notchpeak2:cuda-11.6.2-hgkn7czv7ciyy3gtpazwk2s72msbw6l2]$ ls
bin compute-sanitizer DOCS EULA.txt extras include lib64 libnvvp nsight-compute-2022.1.1 nsightee_plugins nsight-systems-2021.5.2 nvml nvvm pkgconfig README samples share src targets tools version.json

Perlmutter

lindsey@nid002197:/opt/nvidia/hpc_sdk/Linux_x86_64/22.7/cuda/11.7> ls
bin compute-sanitizer DOCS EULA.txt extras include lib64 libnvvp nvml nvvm README share targets tools version.json
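A comparison like the one above can be done mechanically instead of by eye. The sketch below assumes each listing has been saved to a text file with one entry per line (for example with `ls -1 > chpc.txt` on one machine and `ls -1 > perlmutter.txt` on the other). The helper name `only_in_first` and both file names are hypothetical.

```shell
# Hypothetical helper: print the entries that appear in the first listing
# file but not in the second. Both files hold one directory entry per line.
only_in_first() {
    a_sorted=$(mktemp)
    b_sorted=$(mktemp)
    # comm requires sorted input, so sort both listings first.
    sort "$1" > "$a_sorted"
    sort "$2" > "$b_sorted"
    # -23 suppresses lines unique to the second file and lines common to both.
    comm -23 "$a_sorted" "$b_sorted"
    rm -f "$a_sorted" "$b_sorted"
}

# Example (file names are placeholders):
# only_in_first chpc.txt perlmutter.txt
```

Swapping the two arguments shows entries that exist only in the second listing.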

tomflinda commented 1 year ago

@leannmlindsey where is the CUDA SDK installed on these two machines? If it is not installed in a default directory such as /usr/local/cuda-11.4, and nccl.h and cudnn.h are not among the SDK header files, c2s will emit a warning message:

[154/155] Running lit suite /home/user/newdisk/monorepo/SYCLomatic/clang/test/dpct
llvm-lit: /home/user/newdisk/monorepo/SYCLomatic/llvm/utils/lit/lit/llvm/config.py:484: note: using clang: /home/user/newdisk/monorepo/release_build/bin/clang
/home/user/newdisk/monorepo/SYCLomatic/clang/test
'nccl.h' header file not found in platform. Please make sure install the header file of NCCL and export nccl.h in CPATH.

 Testing Time: 0.49s
  Unsupported: 676
user@user-ubuntu:~/newdisk/monorepo/release_build$

If the CUDA SDK is not installed in the default CUDA directory, set the CUDA_PATH environment variable to the custom CUDA SDK path, for example "export CUDA_PATH=/customized/path/to/cuda-11.4/", and then run the c2s lit tests.
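On a module-based system, the SDK root for this suggestion can be derived from wherever nvcc lives. This is only a sketch of that idea: the helper name `cuda_root_from_nvcc` is hypothetical, it assumes the usual layout where nvcc sits in `<root>/bin`, and it does not resolve symbolic links. As the earlier comments show, pointing CUDA_PATH at the toolkit root did not by itself resolve the Perlmutter case.

```shell
# Hypothetical helper: given the full path to nvcc, print the toolkit root,
# assumed to be the parent of the bin directory that contains nvcc.
cuda_root_from_nvcc() {
    nvcc_path="$1"
    dirname "$(dirname "$nvcc_path")"
}

if command -v nvcc >/dev/null 2>&1; then
    CUDA_PATH="$(cuda_root_from_nvcc "$(command -v nvcc)")"
    export CUDA_PATH
    echo "CUDA_PATH=$CUDA_PATH"
    # Next step, run from the build directory: ninja check-clang-c2s
else
    echo "nvcc is not on PATH; load the CUDA module first" >&2
fi
```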

tomflinda commented 1 year ago

Closing this issue since there has been no further feedback. @leannmlindsey, please reopen it if you still hit this problem and have more information. Thanks.