terralang / terra

Terra is a low-level system programming language that is embedded in and meta-programmed by the Lua programming language.
terralang.org

9 cuda tests failed #586

Open yys123456 opened 2 years ago

yys123456 commented 2 years ago

I tried to build Terra from source using CMake, and everything went fine until I ran the Terra tests: 9 tests whose names start with cuda failed, although some other cuda tests, such as cudaprintf, passed. [screenshot] My environment: CUDA 11.7, Visual Studio 17 2022, GTX 1050 Ti, clang+llvm-11.1.0-x86_64-windows-msvc17.

yys123456 commented 2 years ago

The commands used to build from source:

  1. cmake -DCMAKE_INSTALL_PREFIX=./../install .. -DTERRA_ENABLE_CUDA=ON -G "Visual Studio 17 2022"
    D:\terra\build>cmake -DCMAKE_INSTALL_PREFIX=./../install .. -DTERRA_ENABLE_CUDA=ON -G "Visual Studio 17 2022"
    -- Selecting Windows SDK version 10.0.19041.0 to target Windows 10.0.19043.
    -- The C compiler identification is MSVC 19.32.31332.0
    -- The CXX compiler identification is MSVC 19.32.31332.0
    -- Detecting C compiler ABI info
    -- Detecting C compiler ABI info - done
    -- Check for working C compiler: C:/Program Files/Microsoft Visual Studio/2022/Community/VC/Tools/MSVC/14.32.31326/bin/Hostx64/x64/cl.exe - skipped
    -- Detecting C compile features
    -- Detecting C compile features - done
    -- Detecting CXX compiler ABI info
    -- Detecting CXX compiler ABI info - done
    -- Check for working CXX compiler: C:/Program Files/Microsoft Visual Studio/2022/Community/VC/Tools/MSVC/14.32.31326/bin/Hostx64/x64/cl.exe - skipped
    -- Detecting CXX compile features
    -- Detecting CXX compile features - done
    -- Clang libraries: D:/LLVM/lib/clangFrontend.lib;D:/LLVM/lib/clangDriver.lib;D:/LLVM/lib/clangSerialization.lib;D:/LLVM/lib/clangCodeGen.lib;D:/LLVM/lib/clangParse.lib;D:/LLVM/lib/clangSema.lib;D:/LLVM/lib/clangAnalysis.lib;D:/LLVM/lib/clangEdit.lib;D:/LLVM/lib/clangAST.lib;D:/LLVM/lib/clangASTMatchers.lib;D:/LLVM/lib/clangLex.lib;D:/LLVM/lib/clangBasic.lib
    -- Found Clang: D:/LLVM/include
    -- Found Git: C:/Program Files/Git/cmd/git.exe (found version "2.37.1.windows.1")
    -- Found CUDA: C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.7 (found version "11.7")
    -- Using Lua: LuaJIT commit 50936d784474747b4569d988767f1b5bab8bb6d0
    -- Configuring done
    -- Generating done
    -- Build files have been written to: D:/terra/build
  2. cmake --build . --target INSTALL --config Release

[screenshot]

elliottslaughter commented 2 years ago

Hi @yys123456,

I don't have a Windows dev box, so I'm going to need you to take the lead on fixing this. There may be a couple of other users hanging around the issue tracker who use Windows, but I don't know how many of them use CUDA on Windows.

What I can tell you is that CUDA on Linux is tested regularly. So whatever is going on here is specific to either (a) Windows, (b) GTX1050ti, or (c) something particular to your dev machine.

The parts of the build you've shown so far look fine. I think the next step would be to look at the specific tests and see how they're failing.

P.S. If you don't mind, it would be nice to copy-and-paste the screenshots as text instead of images. Thanks.

sssphil commented 1 year ago

Hi @elliottslaughter, I've run into a similar situation on Ubuntu 20.04 with CUDA 11.7 except that I don't have cudaprintf.t but cudaoo.t in the list:

=================
= FAILING tests
=================
cudatest.t
cudashared.t
cudatex.t
cudaoffline.t
cudaoo.t
cudaaggregate.t
cudaatomic.t
cudaagg.t
cudaglobal.t
=================

I compiled LLVM from the GitHub release 16.0.4 along with Clang and Polly, and I turned off the CMake flags following the instructions in this repo.

Running some of the test files alone gives this:

$ ../../../bin/terra cudaatomic.t 
<buffer>:1:10: fatal error: 'cuda_runtime.h' file not found
#include "cuda_runtime.h"
         ^~~~~~~~~~~~~~~~
compilation of included c code failed

stack traceback:
    [C]: in function 'registercfile'
    ...syang/workspace/dynamicfusion/terra_src/src/terralib.lua:3529: in function 'includecstring'
    cudaatomic.t:22: in main chunk

Could it be a problem with how I compiled LLVM? I tried a precompiled release of LLVM 13 from its repo, and as far as I remember the tests all passed. But I was having problems with some old optimization code, so I'm compiling everything together.
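
Running the failing tests one at a time, as above, can be scripted so that each test's output ends up in its own log. This is only a rough sketch for a POSIX shell: the function name run_tests_individually and the log directory are made up for illustration, and the terra binary path is whatever applies to your build.

```shell
#!/bin/sh
# Hypothetical helper (not part of Terra): run each test file on its own
# and keep its combined stdout/stderr in <log_dir>/<test>.log.
run_tests_individually() {
    terra_bin=$1
    log_dir=$2
    shift 2
    mkdir -p "$log_dir"
    failed=0
    for t in "$@"; do
        if "$terra_bin" "$t" >"$log_dir/$t.log" 2>&1; then
            echo "PASS $t"
        else
            echo "FAIL $t (see $log_dir/$t.log)"
            failed=$((failed + 1))
        fi
    done
    # Succeed only if no test failed.
    [ "$failed" -eq 0 ]
}
```

For example, from the tests directory: run_tests_individually ../../../bin/terra logs cudatest.t cudashared.t cudaatomic.t. The first lines of each log should then show whether every test stops at the same missing header or fails for different reasons.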

elliottslaughter commented 1 year ago

Where is your CUDA installed to? It's probably just missing the correct path. E.g., if your CUDA is installed to /usr/local/cuda-11, you could set:

export CUDA_HOME=/usr/local/cuda-11

sssphil commented 1 year ago

Thanks for the reply! I followed the instructions on NVIDIA's website, and $whereis CUDA shows /usr/local/cuda/. I set CUDA_HOME and tried again (also compiled with the variable) but it still shows the same error.

elliottslaughter commented 1 year ago

Try setting INCLUDE_PATH=$CUDA_HOME/include and see if that changes anything.
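
Before exporting the variable, it may help to confirm that cuda_runtime.h really is under the CUDA root. The sketch below assumes a POSIX shell; cuda_include_dir is a hypothetical helper name, and /usr/local/cuda is only a fallback guess used when neither an argument nor CUDA_HOME is given.

```shell
#!/bin/sh
# Hypothetical helper (not part of Terra): print the include directory
# under a CUDA root, but only if cuda_runtime.h is actually there.
cuda_include_dir() {
    # Root comes from the argument, else CUDA_HOME, else an assumed default.
    root=${1:-${CUDA_HOME:-/usr/local/cuda}}
    if [ -f "$root/include/cuda_runtime.h" ]; then
        echo "$root/include"
    else
        echo "cuda_runtime.h not found under $root/include" >&2
        return 1
    fi
}

# Usage:
#   export CUDA_HOME=/usr/local/cuda
#   INCLUDE_PATH=$(cuda_include_dir "$CUDA_HOME") && export INCLUDE_PATH
```

If the helper reports that the header is missing, the CUDA installation itself is the more likely culprit than Terra's path detection.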

sssphil commented 1 year ago

Thanks for the help! Now all tests have passed. I'm wondering if I missed any steps while compiling from code. Or should I set up the environment variables when using terra?

elliottslaughter commented 1 year ago

No, it is not expected that you should need to set these variables. Something is going wrong.

Please go to src/terralib.lua at line 4345 and add the following debug prints:

print("CUDA_HOME", os.getenv("CUDA_HOME"))
print("terra.cudahome", terra.cudahome)
for k,v in pairs(terra.cudalibpaths) do
  print("terra.cudalibpaths", k, v)
end

Note that due to build glitches, you may need to run make clean && make to see these changes take effect.

sssphil commented 1 year ago

Hi @elliottslaughter, somehow I couldn't reproduce the problem anymore. I've been uninstalling and reinstalling CUDA and I've tried multiple versions and reverted to 11.7. Maybe it has something to do with how CUDA was installed? Here is the debug output from running cudatest.t, but I guess everything should be normal now.

CUDA_HOME   nil
terra.cudahome  /usr/local/cuda
terra.cudalibpaths  nvvm    /usr/local/cuda/nvvm/lib64/libnvvm.so
terra.cudalibpaths  runtime /usr/local/cuda/lib64/libcudart.so
terra.cudalibpaths  driver  libcuda.so
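
As a quick sanity check on output like this, each reported library path can be tested for existence. This is a minimal sketch, assuming a POSIX shell; check_lib_path is a hypothetical helper, and treating a bare name such as libcuda.so as something left to the dynamic loader is an assumption rather than something confirmed in this thread.

```shell
#!/bin/sh
# Hypothetical helper (not part of Terra): classify one library path
# taken from the terra.cudalibpaths debug output.
check_lib_path() {
    case $1 in
        /*)
            # Absolute path: the file should exist on disk.
            if [ -e "$1" ]; then
                echo "ok $1"
            else
                echo "missing $1"
            fi
            ;;
        *)
            # Bare name: not checked here (assumed to be found at load time).
            echo "bare-name $1"
            ;;
    esac
}
```

For example: check_lib_path /usr/local/cuda/lib64/libcudart.so. A "missing" result for nvvm or runtime would suggest an incomplete CUDA install.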