hyhuang00 closed this issue 2 years ago.
Your CUDA environment seems not to be installed in the default location (e.g. `/usr/local/cuda/include`). Can you print the value of `CUDA_HOME`? BTW, you can also try whether `export USE_NVRTC=0` will help.
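A quick way to check which of these variables the current process actually sees is a short Python snippet (a generic sketch, not tutel-specific; `/usr/local/cuda` is just the conventional default location, and `show_cuda_env` is my name for the helper):

```python
import os

def show_cuda_env():
    # Report the CUDA-related environment variables the current
    # process sees. "/usr/local/cuda" is only the conventional
    # default install location, used here as a fallback for display.
    cuda_home = os.environ.get("CUDA_HOME", "/usr/local/cuda")
    use_nvrtc = os.environ.get("USE_NVRTC", "<unset>")
    return cuda_home, use_nvrtc

print(*show_cuda_env())
```

Running this inside the same shell that launches training makes it easy to spot a `CUDA_HOME` that was set in one shell but not exported to another.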
Thank you for your prompt reply! Yes, my CUDA environment is not installed in the default location because I'm using a shared computation cluster. Is there a parameter I can set to ensure the compiler finds the correct CUDA? I will try `export USE_NVRTC=0`.
```
$ echo $CUDA_HOME
/public/apps/cuda/11.3
```
We just merged a PR that parses `CUDA_HOME` from the environment variable. Can you try whether it works for you?
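The lookup order such a change typically implements can be sketched as follows (hypothetical helper name and behavior; this is an assumption about the approach, not tutel's actual code):

```python
import os

def resolve_cuda_home(default="/usr/local/cuda"):
    """Prefer an explicitly set CUDA_HOME environment variable;
    otherwise fall back to the conventional default install path.
    Hypothetical sketch only, not tutel's implementation."""
    home = os.environ.get("CUDA_HOME", "").strip()
    return home if home else default
```

With `CUDA_HOME=/public/apps/cuda/11.3` exported, such a helper would return the cluster path instead of `/usr/local/cuda`.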
Thank you! The fix works, and `CUDA_HOME` now seems to be found correctly. I can successfully run `hello_world.py` and `hello_world_ddp.py` under the examples folder without any error. However, when I tried to use it under fairseq (the use case is here), I got the following two errors:
Compilation error (it seems the CUDA compiler worked, otherwise there wouldn't be the second error):

```
[W custom_kernel.cpp:158] nvrtc: error: unrecognized option --includ… found
Failed to use NVRTC for JIT compilation in this Pytorch version, try another approach using CUDA compiler.. (To always disable NVRTC, please: export USE_NVRTC=0)
```
RuntimeError:

```
File "/private/home/hyhuang/.conda/envs/newnllb/lib/python3.9/site-packages/fairseq-1.0.0a0+b1b3eda-py3.9-linux-x86_64.egg/fairseq/modules/moe/top2gate.py", line 234, in top2gating
  locations1 = fused_cumsum_sub_one(mask1)
File "/private/home/hyhuang/.local/lib/python3.9/site-packages/tutel/jit_kernels/gating.py", line 22, in fast_cumsum_sub_one
  return torch.ops.tutel_ops.cumsum(data)
RuntimeError: (0) == (cuModuleLoadDataEx(&hMod, image.c_str(), sizeof(options) / sizeof(*options), options, values)) INTERNAL ASSERT FAILED at "/tmp/pip-req-build-djl73tcc/tutel/custom/custom_kernel.cpp":214, please report a bug to PyTorch. CHECK_EQ fails.
```
Would you be able to provide any suggestions on these two errors? I am quite confused, since this is the same environment I used to run the `hello_world.py` scripts.
You need to run `unset USE_NVRTC`, since you may have explicitly configured that variable before.
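The behavior being toggled here can be modeled like this (a hedged sketch of how such an opt-out flag usually works; `nvrtc_enabled` is my name, not tutel's, and the exact semantics are an assumption):

```python
import os

def nvrtc_enabled():
    # USE_NVRTC=0 means "skip NVRTC and go straight to the CUDA
    # compiler fallback"; any other value, or leaving the variable
    # unset, keeps NVRTC as the first JIT path.
    # Hypothetical sketch, not tutel's actual implementation.
    return os.environ.get("USE_NVRTC", "1") != "0"
```

So after `unset USE_NVRTC`, a check like this would report NVRTC as enabled again, which matches the advice above.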
Thank you! That completely resolves this problem. Closing the issue.
@hyhuang00 Can you help us test whether the latest version (#170) still works in your environment? We removed the manual `CUDA_HOME` environment-variable detection, but the new approach should handle different environments more robustly.
Sure, I'm happy to help. Let me try out the new version and I'll let you know if it works for me.
The new version works on my machine without any error. I installed the package via:

```
$ python3 -m pip install --user --upgrade git+https://github.com/microsoft/tutel@main
```
Thanks!
I have installed tutel on my machine and have set up the related environment variables, such as `$CUDA_HOME` and `$CFLAGS`. However, when I tried to run `examples/hello_world.py`, I got the following error:
```
[E custom_kernel.cpp:124] default_program(1): catastrophic error: cannot open source file "cuda_runtime.h"

1 catastrophic error detected in the compilation of "default_program".
Compilation terminated.
Failed to use NVRTC for JIT compilation in this Pytorch version, try another approach using CUDA compiler.. (To always disable NVRTC, please: export USE_NVRTC=0)

File "/private/home/hyhuang/.local/lib/python3.9/site-packages/tutel/impls/jit_compiler.py", line 26, in func
  tutel_custom_kernel.invoke(inputs, ctx)
RuntimeError: (true) == (fp != nullptr) INTERNAL ASSERT FAILED at "/tmp/pip-req-build-pcbbciia/tutel/custom/custom_kernel.cpp":40, please report a bug to PyTorch. CHECK_EQ fails.
```
I am using PyTorch 1.10.1 + CUDA 11.3. Is there any other parameter I should set to use tutel?
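For this particular failure (`cuda_runtime.h` not found), one quick sanity check is to verify the header actually exists under `$CUDA_HOME/include`. A small generic sketch (assuming only the standard layout `<CUDA_HOME>/include/cuda_runtime.h`; the helper name is mine):

```python
import os

def cuda_runtime_header(cuda_home=None):
    # NVRTC needs cuda_runtime.h on its include path; in a standard
    # CUDA toolkit install it lives under <CUDA_HOME>/include.
    # Returns the expected path and whether it exists on disk.
    home = cuda_home or os.environ.get("CUDA_HOME", "/usr/local/cuda")
    path = os.path.join(home, "include", "cuda_runtime.h")
    return path, os.path.exists(path)
```

If the header is reported missing, the `CUDA_HOME` in that process is likely pointing at the wrong toolkit root (common on shared clusters where modules set paths per shell).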