Closed jd730 closed 1 year ago
Does either of `export USE_NVRTC=1` or `export USE_NVRTC=0` work? This looks like an environment problem (e.g. multiple CUDA versions installed side by side), and it is unlikely to happen when CUDA + PyTorch live in a clean Docker container.
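To make the "clean Docker container" suggestion concrete, here is a minimal sketch. The image tag and the one-off install-and-import check are assumptions for illustration (the thread does not name a specific image); `pytorch/pytorch:2.0.1-cuda11.8-cudnn8-devel` is an official PyTorch image that ships a matching nvcc, which tutel needs to build its kernels.

```shell
# Hypothetical sketch: reproduce the test inside a clean container so the
# system's other CUDA toolkits cannot interfere. Image tag is an assumption.
docker run --gpus all --rm -it pytorch/pytorch:2.0.1-cuda11.8-cudnn8-devel \
    bash -c "pip install --upgrade git+https://github.com/microsoft/tutel@main \
             && python -c 'import tutel; print(tutel.__name__)'"
```

If the import succeeds there but fails on the host, that points at a host-side CUDA/nvcc mismatch rather than a tutel bug.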
Hi @ghostplant, thank you for your fast response. If I set `export USE_NVRTC=0`, it says:
nvcc fatal : Unsupported gpu architecture 'compute_89'
Traceback (most recent call last):
File "test.py", line 6, in <module>
cumsum_tutel = fast_cumsum_sub_one(matrix, dim=0) + 1
File "/home/jdhwang/.local/lib/python3.8/site-packages/tutel/jit_kernels/gating.py", line 22, in fast_cumsum_sub_one
return torch.ops.tutel_ops.cumsum(data)
File "/home/jdhwang/conda/envs/cl/lib/python3.8/site-packages/torch/_ops.py", line 502, in __call__
return self._op(*args, **kwargs or {})
RuntimeError: (true) == (fp != nullptr) INTERNAL ASSERT FAILED at "/tmp/pip-req-build-c9h2prbs/tutel/custom/custom_kernel.cpp":49, please report a bug to PyTorch. CHECK_EQ fails.
I will try to test in a clean env and with CUDA 11.8 as well.
It works after upgrading torch (`2.0.1+cu118`), nvcc, and NCCL. Thank you!
Hi,
I installed tutel via
python3 -m pip install --user --upgrade git+https://github.com/microsoft/tutel@main
I am running a test script and am facing an error. Following https://github.com/microsoft/tutel/issues/203, I exported
export USE_NVRTC=1
and I am using an RTX 4090 with torch (`2.0.0+cu117`) and CUDA 11.7 (nvcc matches as well).
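For anyone sanity-checking what `fast_cumsum_sub_one` is supposed to return without a working CUDA kernel, here is a pure-Python CPU reference. This is an illustrative sketch, not tutel code: it assumes the operation is a column-wise cumulative sum along dim=0 minus one, which matches how the test script adds 1 back to the result.

```python
# CPU reference for the cumsum-sub-one operation (cumulative sum along
# dim=0, then subtract 1 from every element). Pure Python, no GPU needed.
def cumsum_sub_one(rows):
    """rows: 2-D matrix as a list of lists. Returns column-wise cumsum - 1."""
    out = []
    running = [0] * len(rows[0])  # per-column running totals
    for row in rows:
        running = [r + v for r, v in zip(running, row)]
        out.append([r - 1 for r in running])
    return out

matrix = [[1, 0], [0, 1], [1, 1]]
print(cumsum_sub_one(matrix))  # column-wise cumsum along dim=0, minus one
```

Comparing this reference against `torch.ops.tutel_ops.cumsum` on small inputs can help confirm whether a rebuilt kernel behaves correctly.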