taichi-dev / taichi

Productive, portable, and performant GPU programming in Python.
https://taichi-lang.org
Apache License 2.0

Correct PyTorch version required for CUDA tests #2969

Open qiao-bo opened 3 years ago

qiao-bo commented 3 years ago

Describe the bug

The torch package listed in requirements_dev.txt is installed from PyPI, which currently provides torch 1.9.0. That build does not include CUDA support for the latest Ampere architecture. For example, on an RTX 3080 (compute capability sm_86) you get the following warning and error:

/usr/local/lib/python3.8/dist-packages/torch/cuda/__init__.py:106: UserWarning: 
NVIDIA GeForce RTX 3080 with CUDA capability sm_86 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70.
If you want to use the NVIDIA GeForce RTX 3080 GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/

  warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
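The mismatch reported in the warning above can be checked before any test is run. The following is a minimal sketch, not PyTorch's own check: the helper names parse_sm and build_supports_device are hypothetical, and the sketch assumes the usual CUDA binary-compatibility rule (a kernel compiled for sm_XY runs on devices with the same major version X and a minor version of at least Y). It ignores PTX entries such as compute_80, which can be JIT-compiled for newer devices.

```python
def parse_sm(arch):
    """Turn 'sm_86' into (8, 6); return None for other entries (e.g. 'compute_80')."""
    prefix, _, digits = arch.partition("_")
    if prefix != "sm" or not digits.isdigit() or len(digits) < 2:
        return None
    # The last digit is the minor version, everything before it is the major.
    return int(digits[:-1]), int(digits[-1])


def build_supports_device(arch_list, capability):
    """Check whether any compiled architecture covers the device.

    arch_list:  strings like those returned by torch.cuda.get_arch_list()
    capability: (major, minor) like torch.cuda.get_device_capability() returns
    Assumption: same major version and compiled minor <= device minor.
    """
    major, minor = capability
    for arch in arch_list:
        parsed = parse_sm(arch)
        if parsed is None:
            continue  # skip PTX-only or unrecognized entries
        if parsed[0] == major and parsed[1] <= minor:
            return True
    return False


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None

    if torch is not None and torch.cuda.is_available():
        archs = torch.cuda.get_arch_list()
        cap = torch.cuda.get_device_capability(0)
        print("compiled for:", archs)
        print("device capability:", cap)
        print("supported:", build_supports_device(archs, cap))
    else:
        print("torch with CUDA is not available; nothing to check")
```

With the architecture list from the warning (sm_37 sm_50 sm_60 sm_70) and an sm_86 device, the helper returns False, which matches the "no kernel image is available" error.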

If you run ti test on such a machine, you will likely see the following failures:

$ ti test -a cuda -t2 -v
======================================================== short test summary info =========================================================
FAILED python/taichi/tests/test_get_external_tensor_shape.py::test_get_external_tensor_shape_access_ndarray[size0] - RuntimeError: CUDA...
FAILED python/taichi/tests/test_ndarray.py::test_ndarray_2d - RuntimeError: CUDA error: no kernel image is available for execution on t...
FAILED python/taichi/tests/test_ndarray.py::test_ndarray_numpy_io - RuntimeError: CUDA error: no kernel image is available for executio...
FAILED python/taichi/tests/test_ndarray.py::test_matrix_ndarray_python_scope[Layout.SOA] - RuntimeError: CUDA error: no kernel image is...
FAILED python/taichi/tests/test_ndarray.py::test_matrix_ndarray_python_scope[Layout.AOS] - RuntimeError: CUDA error: no kernel image is...
FAILED python/taichi/tests/test_ndarray.py::test_matrix_ndarray_taichi_scope[Layout.SOA] - RuntimeError: CUDA error: no kernel image is...
FAILED python/taichi/tests/test_ndarray.py::test_matrix_ndarray_taichi_scope[Layout.AOS] - RuntimeError: CUDA error: no kernel image is...
FAILED python/taichi/tests/test_ndarray.py::test_matrix_ndarray_taichi_scope_struct_for[Layout.SOA] - RuntimeError: CUDA error: no kern...
FAILED python/taichi/tests/test_ndarray.py::test_matrix_ndarray_taichi_scope_struct_for[Layout.AOS] - RuntimeError: CUDA error: no kern...
FAILED python/taichi/tests/test_ndarray.py::test_vector_ndarray_python_scope[Layout.SOA] - RuntimeError: CUDA error: no kernel image is...
FAILED python/taichi/tests/test_ndarray.py::test_vector_ndarray_python_scope[Layout.AOS] - RuntimeError: CUDA error: no kernel image is...
FAILED python/taichi/tests/test_ndarray.py::test_vector_ndarray_taichi_scope[Layout.SOA] - RuntimeError: CUDA error: no kernel image is...
FAILED python/taichi/tests/test_ndarray.py::test_vector_ndarray_taichi_scope[Layout.AOS] - RuntimeError: CUDA error: no kernel image is...
FAILED python/taichi/tests/test_ndarray.py::test_compiled_functions - RuntimeError: CUDA error: no kernel image is available for execut...
FAILED python/taichi/tests/test_mpm88.py::test_mpm88_numpy_and_ndarray - RuntimeError: CUDA error: no kernel image is available for exe...
FAILED python/taichi/tests/test_torch_ad.py::test_torch_ad_gpu - RuntimeError: CUDA error: no kernel image is available for execution o...
=================================== 16 failed, 947 passed, 2 skipped, 10 warnings in 880.13s (0:14:40) ===================================

How to circumvent?

Follow the official PyTorch instructions and install the CUDA 11.1 build:

python3 -m pip install torch==1.9.0+cu111 -f https://download.pytorch.org/whl/torch_stable.html

Additional comments

k-ye commented 3 years ago

Thanks! I think we can run the tests without test_ndarray.py to see whether we need the update.

qiao-bo commented 3 years ago

> Thanks! I think we can run the tests without test_ndarray.py to see whether we need the update.

There are three other failed tests that are also related to torch on the GPU (see the log above), but I guess they can be skipped as well, at least for the moment.
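One way to skip the torch GPU tests only on machines where the installed torch build cannot run kernels is a small probe. This is a sketch under assumptions, not how the taichi test suite is organized: torch_cuda_usable and the marker name requires_torch_cuda are hypothetical, and the probe assumes an unsupported build raises RuntimeError on the first kernel launch, as in the traceback above.

```python
def torch_cuda_usable():
    """Return True only if torch is installed and can actually run a CUDA kernel."""
    try:
        import torch
    except ImportError:
        return False
    if not torch.cuda.is_available():
        return False
    try:
        # Launch a trivial kernel; an unsupported build is expected to fail here
        # with "no kernel image is available for execution on the device".
        torch.zeros(1, device="cuda").add_(1)
        torch.cuda.synchronize()
    except RuntimeError:
        return False
    return True


# Hypothetical usage in a test module (requires pytest):
#
#   import pytest
#
#   requires_torch_cuda = pytest.mark.skipif(
#       not torch_cuda_usable(),
#       reason="installed torch build cannot run CUDA kernels on this GPU",
#   )
#
#   @requires_torch_cuda
#   def test_something_with_torch_on_gpu():
#       ...
```

The probe is evaluated once at import time, so the affected tests are reported as skipped instead of failed when the torch build lacks kernels for the device.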