taichi-dev / taichi

Productive, portable, and performant GPU programming in Python.
https://taichi-lang.org
Apache License 2.0

Upgraded from 0.6.37 to 0.7.12 and it broke ti.GPU #2153

Closed JoshWolper closed 3 years ago

JoshWolper commented 3 years ago

Describe the bug
When I try to run code on the GPU it no longer works. These files worked perfectly on GPU before the upgrade.

To Reproduce
This is the same code file from my other issue about indexing; this time, though, the error is GPU-related:

import taichi as ti
import numpy as np
import triangle as tr

ti.init(default_fp=ti.f64, arch=ti.gpu) # Try to run on GPU    #GPU, parallel
#ti.init(default_fp=ti.f64, arch=ti.cpu, cpu_max_num_threads=1, debug=True)  #CPU, sequential

#Sparse Grids
#---Params
dim = 3
dx = 0.003
rp = (3*(dx**2))**0.5
invDx = 1.0 / 0.003
nGrid = ti.ceil(invDx)
grid_size = 4096 
grid_block_size = 128
leaf_block_size = 16 if dim == 2 else 8
indices = ti.ij if dim == 2 else ti.ijk
offset = tuple(0 for _ in range(dim))
#---Grid Shapes for PID
grid = ti.root.pointer(indices, grid_size // grid_block_size) # 32
block = grid.pointer(indices, grid_block_size // leaf_block_size) # 8
pid = ti.field(int)
block.dynamic(ti.indices(dim), 1024 * 1024, chunk_size=leaf_block_size**dim * 8).place(pid, offset=offset + (0, ))
#---Grid Shapes for Rest of Grid Structures
grid2 = ti.root.pointer(indices, grid_size // grid_block_size) # 32
block2 = grid2.pointer(indices, grid_block_size // leaf_block_size) # 8
def block_component(c):
    block2.dense(indices, leaf_block_size).place(c, offset=offset) # 16 in 3D, 8 in 2D (-2048, 2048) or (0, 4096) w/o offset

#Grid Structures
gridNumParticles = ti.field(dtype=int)      #track number of particles in each cell using cell index
maxPPC = 2**10
block_component(gridNumParticles) #keep track of how many particles are at each cell of backGrid

#Densely adding this structure
# gridNumIndeces = ti.ij if dim == 2 else ti.ijk
# gridNumShape = (nGrid, nGrid) if dim == 2 else (nGrid, nGrid, nGrid)
# ti.root.dense(gridNumIndeces, gridNumShape).place(gridNumParticles)

backGrid = ti.field(int)              #background grid to map grid cells to a list of particles they contain
backGridIndeces = ti.ijk if dim == 2 else ti.ijkl
backGridShape = (nGrid, nGrid, maxPPC) if dim == 2 else (nGrid, nGrid, nGrid, maxPPC)
ti.root.dense(backGridIndeces, backGridShape).place(backGrid)      #backGrid is nGrid x nGrid x maxPPC

#Particle Structures
x = ti.Vector.field(dim, dtype=float) # position
mp = ti.field(dtype=float) # particle masses
particle = ti.root.dynamic(ti.i, 2**27, 2**19) #2**20 causes problems in CUDA (maybe asking for too much space)
particle.place(x, mp)

#Neighbor Search Routines
@ti.func
def backGridIdx(x):
    #compute int vector of backGrid indices for position x
    return int(x/rp)

@ti.func
def stencil_range():
    return ti.ndrange(*((3, ) * dim))

def addParticles2D():
    N = 10
    w = 0.2
    dw = w / N
    for i in range(N):
        for j in range(N):
            x[i*10 + j] = [0.4 + (i * dw), 0.4 + (j * dw)]
            mp[i*10 + j] = 0.001

def addParticles3D():
    N = 10
    w = 0.2
    dw = w / N
    for i in range(N):
        for j in range(N):
            for k in range(N):
                x[i*(N**2) + j*N + k] = [0.4 + (i * dw), 0.4 + (j * dw), 0.4 + (k * dw)]
                mp[i*(N**2) + j*N + k] = 0.001

@ti.kernel
def build_pid():
    ti.block_dim(64) #this sets the number of threads per block / block dimension
    for p in x:
        base = int(ti.floor(x[p] * invDx - 0.5))
        ti.append(pid.parent(), base - ti.Vector(list(offset)), p)

@ti.kernel
def backGridSort():
    #Sort particles into backGrid
    ti.block_dim(256)
    ti.no_activate(particle)
    for I in ti.grouped(pid):
        p = pid[I]
        cell = backGridIdx(x[p]) #grab cell idx (vector of ints)
        offs = ti.atomic_add(gridNumParticles[cell], 1) #atomically add one to our grid cell's particle count NOTE: returns the OLD value before add
        print("cell:", cell, "offs:", offs)
        print("backGrid shape:", backGrid.shape)
        backGrid[cell, offs] = p #place particle idx into the grid cell bucket at the correct place in the cell's neighbor list (using offs)

if dim == 2:
    addParticles2D()
elif dim == 3:
    addParticles3D()
grid.deactivate_all()
build_pid()
backGridSort()
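As an aside, the bucket-fill pattern in backGridSort (an atomic counter whose pre-increment value becomes the particle's slot in its cell's bucket) can be sketched in plain, serial Python, independent of Taichi. The names sort_into_buckets, n_cells, and max_ppc below are illustrative only, not part of the file above:

```python
# Minimal serial sketch of the backGridSort pattern: each particle reads its
# cell's counter (the value BEFORE the increment, exactly what ti.atomic_add
# returns) and uses it as the slot index in that cell's fixed-size bucket.
def sort_into_buckets(cells, n_cells, max_ppc):
    counts = [0] * n_cells                               # gridNumParticles analogue
    buckets = [[-1] * max_ppc for _ in range(n_cells)]   # backGrid analogue
    for p, c in enumerate(cells):
        offs = counts[c]      # old value, as ti.atomic_add would return
        counts[c] += 1
        buckets[c][offs] = p  # each particle lands in a unique slot
    return counts, buckets

counts, buckets = sort_into_buckets([0, 1, 0, 2, 1], n_cells=3, max_ppc=4)
print(counts)           # [2, 2, 1]
print(buckets[0][:2])   # [0, 2]
```

In the kernel the increment and read happen in one atomic step, which is what keeps the slot indices unique when many threads hit the same cell concurrently.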

Log/Screenshots
Please post the full log of the program (instead of just a few lines around the error message, unless the log is > 1000 lines). This will help us diagnose what's happening. For example:

josh@jg9:~/Desktop/FastIPC/projects/brittle/referenceCode$ python3 testingBackGrid.py 
[Taichi] mode=release
[Taichi] preparing sandbox at /tmp/taichi-6llammn1
[Taichi] version 0.7.12, llvm 10.0.0, commit 8d452801, linux, python 3.6.9
[Taichi] Starting on arch=cuda
[E 01/12/21 11:58:29.530] [cuda_driver.h:operator()@80] CUDA Error CUDA_ERROR_COMPAT_NOT_SUPPORTED_ON_DEVICE: forward compatibility was attempted on non supported HW while calling init (cuInit)

***********************************
* Taichi Compiler Stack Traceback *
***********************************
/home/josh/.local/lib/python3.6/site-packages/taichi/core/../lib/taichi_core.so: taichi::Logger::error(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool)
/home/josh/.local/lib/python3.6/site-packages/taichi/core/../lib/taichi_core.so: taichi::lang::CUDADriverFunction<int>::operator()(int)
/home/josh/.local/lib/python3.6/site-packages/taichi/core/../lib/taichi_core.so: taichi::lang::CUDAContext::CUDAContext()
/home/josh/.local/lib/python3.6/site-packages/taichi/core/../lib/taichi_core.so: taichi::lang::CUDAContext::get_instance()
/home/josh/.local/lib/python3.6/site-packages/taichi/core/../lib/taichi_core.so: taichi::lang::RuntimeCUDA::detected()
/home/josh/.local/lib/python3.6/site-packages/taichi/core/../lib/taichi_core.so: taichi::lang::Program::Program(taichi::lang::Arch)
/home/josh/.local/lib/python3.6/site-packages/taichi/core/../lib/taichi_core.so(+0x571479) [0x7f3bfafad479]
/home/josh/.local/lib/python3.6/site-packages/taichi/core/../lib/taichi_core.so(+0x3ef973) [0x7f3bfae2b973]
python3(_PyCFunction_FastCallDict+0x35c) [0x566bbc]
python3() [0x594a71]
python3() [0x54a035]
python3() [0x5515c1]
/home/josh/.local/lib/python3.6/site-packages/taichi/core/../lib/taichi_core.so(+0x3ea3eb) [0x7f3bfae263eb]
python3(_PyObject_FastCallKeywords+0x19c) [0x5a9dac]
python3() [0x50a433]
python3(_PyEval_EvalFrameDefault+0x444) [0x50beb4]
python3() [0x5095c8]
python3() [0x50a2fd]
python3(_PyEval_EvalFrameDefault+0x444) [0x50beb4]
python3() [0x507be4]
python3() [0x509900]
python3() [0x50a2fd]
python3(_PyEval_EvalFrameDefault+0x1226) [0x50cc96]
python3() [0x507be4]
python3(PyEval_EvalCode+0x23) [0x50ad03]
python3() [0x634e72]
python3(PyRun_FileExFlags+0x97) [0x634f27]
python3(PyRun_SimpleFileExFlags+0x17f) [0x6386df]
python3(Py_Main+0x591) [0x639281]
python3(main+0xe0) [0x4b0dc0]
/lib/x86_64-linux-gnu/libc.so.6: __libc_start_main
python3(_start+0x2a) [0x5b259a]

Internal error occurred. Check out this page for possible solutions:
https://taichi.readthedocs.io/en/stable/install.html#troubleshooting
Traceback (most recent call last):
  File "testingBackGrid.py", line 5, in <module>
    ti.init(default_fp=ti.f64, arch=ti.gpu) # Try to run on GPU    #GPU, parallel
  File "/home/josh/.local/lib/python3.6/site-packages/taichi/lang/__init__.py", line 216, in init
    ti.get_runtime().create_program()
  File "/home/josh/.local/lib/python3.6/site-packages/taichi/lang/impl.py", line 215, in create_program
    self.prog = taichi_lang_core.Program()
RuntimeError: [cuda_driver.h:operator()@80] CUDA Error CUDA_ERROR_COMPAT_NOT_SUPPORTED_ON_DEVICE: forward compatibility was attempted on non supported HW while calling init (cuInit)

...

Additional comments
If possible, please also consider attaching the output of command ti diagnose. This produces the detailed environment information and hopefully helps us diagnose faster.

Here's the output of ti diagnose (I notice the same CUDA error in there, but nothing has changed on my system except for taichi):

[Taichi] mode=release [Taichi] preparing sandbox at /tmp/taichi-h8f13ll9 [Taichi] version 0.7.12, llvm 10.0.0, commit 8d452801, linux, python 3.6.9


Taichi Programming Language


Docs: https://taichi.rtfd.io/en/stable GitHub: https://github.com/taichi-dev/taichi Forum: https://forum.taichi.graphics

Taichi system diagnose:

python: 3.6.9 (default, Oct 8 2020, 12:12:24) [GCC 8.4.0] system: linux executable: /usr/bin/python3 platform: Linux-5.4.0-53-generic-x86_64-with-Ubuntu-18.04-bionic architecture: 64bit ELF uname: uname_result(system='Linux', node='jg9', release='5.4.0-53-generic', version='#59~18.04.1-Ubuntu SMP Wed Oct 21 12:14:56 UTC 2020', machine='x86_64', processor='x86_64') locale: en_US.UTF-8 PATH: /usr/local/cuda-10.1/bin/:/home/josh/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin PYTHONPATH: ['/home/josh/.local/bin', '/home/josh/Desktop/FastIPC', '/home/josh/Desktop/FastIPC/projects/brittle/referenceCode', '/usr/lib/python36.zip', '/usr/lib/python3.6', '/usr/lib/python3.6/lib-dynload', '/home/josh/.local/lib/python3.6/site-packages', '/usr/local/lib/python3.6/dist-packages', '/usr/lib/python3/dist-packages', '/home/josh/.local/lib/python3.6/site-packages/taichi/core/../lib']

No LSB modules are available. Distributor ID: Ubuntu Description: Ubuntu 18.04.4 LTS Release: 18.04 Codename: bionic

TAICHI_REPO_DIR=

import: <module 'taichi' from '/home/josh/.local/lib/python3.6/site-packages/taichi/__init__.py'>

cc: True cpu: True metal: False opengl: False cuda: True

glewinfo not available: [Errno 2] No such file or directory: 'glewinfo': 'glewinfo'

nvidia-smi not available: Command '['nvidia-smi']' returned non-zero exit status 255. [Taichi] mode=release [Taichi] preparing sandbox at /tmp/taichi-lrr8036_ [Taichi] version 0.7.12, llvm 10.0.0, commit 8d452801, linux, python 3.6.9

[Taichi] mode=release [Taichi] preparing sandbox at /tmp/taichi-ojdqcu8n [Taichi] version 0.7.12, llvm 10.0.0, commit 8d452801, linux, python 3.6.9 [Taichi] Starting on arch=x64

[W 01/12/21 12:03:20.175] [__init__.py:adaptive_arch_select@574] Arch=[<Arch.opengl: 6>] is not supported, falling back to CPU [Taichi] mode=release [Taichi] preparing sandbox at /tmp/taichi-dri9tsr9 [Taichi] version 0.7.12, llvm 10.0.0, commit 8d452801, linux, python 3.6.9 [Taichi] Starting on arch=x64

Traceback (most recent call last):
  File "", line 1, in
  File "/home/josh/.local/lib/python3.6/site-packages/taichi/lang/__init__.py", line 216, in init
    ti.get_runtime().create_program()
  File "/home/josh/.local/lib/python3.6/site-packages/taichi/lang/impl.py", line 215, in create_program
    self.prog = taichi_lang_core.Program()
RuntimeError: [cuda_driver.h:operator()@80] CUDA Error CUDA_ERROR_COMPAT_NOT_SUPPORTED_ON_DEVICE: forward compatibility was attempted on non supported HW while calling init (cuInit)
Taichi CUDA test failed: Command '['/usr/bin/python3', '-c', 'import taichi as ti; ti.init(arch=ti.cuda)']' returned non-zero exit status 1.
[Taichi] mode=release [Taichi] preparing sandbox at /tmp/taichi-tdc2kanu [Taichi] version 0.7.12, llvm 10.0.0, commit 8d452801, linux, python 3.6.9


Taichi Programming Language


Docs: https://taichi.rtfd.io/en/stable GitHub: https://github.com/taichi-dev/taichi Forum: https://forum.taichi.graphics

Running example minimal ... [Taichi] Starting on arch=x64 [Taichi] materializing...

Running time: 0.15s 42

Consider attaching this log when maintainers ask about system information.

Running time: 4.73s

JoshWolper commented 3 years ago

Oh and I should mention I upgraded through using: python3 -m pip install taichi --upgrade

k-ye commented 3 years ago

nvidia-smi not available: Command '['nvidia-smi']' returned non-zero exit status 255.

It's a bit weird to see this error. I think it indicates an NVIDIA driver issue. What's your CUDA version?

JoshWolper commented 3 years ago

Hmm weird, the last version of taichi worked fine with this CUDA driver. Here's the output of nvcc --version:

josh@jg9:~$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Fri_Feb__8_19:08:17_PST_2019
Cuda compilation tools, release 10.1, V10.1.105

k-ye commented 3 years ago

RuntimeError: [cuda_driver.h:operator()@80] CUDA Error CUDA_ERROR_COMPAT_NOT_SUPPORTED_ON_DEVICE: forward compatibility was attempted on non supported HW while calling init (cuInit)

Maybe https://github.com/pytorch/pytorch/issues/40671#issuecomment-650712854?

JoshWolper commented 3 years ago

Oh wow yep, a reboot fixed the issue!! Thanks for this, I'll reboot after updating taichi from now on!!
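For anyone landing here later: CUDA_ERROR_COMPAT_NOT_SUPPORTED_ON_DEVICE at cuInit commonly means the NVIDIA driver files on disk were updated (e.g. by an unattended package upgrade) while an older kernel module is still loaded, and a reboot reloads the matching module. A hedged sketch for checking the loaded module on Linux; the /proc path is standard, but the helper name is illustrative and not from this thread:

```python
from pathlib import Path

# Hedged sketch: on Linux, /proc/driver/nvidia/version reports the NVIDIA
# kernel module that is CURRENTLY loaded. If a driver update replaced the
# userspace libraries on disk, cuInit can fail with
# CUDA_ERROR_COMPAT_NOT_SUPPORTED_ON_DEVICE until a reboot (or module
# reload) brings the loaded module back in sync with them.
def loaded_nvidia_driver_version():
    proc = Path("/proc/driver/nvidia/version")
    if not proc.is_file():  # no NVIDIA module loaded, or not Linux
        return None
    # First line looks like:
    # "NVRM version: NVIDIA UNIX x86_64 Kernel Module  450.xx.xx  ..."
    return proc.read_text().splitlines()[0]

print(loaded_nvidia_driver_version())
```

Note that `nvcc --version` (as quoted above) reports the CUDA *toolkit* version, not the driver; it can look fine even while the loaded driver module is stale.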