taichi-dev / taichi

Productive, portable, and performant GPU programming in Python.
https://taichi-lang.org
Apache License 2.0

Signal 11 (SIGSEGV) when using ti.ext_arr() #3138

Closed KuPao closed 3 years ago

KuPao commented 3 years ago

Describe the bug I encountered a SIGSEGV when using ti.ext_arr(). In my previous code, the problem did not occur when the NumPy array had shape (1, 3), but it appears when I use an array of shape (n, 3) (n <= 3). The error shows up at random points (although it always happens eventually), even though there is no randomness in my program. Most of the time the Taichi Compiler Stack Traceback is empty, but sometimes the following message is shown.

To Reproduce

@ti.kernel
def substep(g: ti.f32, mass: ti.f32, is_collision: ti.i32, normal: ti.ext_arr(), dis: ti.ext_arr(), number: ti.i32, dir:ti.f32, new_p: ti.ext_arr(), new_v: ti.ext_arr(), r_fs: ti.ext_arr(), pivots: ti.ext_arr()):
    for i in ti.grouped(p):
        p[i] = ti.Vector([new_p[0],new_p[1],new_p[2]])
        v[i] = ti.Vector([new_v[0],new_v[1],new_v[2]])
        acc[i] = g * mass * ti.Vector([0.0, -1, 0.0])
        if number > 0:
            for j in range(number):
                I = r_fs[j] * mass
                f = (3200 * mass * dir * dis[j])/number * ti.Vector([normal[j, 0], normal[j, 1], normal[j, 2]])
                torque = r_fs[j] * f
                acc[i] += (3200 * mass * dir * dis[j])/number * ti.Vector([normal[j, 0], normal[j, 1], normal[j, 2]])
        acc[i] /= mass
        v[i] += acc[i] * dt
        p[i] += dt * v[i]

normals = np.empty((len(collisions[tmodel]), 3), np.float32)
distances = np.empty((len(collisions[tmodel])), np.float32)
pivots = np.empty((len(collisions[tmodel]), 3), np.float32)
r_fs = np.empty((len(collisions[tmodel])), np.float32)
if len(collisions[tmodel]) == 0:
    normals = np.zeros((3,3), np.float32)
    pivots = np.zeros((3,3), np.float32)
for i in range(len(collisions[tmodel])):
    normals[i] = collisions[tmodel][i].normal
    distances[i] = float(collisions[tmodel][i].depth)
    r_fs[i] = float(collisions[tmodel][i].r_f)
    pivots[i] = collisions[tmodel][i].pivot
substep(gravity, float(formula.mass), is_collision, normals, distances, len(collisions[tmodel]), float(direction),new_p,new_v,r_fs,pivots)

Log/Screenshots

PS E:\taichi_three-master> & C:/Users/KuPao/AppData/Local/Programs/Python/Python37/python.exe e:/taichi_three-master/examples/cylinder_test.py
[Tina] version 0.1.1
[Taichi] mode=release
[Taichi] version 0.7.14, llvm 10.0.0, commit 58feee37, win, python 3.7.6
[Tina] Taichi properties hacked
[Taichi] Starting on arch=cuda
[Tina] Hint: MMB to orbit, Shift+MMB to pan, wheel to zoom
[Taichi] materializing...
[Tina] Cooking skybox (1024x512 128 spp)...
[Tina] Baking IBL map (170x85 4096 spp) for Lambert...
[Tina] Denoising IBL map with KNN for Lambert...
[Tina] Baking IBL map for Lambert done
[Tina] Baking IBL map (820x410 292 spp) for CookTorrance with roughness 0.08...
[Tina] Baking IBL map (620x310 363 spp) for CookTorrance with roughness 0.18...
[Tina] Baking IBL map (386x193 563 spp) for CookTorrance with roughness 0.35...
[Tina] Baking IBL map (168x84 1188 spp) for CookTorrance with roughness 0.65...
[Tina] Baking IBL map (62x31 4439 spp) for CookTorrance with roughness 1.0...
[Tina] Baking IBL map for CookTorrance done
[E 10/07/21 14:02:04.206] Received signal 11 (SIGSEGV)

***********************************
* Taichi Compiler Stack Traceback *
***********************************
0x7ffeb35be277: taichi::create_instance_placement<taichi::Benchmark> in taichi_core.pyd
0x7ffeb3676d13: taichi::create_instance_placement<taichi::Benchmark> in taichi_core.pyd
0x7ffeb367da82: taichi::create_instance_placement<taichi::Benchmark> in taichi_core.pyd
0x7fff23b50ef5: seh_filter_exe in ucrtbase.dll
0x7ff66f4223d8: Unknown Function in python.exe
0x7fff173cc720: _C_specific_handler in VCRUNTIME140.dll
0x7fff2641217f: _chkstk in ntdll.dll
0x7fff263c1454: RtlRaiseException in ntdll.dll
0x7fff26410cae: KiUserExceptionDispatcher in ntdll.dll
0x7fff2639e414: RtlAllocateHeap in ntdll.dll
0x7fff2639b44d: RtlAllocateHeap in ntdll.dll
0x7fff23aefde6: malloc_base in ucrtbase.dll
0x7ffeb546a6eb: taichi::create_instance_placement<taichi::Benchmark> in taichi_core.pyd
0x7ffeb337ec45: taichi::create_instance_placement<taichi::Benchmark> in taichi_core.pyd
0x7ffeb3661aba: taichi::create_instance_placement<taichi::Benchmark> in taichi_core.pyd
0x7ffeb35a0968: taichi::create_instance_placement<taichi::Benchmark> in taichi_core.pyd
0x7ffeb359929f: taichi::create_instance_placement<taichi::Benchmark> in taichi_core.pyd
0x7ffeb358d116: taichi::create_instance_placement<taichi::Benchmark> in taichi_core.pyd
0x7ffeb3406772: taichi::create_instance_placement<taichi::Benchmark> in taichi_core.pyd
0x7ffee7a68c55: PyMethodDef_RawFastCallKeywords in python37.dll
0x7ffee7a7a96e: PyCFunction_FastCallKeywords in python37.dll
0x7ffee7a695fe: PyMethodDef_RawFastCallKeywords in python37.dll
0x7ffee7a69fe2: PyEval_EvalFrameDefault in python37.dll
0x7ffee7a51766: PyEval_EvalCodeWithName in python37.dll
0x7ffee7a513ea: PyFunction_FastCallDict in python37.dll
0x7ffee7a5039a: PyMethodDef_RawFastCallDict in python37.dll
0x7ffee7a4d6ac: PyObject_SetAttr in python37.dll
0x7ffee7a6a8a5: PyEval_EvalFrameDefault in python37.dll
0x7ffee7a51766: PyEval_EvalCodeWithName in python37.dll
0x7ffee7a695cc: PyMethodDef_RawFastCallKeywords in python37.dll
0x7ffee7a69b33: PyEval_EvalFrameDefault in python37.dll
0x7ffee7a51766: PyEval_EvalCodeWithName in python37.dll
0x7ffee7a23637: PyEval_EvalCodeEx in python37.dll
0x7ffee7a23595: PyEval_EvalCode in python37.dll
0x7ffee7a2353f: PyArena_Free in python37.dll
0x7ffee7bc525d: PyRun_FileExFlags in python37.dll
0x7ffee7bc5a84: PyRun_SimpleFileExFlags in python37.dll
0x7ffee7bc512b: PyRun_AnyFileExFlags in python37.dll
0x7ffee7b11047: Py_UnixMain in python37.dll
0x7ffee7b110ef: Py_UnixMain in python37.dll
0x7ffee7a80b02: PyErr_NoMemory in python37.dll
0x7ffee7a21077: Py_Main in python37.dll
0x7ffee7a21052: Py_Main in python37.dll
0x7ff66f421258: Unknown Function in python.exe
0x7fff24757034: BaseThreadInitThunk in KERNEL32.DLL
0x7fff263c2651: RtlUserThreadStart in ntdll.dll

Internal error occurred. Check out this page for possible solutions:
https://taichi.readthedocs.io/en/stable/install.html#troubleshooting

bobcao3 commented 3 years ago

The stack trace has a PyErr_NoMemory in it, which indicates an out-of-memory error. What's the system RAM usage when running that program?

strongoier commented 3 years ago

I'm not sure if this is the case, but before Taichi v0.7.32 there was a limitation that a kernel could not have more than 8 parameters. Is it convenient for you to upgrade Taichi and try again?
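
One way to check a kernel against the limit mentioned above is to count the parameters it declares. The sketch below is only an illustration: it uses the standard `inspect` module on a plain, undecorated stand-in function that copies the parameter list of `substep` from the report. The helper names `count_params` and `exceeds_param_limit` are hypothetical and not part of Taichi, and the limit of 8 is taken from the comment above rather than from Taichi's source.

```python
import inspect

# Limit reported in this thread for Taichi releases before v0.7.32.
MAX_KERNEL_PARAMS = 8


def count_params(func):
    """Return the number of parameters declared by a plain Python function."""
    return len(inspect.signature(func).parameters)


def exceeds_param_limit(func, limit=MAX_KERNEL_PARAMS):
    """Return True if func declares more parameters than limit."""
    return count_params(func) > limit


# Stand-in with the same parameter list as the reported kernel.
# Annotations and body are omitted; this is not a Taichi kernel.
def substep(g, mass, is_collision, normal, dis, number, dir,
            new_p, new_v, r_fs, pivots):
    pass


print(count_params(substep), exceeds_param_limit(substep))
```

The stand-in declares 11 parameters, which is above the reported limit of 8.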

KuPao commented 3 years ago

> I'm not sure if this is the case, but before Taichi v0.7.32 there was a limitation that a kernel could not have more than 8 parameters. Is it convenient for you to upgrade Taichi and try again?

Tina, which has not been updated in half a year, seems to be incompatible with Taichi v0.7.32.

> The stack trace has a PyErr_NoMemory in it which indicates an out of memory error. What's the system ram usage when running that program?

It's around 35%, and sometimes the stack trace doesn't show any lines. I'm confused by the problem and the error message.

bobcao3 commented 3 years ago

It might also be a Tina/Taichi compatibility problem; Tina is kind of deprecated right now. May I ask which Tina features you are using?

k-ye commented 3 years ago

Thanks for reporting this! Tina is not an officially maintained project. I'd suggest migrating to GGUI (https://docs.taichi.graphics/lang/articles/misc/ggui) if possible.

KuPao commented 3 years ago

> It might also be a Tina / taichi compatibility problem, Tina is kind of deprecated rn. May I ask what is the Tina feature you are using rn?

I mainly use Tina to draw some simple geometry (cylinder, grid) and for transforms.

KuPao commented 3 years ago

> I'm not sure if this is the case, but before Taichi v0.7.32 there was a limitation that a kernel could not have more than 8 parameters. Is it convenient for you to upgrade Taichi and try again?

I think this is the problem. It works well after I eliminated some parameters. Thanks.
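
The thread does not say which parameters were removed. One possible way to cut the count, shown here only as a sketch, is to pack the per-collision data (normal, depth, r_f, pivot) into one row per collision, so the kernel receives a single array instead of four. The `Collision` class and `pack_collisions` helper are hypothetical; only the attribute names come from the report. The single zero row for the no-collision case is also an assumption, loosely following the report's use of zero-filled arrays when there are no collisions.

```python
from dataclasses import dataclass
from typing import List, Sequence

ROW_WIDTH = 8  # nx, ny, nz, depth, r_f, px, py, pz


@dataclass
class Collision:
    """Hypothetical record with the attributes used in the report."""
    normal: Sequence[float]
    depth: float
    r_f: float
    pivot: Sequence[float]


def pack_collisions(collisions: Sequence[Collision]) -> List[List[float]]:
    """Flatten each collision into one row of ROW_WIDTH floats."""
    rows = []
    for c in collisions:
        row = [float(x) for x in c.normal]
        row += [float(c.depth), float(c.r_f)]
        row += [float(x) for x in c.pivot]
        rows.append(row)
    if not rows:
        # Assumption: keep one zero row so the packed data is never empty.
        rows.append([0.0] * ROW_WIDTH)
    return rows


rows = pack_collisions([Collision((0, 1, 0), 0.5, 2.0, (1, 2, 3))])
print(rows)
```

The resulting rows could then be converted with `np.asarray(rows, dtype=np.float32)` and passed as one external array. The real collision count would still need to be passed separately, as `number` is in the report.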