Inconsistent, but seemingly deterministic crash of runtime in debug mode (repro attached)

Hi guys,

Thank you for your absolutely aaawweesome work :-)

To help you track this down, here is a seemingly deterministic repro of a runtime crash that occurs when debug mode (debug=True) is enabled. It happens when many fields are allocated and written to in a complex pattern, and it seems to be independent of the types or shapes of the fields involved.

The crash was confirmed with this repro on multiple machines, on Windows and Ubuntu, on Intel processors. It is run on the CPU with 1 thread.

I reduced it to a minimal, heavily simplified crashing program. It creates multiple fields and writes to them in a pattern. The types, sizes, dimensions, and shapes of the fields do not seem to matter at all; only the sequence in which they are created and written to seems to matter.

Finally, to trigger the crash, we need to read the value of one of the fields outside of a kernel and use it to determine the size of another new field. The crash then occurs on a subsequent access.

Please note that changing the allocation and writing pattern easily removes or hides the crash. Also, the crash often happens quietly, without a stack trace being printed.
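
Since the crash is often silent, one way to detect it reliably is to run the repro in a child interpreter and inspect the exit code. The following is a minimal sketch under stated assumptions: the helper name run_isolated and the 120-second timeout are illustrative choices of mine, not part of Taichi, and I have not verified that faulthandler manages to print a trace for this particular crash.

```python
import subprocess
import sys


def run_isolated(script_path, timeout_s=120):
    """Run a script in a child interpreter and report how it ended.

    -u keeps the child's prints unbuffered so output before a hard crash survives;
    -X faulthandler asks CPython to dump a Python traceback on fatal signals.
    """
    proc = subprocess.run(
        [sys.executable, "-u", "-X", "faulthandler", script_path],
        capture_output=True,
        text=True,
        timeout=timeout_s,
    )
    return {
        "returncode": proc.returncode,
        # Any non-zero exit code is treated as a crash, even with no output.
        "crashed": proc.returncode != 0,
        # The repro prints this marker only if the last kernel call returns.
        "reached_end": "did not crash ?!?" in proc.stdout,
        "stderr_tail": proc.stderr[-2000:],
    }
```

Calling run_isolated on the repro script, saved with a .py extension, should then report crashed as True and reached_end as False whenever the crash occurs, whether or not a stack trace was printed.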

By the way, is there a reason why I'm seeing win_amd64.pyd files in the crash stack, even though I'm on an Intel-based CPU here?

Thank you, Adrian

The crashing program is attached as minimal_debug_crash.txt (renamed to be able to attach it here).

The crash stack trace I'm getting is attached as crash.txt.

Code inlined for convenience:

import taichi as ti

@ti.kernel
def WriteSingleInt(field: ti.template()):
    field[None] = 1

def CreateField(shape = ()):
    return ti.field(int, shape=shape)

def main():
    ti.init(
        ti.cpu,
        #cpu_max_num_threads=4,
        debug=True,
        # kernel_profiler=True,
        # random_seed=42,
    )

    # hold on to all fields, just in case
    allFields = []

    # do field allocation & assignment in a pattern
    seq = [(13,1), (1,2), (3, 1), (1,5)]
    for loopCount, batchSize in seq:
        for _ in range(loopCount):
            fields = []
            for _ in range(batchSize):
                fields += [CreateField()]
            for field in fields:
                WriteSingleInt(field)

            allFields += fields

    # alloc and write 'size' to a field
    intFieldA = CreateField()
    WriteSingleInt(intFieldA)

    # alloc another field after that
    intFieldB = CreateField()

    # use 'size' from earlier field to create another new field
    unusedIntFieldC = CreateField(intFieldA[None])

    # crash:
    print("crash here:")
    WriteSingleInt(intFieldB)
    print("did not crash ?!?")

if __name__ == "__main__":
    main()
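
For anyone trying to bisect the pattern, it may help to see what seq in the program above expands to. This is a small sketch that needs no Taichi; the helper name expand_pattern is hypothetical and only mirrors the nested loops in main().

```python
def expand_pattern(seq):
    """Flatten (loop_count, batch_size) pairs into one batch size per round."""
    rounds = []
    for loop_count, batch_size in seq:
        rounds.extend([batch_size] * loop_count)
    return rounds


seq = [(13, 1), (1, 2), (3, 1), (1, 5)]
rounds = expand_pattern(seq)
print(len(rounds))  # 18 allocate-then-write rounds
print(sum(rounds))  # 23 fields created and written before intFieldA
```

So the repro performs 18 rounds and creates 23 scalar fields before the final three allocations. Editing the flattened list one round at a time might be an easier way to find which part of the sequence matters.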

taichi-dev/taichi, issue #8569