nod-ai / SHARK-ModelDev

Unified compiler/runtime for interfacing with PyTorch Dynamo.
Apache License 2.0

SIGSEGV on custom layer #603

Open JBloodless opened 7 months ago

JBloodless commented 7 months ago

Hi. I'm trying to export a custom torch layer, but I get a SIGSEGV on normal runs and uninformative output in the debugger. Here's the repro:

import torch
import torch.nn as nn
import shark_turbine.aot as aot

class PoolAvg(torch.nn.Module):
    """
    PoolAvg: Average pooling that considers masked time-steps.
    """

    def __init__(self, d_input, output_size):
        super().__init__()

        self.linear = nn.Linear(d_input, output_size)

    def forward(self, x, n_wins):
        mask = torch.arange(x.shape[1])[None, :] < n_wins[:, None].to("cpu").to(torch.long)
        mask = ~mask.unsqueeze(2).to(x.device)
        x.masked_fill_(mask, 0.0)
        x = torch.div(x.sum(1), n_wins.unsqueeze(1))
        x = self.linear(x)
        return x

class PoolModel(nn.Module):
    def __init__(self,):
        super().__init__()

        self.pool_layers = nn.ModuleList(
            [
                PoolAvg(
                    256,
                    output_size=1,
                )
                for _ in range(5)
            ]
        )

    def forward(self, x, n_wins):
        out = [mod(x, n_wins) for mod in self.pool_layers]
        out = torch.cat(out, dim=1)

        return out

x = torch.zeros((1, 63, 256), dtype=torch.float)
n_wins = 63

example_input = (x, torch.as_tensor(n_wins).unsqueeze(0))

model = PoolModel()
model.eval()

export_output = aot.export(model, *example_input)
binary = export_output.compile(save_to=None)

Here's the debugger log:

/Users/i.beskrovnyy/anaconda3/envs/shark_re/bin/python /Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/pydevd.py --multiprocess --qt-support=auto --client 127.0.0.1 --port 55616 --file /Users/i.beskrovnyy/tts/NISQA-s/repros/repro_pool.py 
Connected to pydev debugger (build 223.8836.43)
Traceback (most recent call last):
  File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/pydevd.py", line 2195, in <module>
    main()
  File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/pydevd.py", line 2177, in main
    globals = debugger.run(setup['file'], None, None, is_module)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/pydevd.py", line 1489, in run
    return self._exec(is_module, entry_point_fn, module_name, file, globals, locals)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/pydevd.py", line 1496, in _exec
    pydev_imports.execfile(file, globals, locals)  # execute the script
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/Users/i.beskrovnyy/tts/NISQA-s/repros/repro_pool.py", line 54, in <module>
    export_output = aot.export(model, *example_input)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/i.beskrovnyy/shark/SHARK-Turbine/core/shark_turbine/aot/exporter.py", line 216, in export
    exported_program = torch.export.export(
                       ^^^^^^^^^^^^^^^^^^^^
  File "/Users/i.beskrovnyy/anaconda3/envs/shark_re/lib/python3.11/site-packages/torch/export/__init__.py", line 174, in export
    return _export(
           ^^^^^^^^
  File "/Users/i.beskrovnyy/anaconda3/envs/shark_re/lib/python3.11/site-packages/torch/export/_trace.py", line 635, in wrapper
    raise e
  File "/Users/i.beskrovnyy/anaconda3/envs/shark_re/lib/python3.11/site-packages/torch/export/_trace.py", line 618, in wrapper
    ep = fn(*args, **kwargs)
         ^^^^^^^^^^^^^^^^^^^
  File "/Users/i.beskrovnyy/anaconda3/envs/shark_re/lib/python3.11/site-packages/torch/export/exported_program.py", line 83, in wrapper
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/Users/i.beskrovnyy/anaconda3/envs/shark_re/lib/python3.11/site-packages/torch/export/_trace.py", line 860, in _export
    gm_torch_level = _export_to_torch_ir(
                     ^^^^^^^^^^^^^^^^^^^^
  File "/Users/i.beskrovnyy/anaconda3/envs/shark_re/lib/python3.11/site-packages/torch/export/_trace.py", line 347, in _export_to_torch_ir
    gm_torch_level, _ = torch._dynamo.export(
                        ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/i.beskrovnyy/anaconda3/envs/shark_re/lib/python3.11/site-packages/torch/_dynamo/eval_frame.py", line 1311, in inner
    result_traced = opt_f(*args, **kwargs)
                    ^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/i.beskrovnyy/anaconda3/envs/shark_re/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/i.beskrovnyy/anaconda3/envs/shark_re/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1536, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/i.beskrovnyy/anaconda3/envs/shark_re/lib/python3.11/site-packages/torch/_dynamo/eval_frame.py", line 451, in _fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/Users/i.beskrovnyy/anaconda3/envs/shark_re/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/i.beskrovnyy/anaconda3/envs/shark_re/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1536, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/i.beskrovnyy/anaconda3/envs/shark_re/lib/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 921, in catch_errors
    return callback(frame, cache_entry, hooks, frame_state, skip=1)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/i.beskrovnyy/anaconda3/envs/shark_re/lib/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 400, in _convert_frame_assert
    return _compile(
           ^^^^^^^^^
  File "/Users/i.beskrovnyy/anaconda3/envs/shark_re/lib/python3.11/contextlib.py", line 81, in inner
    return func(*args, **kwds)
           ^^^^^^^^^^^^^^^^^^^
  File "/Users/i.beskrovnyy/anaconda3/envs/shark_re/lib/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 676, in _compile
    guarded_code = compile_inner(code, one_graph, hooks, transform)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/i.beskrovnyy/anaconda3/envs/shark_re/lib/python3.11/site-packages/torch/_dynamo/utils.py", line 262, in time_wrapper
    r = func(*args, **kwargs)
        ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/i.beskrovnyy/anaconda3/envs/shark_re/lib/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 535, in compile_inner
    out_code = transform_code_object(code, transform)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/i.beskrovnyy/anaconda3/envs/shark_re/lib/python3.11/site-packages/torch/_dynamo/bytecode_transformation.py", line 1036, in transform_code_object
    transformations(instructions, code_options)
  File "/Users/i.beskrovnyy/anaconda3/envs/shark_re/lib/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 165, in _fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/Users/i.beskrovnyy/anaconda3/envs/shark_re/lib/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 500, in transform
    tracer.run()
  File "/Users/i.beskrovnyy/anaconda3/envs/shark_re/lib/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 2149, in run
    super().run()
  File "/Users/i.beskrovnyy/anaconda3/envs/shark_re/lib/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 810, in run
    and self.step()
        ^^^^^^^^^^^
  File "/Users/i.beskrovnyy/anaconda3/envs/shark_re/lib/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 773, in step
    getattr(self, inst.opname)(inst)
  File "/Users/i.beskrovnyy/anaconda3/envs/shark_re/lib/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 489, in wrapper
    return inner_fn(self, inst)
           ^^^^^^^^^^^^^^^^^^^^
  File "/Users/i.beskrovnyy/anaconda3/envs/shark_re/lib/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 1802, in CALL
    self.call_function(fn, args, kwargs)
  File "/Users/i.beskrovnyy/anaconda3/envs/shark_re/lib/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 674, in call_function
    self.push(fn.call_function(self, args, kwargs))
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/i.beskrovnyy/anaconda3/envs/shark_re/lib/python3.11/site-packages/torch/_dynamo/variables/functions.py", line 289, in call_function
    return super().call_function(tx, args, kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/i.beskrovnyy/anaconda3/envs/shark_re/lib/python3.11/site-packages/torch/_dynamo/variables/functions.py", line 90, in call_function
    return tx.inline_user_function_return(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/i.beskrovnyy/anaconda3/envs/shark_re/lib/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 680, in inline_user_function_return
    return InliningInstructionTranslator.inline_call(self, fn, args, kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/i.beskrovnyy/anaconda3/envs/shark_re/lib/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 2285, in inline_call
    return cls.inline_call_(parent, func, args, kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/i.beskrovnyy/anaconda3/envs/shark_re/lib/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 2399, in inline_call_
    tracer.run()
  File "/Users/i.beskrovnyy/anaconda3/envs/shark_re/lib/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 810, in run
    and self.step()
        ^^^^^^^^^^^
  File "/Users/i.beskrovnyy/anaconda3/envs/shark_re/lib/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 773, in step
    getattr(self, inst.opname)(inst)
  File "/Users/i.beskrovnyy/anaconda3/envs/shark_re/lib/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 489, in wrapper
    return inner_fn(self, inst)
           ^^^^^^^^^^^^^^^^^^^^
  File "/Users/i.beskrovnyy/anaconda3/envs/shark_re/lib/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 1802, in CALL
    self.call_function(fn, args, kwargs)
  File "/Users/i.beskrovnyy/anaconda3/envs/shark_re/lib/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 674, in call_function
    self.push(fn.call_function(self, args, kwargs))
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/i.beskrovnyy/anaconda3/envs/shark_re/lib/python3.11/site-packages/torch/_dynamo/variables/functions.py", line 289, in call_function
    return super().call_function(tx, args, kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/i.beskrovnyy/anaconda3/envs/shark_re/lib/python3.11/site-packages/torch/_dynamo/variables/functions.py", line 90, in call_function
    return tx.inline_user_function_return(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/i.beskrovnyy/anaconda3/envs/shark_re/lib/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 680, in inline_user_function_return
    return InliningInstructionTranslator.inline_call(self, fn, args, kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/i.beskrovnyy/anaconda3/envs/shark_re/lib/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 2285, in inline_call
    return cls.inline_call_(parent, func, args, kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/i.beskrovnyy/anaconda3/envs/shark_re/lib/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 2399, in inline_call_
    tracer.run()
  File "/Users/i.beskrovnyy/anaconda3/envs/shark_re/lib/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 810, in run
    and self.step()
        ^^^^^^^^^^^
  File "/Users/i.beskrovnyy/anaconda3/envs/shark_re/lib/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 773, in step
    getattr(self, inst.opname)(inst)
  File "/Users/i.beskrovnyy/anaconda3/envs/shark_re/lib/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 489, in wrapper
    return inner_fn(self, inst)
           ^^^^^^^^^^^^^^^^^^^^
  File "/Users/i.beskrovnyy/anaconda3/envs/shark_re/lib/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 1802, in CALL
    self.call_function(fn, args, kwargs)
  File "/Users/i.beskrovnyy/anaconda3/envs/shark_re/lib/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 674, in call_function
    self.push(fn.call_function(self, args, kwargs))
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/i.beskrovnyy/anaconda3/envs/shark_re/lib/python3.11/site-packages/torch/_dynamo/variables/functions.py", line 629, in call_function
    unimplemented(msg)
  File "/Users/i.beskrovnyy/anaconda3/envs/shark_re/lib/python3.11/site-packages/torch/_dynamo/exc.py", line 190, in unimplemented
    raise Unsupported(msg)
torch._dynamo.exc.Unsupported: 'skip function fspath in file Builtin fspath'
from user code:
   File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/_pydevd_bundle/pydevd_trace_dispatch_regular.py", line 202, in trace_dispatch
    thread_trace_func, apply_to_settrace = fix_top_level_trace_and_get_trace_func(py_db, frame)
  File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/_pydevd_bundle/pydevd_trace_dispatch_regular.py", line 85, in fix_top_level_trace_and_get_trace_func
    name = splitext(basename(filename))[0]
  File "<frozen posixpath>", line 142, in basename

I'm using the latest version of turbine. Which part of my code is causing this? Maybe I can just replace some operation, but I'm completely lost as to which one.

stellaraccident commented 7 months ago

The key is this part of the stack trace:

  File "/Users/i.beskrovnyy/anaconda3/envs/shark_re/lib/python3.11/site-packages/torch/export/__init__.py", line 174, in export
    return _export(

This says that the native crash is happening in PyTorch. We can:

  1. Attach a native debugger and try to get more information.
  2. Extract a repro and file a PyTorch issue.

For 2, replace aot.export with an equivalent call to torch.export.export. It should crash in the same way (that is the first thing our export does). Then that can be filed and further debugged with PyTorch.
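A minimal sketch of that swap, assuming the `model` and `example_input` objects from the repro above. The helper name `export_without_turbine` is made up for illustration. The one difference to watch for is that `torch.export.export` takes the example arguments as a single tuple, while the repro passes them unpacked to `aot.export`.

```python
def export_without_turbine(model, example_input):
    """Run only the PyTorch export step, with no shark_turbine involved.

    `example_input` is a tuple of tensors, e.g. the (x, n_wins) pair
    built in the repro. The function name is hypothetical.
    """
    # Imported lazily so that defining this helper does not require torch.
    import torch

    # aot.export(model, *example_input) unpacks the tuple;
    # torch.export.export expects the tuple itself as its second argument.
    return torch.export.export(model, example_input)


# Usage inside the repro script, replacing the aot.export line:
#   export_output = export_without_turbine(model, example_input)
```

If this call alone reproduces the failure, the script no longer depends on turbine and can be attached to a PyTorch issue as-is.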

stellaraccident commented 7 months ago

I'd recommend trying the latest torch nightly when checking the repro. Many times these things get fixed.

JBloodless commented 7 months ago

The weird thing is that if I comment out the last line with .compile, the code runs fine with both aot.export and torch.export.export. The error happens only when binary = export_output.compile(save_to=None) is present. Moreover, if I try to debug the code without binary = export_output.compile(save_to=None), the debugger crashes (although the normal run was fine), and the debug log is the same as with compile.

Here's the debug log with torch.export.export (it's the same with aot.export, since aot.export crashes on the torch.export.export call, just as you said):

Traceback (most recent call last):
  File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/pydevd.py", line 2195, in <module>
    main()
  File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/pydevd.py", line 2177, in main
    globals = debugger.run(setup['file'], None, None, is_module)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/pydevd.py", line 1489, in run
    return self._exec(is_module, entry_point_fn, module_name, file, globals, locals)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/pydevd.py", line 1496, in _exec
    pydev_imports.execfile(file, globals, locals)  # execute the script
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/Users/i.beskrovnyy/tts/NISQA-s/repros/repro_pool.py", line 55, in <module>
    export_output = torch.export.export(model, example_input)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/i.beskrovnyy/anaconda3/envs/shark_re/lib/python3.11/site-packages/torch/export/__init__.py", line 174, in export
    return _export(
           ^^^^^^^^
  File "/Users/i.beskrovnyy/anaconda3/envs/shark_re/lib/python3.11/site-packages/torch/export/_trace.py", line 836, in wrapper
    raise e
  File "/Users/i.beskrovnyy/anaconda3/envs/shark_re/lib/python3.11/site-packages/torch/export/_trace.py", line 819, in wrapper
    ep = fn(*args, **kwargs)
         ^^^^^^^^^^^^^^^^^^^
  File "/Users/i.beskrovnyy/anaconda3/envs/shark_re/lib/python3.11/site-packages/torch/export/exported_program.py", line 85, in wrapper
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/Users/i.beskrovnyy/anaconda3/envs/shark_re/lib/python3.11/site-packages/torch/export/_trace.py", line 1072, in _export
    gm_torch_level = _export_to_torch_ir(
                     ^^^^^^^^^^^^^^^^^^^^
  File "/Users/i.beskrovnyy/anaconda3/envs/shark_re/lib/python3.11/site-packages/torch/export/_trace.py", line 430, in _export_to_torch_ir
    gm_torch_level, _ = torch._dynamo.export(
                        ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/i.beskrovnyy/anaconda3/envs/shark_re/lib/python3.11/site-packages/torch/_dynamo/eval_frame.py", line 1237, in inner
    result_traced = opt_f(*args, **kwargs)
                    ^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/i.beskrovnyy/anaconda3/envs/shark_re/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/i.beskrovnyy/anaconda3/envs/shark_re/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/i.beskrovnyy/anaconda3/envs/shark_re/lib/python3.11/site-packages/torch/_dynamo/eval_frame.py", line 410, in _fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/Users/i.beskrovnyy/anaconda3/envs/shark_re/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/i.beskrovnyy/anaconda3/envs/shark_re/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/i.beskrovnyy/tts/NISQA-s/repros/repro_pool.py", line 39, in forward
    def forward(self, x, n_wins):
  File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/_pydevd_bundle/pydevd_frame.py", line 164, in trace_return
    send_signature_return_trace(main_debugger, frame, filename, arg)
  File "/Users/i.beskrovnyy/anaconda3/envs/shark_re/lib/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 976, in catch_errors
    return callback(frame, cache_entry, hooks, frame_state, skip=1)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/i.beskrovnyy/anaconda3/envs/shark_re/lib/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 411, in _convert_frame_assert
    return _compile(
           ^^^^^^^^^
  File "/Users/i.beskrovnyy/anaconda3/envs/shark_re/lib/python3.11/site-packages/torch/_utils_internal.py", line 70, in wrapper_function
    return function(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/i.beskrovnyy/anaconda3/envs/shark_re/lib/python3.11/contextlib.py", line 81, in inner
    return func(*args, **kwds)
           ^^^^^^^^^^^^^^^^^^^
  File "/Users/i.beskrovnyy/anaconda3/envs/shark_re/lib/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 698, in _compile
    guarded_code = compile_inner(code, one_graph, hooks, transform)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/i.beskrovnyy/anaconda3/envs/shark_re/lib/python3.11/site-packages/torch/_dynamo/utils.py", line 265, in time_wrapper
    r = func(*args, **kwargs)
        ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/i.beskrovnyy/anaconda3/envs/shark_re/lib/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 553, in compile_inner
    out_code = transform_code_object(code, transform)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/i.beskrovnyy/anaconda3/envs/shark_re/lib/python3.11/site-packages/torch/_dynamo/bytecode_transformation.py", line 1113, in transform_code_object
    transformations(instructions, code_options)
  File "/Users/i.beskrovnyy/anaconda3/envs/shark_re/lib/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 173, in _fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/Users/i.beskrovnyy/anaconda3/envs/shark_re/lib/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 515, in transform
    tracer.run()
  File "/Users/i.beskrovnyy/anaconda3/envs/shark_re/lib/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 2201, in run
    super().run()
  File "/Users/i.beskrovnyy/anaconda3/envs/shark_re/lib/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 857, in run
    while self.step():
          ^^^^^^^^^^^
  File "/Users/i.beskrovnyy/anaconda3/envs/shark_re/lib/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 767, in step
    self.dispatch_table[inst.opcode](self, inst)
  File "/Users/i.beskrovnyy/anaconda3/envs/shark_re/lib/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 491, in wrapper
    return inner_fn(self, inst)
           ^^^^^^^^^^^^^^^^^^^^
  File "/Users/i.beskrovnyy/anaconda3/envs/shark_re/lib/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 1828, in CALL
    self.call_function(fn, args, kwargs)
  File "/Users/i.beskrovnyy/anaconda3/envs/shark_re/lib/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 707, in call_function
    self.push(fn.call_function(self, args, kwargs))
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/i.beskrovnyy/anaconda3/envs/shark_re/lib/python3.11/site-packages/torch/_dynamo/variables/functions.py", line 339, in call_function
    return super().call_function(tx, args, kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/i.beskrovnyy/anaconda3/envs/shark_re/lib/python3.11/site-packages/torch/_dynamo/variables/functions.py", line 293, in call_function
    return super().call_function(tx, args, kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/i.beskrovnyy/anaconda3/envs/shark_re/lib/python3.11/site-packages/torch/_dynamo/variables/functions.py", line 90, in call_function
    return tx.inline_user_function_return(self, [*self.self_args(), *args], kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/i.beskrovnyy/anaconda3/envs/shark_re/lib/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 713, in inline_user_function_return
    return InliningInstructionTranslator.inline_call(self, fn, args, kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/i.beskrovnyy/anaconda3/envs/shark_re/lib/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 2361, in inline_call
    return cls.inline_call_(parent, func, args, kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/i.beskrovnyy/anaconda3/envs/shark_re/lib/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 2477, in inline_call_
    tracer.run()
  File "/Users/i.beskrovnyy/anaconda3/envs/shark_re/lib/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 857, in run
    while self.step():
          ^^^^^^^^^^^
  File "/Users/i.beskrovnyy/anaconda3/envs/shark_re/lib/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 767, in step
    self.dispatch_table[inst.opcode](self, inst)
  File "/Users/i.beskrovnyy/anaconda3/envs/shark_re/lib/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 491, in wrapper
    return inner_fn(self, inst)
           ^^^^^^^^^^^^^^^^^^^^
  File "/Users/i.beskrovnyy/anaconda3/envs/shark_re/lib/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 1828, in CALL
    self.call_function(fn, args, kwargs)
  File "/Users/i.beskrovnyy/anaconda3/envs/shark_re/lib/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 707, in call_function
    self.push(fn.call_function(self, args, kwargs))
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/i.beskrovnyy/anaconda3/envs/shark_re/lib/python3.11/site-packages/torch/_dynamo/variables/functions.py", line 339, in call_function
    return super().call_function(tx, args, kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/i.beskrovnyy/anaconda3/envs/shark_re/lib/python3.11/site-packages/torch/_dynamo/variables/functions.py", line 293, in call_function
    return super().call_function(tx, args, kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/i.beskrovnyy/anaconda3/envs/shark_re/lib/python3.11/site-packages/torch/_dynamo/variables/functions.py", line 90, in call_function
    return tx.inline_user_function_return(self, [*self.self_args(), *args], kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/i.beskrovnyy/anaconda3/envs/shark_re/lib/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 713, in inline_user_function_return
    return InliningInstructionTranslator.inline_call(self, fn, args, kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/i.beskrovnyy/anaconda3/envs/shark_re/lib/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 2361, in inline_call
    return cls.inline_call_(parent, func, args, kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/i.beskrovnyy/anaconda3/envs/shark_re/lib/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 2477, in inline_call_
    tracer.run()
  File "/Users/i.beskrovnyy/anaconda3/envs/shark_re/lib/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 857, in run
    while self.step():
          ^^^^^^^^^^^
  File "/Users/i.beskrovnyy/anaconda3/envs/shark_re/lib/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 767, in step
    self.dispatch_table[inst.opcode](self, inst)
  File "/Users/i.beskrovnyy/anaconda3/envs/shark_re/lib/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 469, in inner
    raise exc.UserError(
torch._dynamo.exc.UserError: Dynamic control flow is not supported at the moment. Please use functorch.experimental.control_flow.cond to explicitly capture the control flow. For more information about this error, see: https://pytorch.org/docs/main/generated/exportdb/index.html#cond-operands
from user code:
   File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/_pydevd_bundle/pydevd_signature.py", line 198, in send_signature_return_trace
    signature = dbg.signature_factory.create_signature(frame, filename, with_args=False)
  File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/_pydevd_bundle/pydevd_signature.py", line 97, in create_signature
    _, modulename, funcname = self.file_module_function_of(frame)
  File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/_pydevd_bundle/pydevd_signature.py", line 110, in file_module_function_of
    if filename:
Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information

P.S. Torch version 2.4.0.dev20240411
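The last line of the log above suggests two environment variables for more detail. As a sketch, they can be set at the very top of the repro script instead of in the shell. Setting them before torch is imported is an assumption made here to be on the safe side, since the point at which torch reads them is not stated in the log.

```python
import os

# Values copied from the hint printed at the end of the Dynamo error.
# Set before `import torch` so they are in place when torch initialises.
os.environ["TORCH_LOGS"] = "+dynamo"
os.environ["TORCHDYNAMO_VERBOSE"] = "1"

# ... `import torch` and the rest of the repro follow here ...
```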

stellaraccident commented 7 months ago

Native crashes can be a bit tricky because what actually crashed doesn't always correlate with the Python trace. Dynamo has gotten better in its failure cases, but I still see crashes and bugs that come from Dynamo trying to report a high-level error message. That leads you down a rabbit hole, because you end up debugging the error-message code instead of the root cause.

For native crashes, the gold standard is a gdb backtrace with torch and IREE binaries that have debug symbols. But even just a bt on normal release binaries is often enough to route the issue to the right high-level component.
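As a lighter-weight first step (a different technique from the gdb backtrace, and not a replacement for it), Python's standard-library faulthandler module can print the Python-level stack at the moment the process receives SIGSEGV. The sketch below shows the mechanism on a deliberately crashing child process. The `ctypes.string_at(0)` call is only a stand-in for the real crash; in the repro, the `faulthandler.enable()` line would go at the top of the script.

```python
import subprocess
import sys

# Child script: enable faulthandler, then force a segfault by reading
# address 0. The forced crash is a stand-in for the real native crash.
child_code = (
    "import faulthandler; faulthandler.enable(); "
    "import ctypes; ctypes.string_at(0)"
)

# Run it in a separate interpreter so the crash does not take down this one.
result = subprocess.run(
    [sys.executable, "-c", child_code],
    capture_output=True,
    text=True,
)

# faulthandler writes the Python stack of the crashing thread to stderr.
print(result.stderr)
```

This only tells you which Python call was active, not which native frame faulted, so a gdb bt is still needed to route the issue to the right component.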