/usr/local/lib/python3.10/dist-packages/torch/_dynamo/variables/functions.py:661: UserWarning: Graph break due to unsupported builtin flash_attn_2_cuda.PyCapsule.fwd. This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind). If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround. If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/docs/main/notes/custom_operators.html for more details) or, if it is traceable, use torch.compiler.allow_in_graph.
warnings.warn(msg)
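The graph break above comes from the pybind'd flash-attention kernel (`flash_attn_2_cuda.PyCapsule.fwd`). Below is a minimal sketch of the lighter workaround the warning itself mentions, assuming the Python wrapper being traced is `flash_attn.flash_attn_func` (adjust to the actual call site inside xfuser's attention processor); it is only safe if that wrapper is traceable. The more robust route the warning names is wrapping the kernel as a PyTorch custom operator.

```python
# Hypothetical sketch: keep the flash-attn call as a single node in the Dynamo
# graph instead of breaking on the C-extension fwd. Assumes the traced wrapper
# is flash_attn.flash_attn_func and that it has no data-dependent Python
# control flow; otherwise prefer a custom operator.
import torch
from flash_attn import flash_attn_func

torch.compiler.allow_in_graph(flash_attn_func)
```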
[rank6]:W1105 07:32:35.916000 139791969604736 torch/_dynamo/convert_frame.py:744] [5/64] torch._dynamo hit config.cache_size_limit (8)
[rank6]:W1105 07:32:35.916000 139791969604736 torch/_dynamo/convert_frame.py:744] [5/64] function: 'call' (/cfs/fjr2/xDiT/xfuser/model_executor/layers/attention_processor.py:635)
[rank6]:W1105 07:32:35.916000 139791969604736 torch/_dynamo/convert_frame.py:744] [5/64] last reason: tensor 'L['hidden_states']' stride mismatch at index 0. expected 13369344, actual 2359296
[rank6]:W1105 07:32:35.916000 139791969604736 torch/_dynamo/convert_frame.py:744] [5/64] To log all recompilation reasons, use TORCH_LOGS="recompiles".
[rank6]:W1105 07:32:35.916000 139791969604736 torch/_dynamo/convert_frame.py:744] [5/64] To diagnose recompilation issues, see https://pytorch.org/docs/main/torch.compiler_troubleshooting.html.
  7%|███████████████▌                 | 2/28 [03:55<51:06, 117.96s/it]
[rank7]:W1105 07:32:39.140000 139950071841920 torch/_dynamo/convert_frame.py:744] [5/8] torch._dynamo hit config.cache_size_limit (8)
[rank7]:W1105 07:32:39.140000 139950071841920 torch/_dynamo/convert_frame.py:744] [5/8] function: 'call' (/cfs/fjr2/xDiT/xfuser/model_executor/layers/attention_processor.py:635)
[rank7]:W1105 07:32:39.140000 139950071841920 torch/_dynamo/convert_frame.py:744] [5/8] last reason: tensor 'L['hidden_states']' stride mismatch at index 0. expected 13369344, actual 2359296
[rank7]:W1105 07:32:39.140000 139950071841920 torch/_dynamo/convert_frame.py:744] [5/8] To log all recompilation reasons, use TORCH_LOGS="recompiles".
[rank7]:W1105 07:32:39.140000 139950071841920 torch/_dynamo/convert_frame.py:744] [5/8] To diagnose recompilation issues, see https://pytorch.org/docs/main/torch.compiler_troubleshooting.html.
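The rank6/rank7 messages show the recompilation budget being exhausted by a guard on the stride of `hidden_states` at `attention_processor.py:635`; after eight variants Dynamo falls back to eager for that frame. A hedged sketch of two possible mitigations, assuming the recompiles are driven by non-contiguous inputs produced upstream (the 64 below is an arbitrary example value, not a recommendation):

```python
import torch
import torch._dynamo

# Raise the per-frame recompile budget (default 8) so the attention frame can
# keep compiling new stride variants instead of silently falling back to eager.
torch._dynamo.config.cache_size_limit = 64  # example value, tune as needed

# Or normalize the memory layout before the compiled region so the stride
# guard stops failing; .contiguous() is a no-op when the layout already matches.
# hidden_states = hidden_states.contiguous()
```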