NotImplementedError: No operator found for `memory_efficient_attention_forward` with inputs:
     query       : shape=(1, 8192, 32, 128) (torch.bfloat16)
     key         : shape=(1, 8192, 32, 128) (torch.bfloat16)
     value       : shape=(1, 8192, 32, 128) (torch.bfloat16)
     attn_bias   : <class 'xformers.ops.fmha.attn_bias.BlockDiagonalCausalMask'>
     p           : 0.0
`flshattF@v2.3.6` is not supported because:
    requires device with capability > (8, 0) but your GPU has capability (6, 1) (too old)
    bf16 is only supported on A100+ GPUs
`tritonflashattF` is not supported because:
    requires device with capability > (8, 0) but your GPU has capability (6, 1) (too old)
    attn_bias type is <class 'xformers.ops.fmha.attn_bias.BlockDiagonalCausalMask'>
    bf16 is only supported on A100+ GPUs
    operator wasn't built - see `python -m xformers.info` for more info
    triton is not available
    requires GPU with sm80 minimum compute capacity, e.g., A100/H100/L4
`cutlassF` is not supported because:
    bf16 is only supported on A100+ GPUs
`smallkF` is not supported because:
    max(query.shape[-1] != value.shape[-1]) > 32
    dtype=torch.bfloat16 (supported: {torch.float32})
    attn_bias type is <class 'xformers.ops.fmha.attn_bias.BlockDiagonalCausalMask'>
    bf16 is only supported on A100+ GPUs
    unsupported embed per head: 128
[2024-07-10 15:30:11,478] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 1) local_rank: 0 (pid: 206239) of binary: /opt/mistral-finetune-main/my_venv/bin/python3.10
Traceback (most recent call last):
File "/opt/mistral-finetune-main/my_venv/bin/torchrun", line 8, in
sys.exit(main())
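For what it's worth, the two gates these operators fail on can be reproduced directly with PyTorch (a minimal sketch; both checks fail on an sm61 card like the GTX 1080):

```python
import torch

# Reproduce the checks the xformers operators reject above.
major, minor = torch.cuda.get_device_capability(0)
print((major, minor))                  # (6, 1) on a GTX 1080; flash attention wants >= (8, 0)
print(torch.cuda.is_bf16_supported())  # False on pre-Ampere GPUs, hence the bf16 rejections
```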
Expected Behavior
Is there any other option to run mistral-finetune on a CUDA device with compute capability 6.1?
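Note that in the error output above, `cutlassF` is rejected only for the bf16 dtype, not for the sm61 device itself, so casting the inputs to float16 might allow that operator to be selected. A minimal, untested sketch (shapes copied from the failing call; the `attn_bias` is omitted for simplicity):

```python
import torch
import xformers.ops as xops

# Untested assumption: with float16 inputs, cutlassF may be selectable on sm61.
q = torch.randn(1, 8192, 32, 128, dtype=torch.float16, device="cuda")
k = torch.randn_like(q)
v = torch.randn_like(q)
out = xops.memory_efficient_attention(q, k, v)  # same shapes as the failing call
print(out.shape)
```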
Additional Context
python -m xformers.info
xFormers 0.0.24
memory_efficient_attention.cutlassF: available
memory_efficient_attention.cutlassB: available
memory_efficient_attention.decoderF: available
memory_efficient_attention.flshattF@v2.3.6: available
memory_efficient_attention.flshattB@v2.3.6: available
memory_efficient_attention.smallkF: available
memory_efficient_attention.smallkB: available
memory_efficient_attention.tritonflashattF: unavailable
memory_efficient_attention.tritonflashattB: unavailable
memory_efficient_attention.triton_splitKF: unavailable
indexing.scaled_index_addF: unavailable
indexing.scaled_index_addB: unavailable
indexing.index_select: unavailable
sequence_parallel_fused.write_values: unavailable
sequence_parallel_fused.wait_values: unavailable
sequence_parallel_fused.cuda_memset_32b_async: unavailable
sp24.sparse24_sparsify_both_ways: available
sp24.sparse24_apply: available
sp24.sparse24_apply_dense_output: available
sp24._sparse24_gemm: available
sp24._cslt_sparse_mm@0.4.0: available
swiglu.dual_gemm_silu: available
swiglu.gemm_fused_operand_sum: available
swiglu.fused.p.cpp: available
is_triton_available: False
pytorch.version: 2.2.0+cu121
pytorch.cuda: available
gpu.compute_capability: 6.1
gpu.name: NVIDIA GeForce GTX 1080
dcgm_profiler: unavailable
build.info: available
build.cuda_version: 1201
build.python_version: 3.10.13
build.torch_version: 2.2.0+cu121
build.env.TORCH_CUDA_ARCH_LIST: 5.0+PTX 6.0 6.1 7.0 7.5 8.0+PTX 9.0
build.env.XFORMERS_BUILD_TYPE: Release
build.env.XFORMERS_ENABLE_DEBUG_ASSERTIONS: None
build.env.NVCC_FLAGS: None
build.env.XFORMERS_PACKAGE_FROM: wheel-v0.0.24
build.nvcc_version: 12.1.66
source.privacy: open source
Python Version
3.10.13
Pip Freeze
Reproduction Steps
1) torchrun --nproc-per-node 1 -m train example/7B.yaml
This command fails with the NotImplementedError shown at the top of this report.
Suggested Solutions
No response