yangjianxin1 / Firefly

Firefly: a training toolkit for large language models, supporting training of Qwen2.5, Qwen2, Yi1.5, Phi-3, Llama3, Gemma, MiniCPM, Yi, Deepseek, Orion, Xverse, Mixtral-8x7B, Zephyr, Mistral, Baichuan2, Llama2, Llama, Qwen, Baichuan, ChatGLM2, InternLM, Ziya2, Vicuna, Bloom, and other large models

NotImplementedError: No operator found for `memory_efficient_attention_forward` #258

Open sankexin opened 6 months ago

sankexin commented 6 months ago

```
NotImplementedError: No operator found for `memory_efficient_attention_forward` with inputs:
     query : shape=(1, 1024, 8, 4, 128) (torch.float32)
     key   : shape=(1, 1024, 8, 4, 128) (torch.float32)
     value : shape=(1, 1024, 8, 4, 128) (torch.float32)
```
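The same dispatch failure can be reproduced outside Firefly with a direct call into xformers (a minimal sketch, assuming a CUDA machine; the tensor shapes and dtype are copied from the error above):

```python
# Minimal repro sketch: shapes/dtype copied from the error message above.
import torch
import xformers.ops as xops

# 5-D layout (batch, seq_len, groups, heads_per_group, head_dim),
# i.e. grouped-query attention inputs.
q = torch.randn(1, 1024, 8, 4, 128, dtype=torch.float32, device="cuda")
k = torch.randn(1, 1024, 8, 4, 128, dtype=torch.float32, device="cuda")
v = torch.randn(1, 1024, 8, 4, 128, dtype=torch.float32, device="cuda")

# When the C++/CUDA extensions fail to load (see the xformers.info dump
# below), every fused backend reports "unavailable", the dispatcher finds
# no kernel for these inputs, and this call raises NotImplementedError.
out = xops.memory_efficient_attention(q, k, v)
```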

```
$ pip show xformers
Name: xformers
Version: 0.0.26.post1
```

```
$ python3 -m xformers.info
WARNING[XFORMERS]: xFormers can't load C++/CUDA extensions. xFormers was built for:
    PyTorch 2.3.0+cu121 with CUDA 1201 (you have 2.1.0+cu121)
    Python 3.10.14 (you have 3.10.0)
  Please reinstall xformers (see https://github.com/facebookresearch/xformers#installing-xformers)
  Memory-efficient attention, SwiGLU, sparse and more won't be available.
  Set XFORMERS_MORE_DETAILS=1 for more details
Unable to find python bindings at /usr/local/dcgm/bindings/python3. No data will be captured.
xFormers 0.0.26.post1
memory_efficient_attention.ckF:                    unavailable
memory_efficient_attention.ckB:                    unavailable
memory_efficient_attention.ck_decoderF:            unavailable
memory_efficient_attention.ck_splitKF:             unavailable
memory_efficient_attention.cutlassF:               unavailable
memory_efficient_attention.cutlassB:               unavailable
memory_efficient_attention.decoderF:               unavailable
memory_efficient_attention.flshattF@0.0.0:         unavailable
memory_efficient_attention.flshattB@0.0.0:         unavailable
memory_efficient_attention.smallkF:                unavailable
memory_efficient_attention.smallkB:                unavailable
memory_efficient_attention.triton_splitKF:         available
indexing.scaled_index_addF:                        available
indexing.scaled_index_addB:                        available
indexing.index_select:                             available
sequence_parallel_fused.write_values:              unavailable
sequence_parallel_fused.wait_values:               unavailable
sequence_parallel_fused.cuda_memset_32b_async:     unavailable
sp24.sparse24_sparsify_both_ways:                  unavailable
sp24.sparse24_apply:                               unavailable
sp24.sparse24_apply_dense_output:                  unavailable
sp24._sparse24_gemm:                               unavailable
sp24._cslt_sparse_mm@0.0.0:                        available
swiglu.dual_gemm_silu:                             unavailable
swiglu.gemm_fused_operand_sum:                     unavailable
swiglu.fused.p.cpp:                                not built
is_triton_available:                               True
pytorch.version:                                   2.1.0+cu121
pytorch.cuda:                                      available
gpu.compute_capability:                            8.0
gpu.name:                                          NVIDIA A800 80GB PCIe
dcgm_profiler:                                     unavailable
build.info:                                        available
build.cuda_version:                                1201
build.hip_version:                                 None
build.python_version:                              3.10.14
build.torch_version:                               2.3.0+cu121
build.env.TORCH_CUDA_ARCH_LIST:                    5.0+PTX 6.0 6.1 7.0 7.5 8.0+PTX 9.0
build.env.PYTORCH_ROCM_ARCH:                       None
build.env.XFORMERS_BUILD_TYPE:                     Release
build.env.XFORMERS_ENABLE_DEBUG_ASSERTIONS:        None
build.env.NVCC_FLAGS:                              None
build.env.XFORMERS_PACKAGE_FROM:                   wheel-v0.0.26.post1
source.privacy:                                    open source
```
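The `build.torch_version` and `pytorch.version` lines point at the root cause: the installed xformers 0.0.26.post1 wheel was compiled against PyTorch 2.3.0+cu121, but the environment runs torch 2.1.0+cu121, so the compiled extensions refuse to load and only the Triton paths remain available. A quick way to surface the mismatch programmatically (a sketch; the expected values in the comments are taken from the dump above):

```python
# Sanity-check sketch: compare the running torch against the torch
# release the installed xformers wheel was built for.
import torch
import xformers

print("running torch  :", torch.__version__)     # 2.1.0+cu121 in this environment
print("xformers wheel :", xformers.__version__)  # 0.0.26.post1, built for torch 2.3.0+cu121

# xformers wheels are ABI-tied to a single torch release; any mismatch
# means the C++/CUDA extensions fail to import and every fused
# memory_efficient_attention backend shows up as "unavailable".
```

Two ways to realign the versions: upgrade torch to 2.3.0+cu121 to match the installed wheel, or pin an xformers release built against torch 2.1.0 (0.0.22.post7 appears to be that release per the xFormers changelog; verify before pinning), e.g. `pip install xformers==0.0.22.post7`. The Python note in the warning (3.10.14 vs 3.10.0) is unlikely to matter, since cp310 wheels run on any 3.10.x; the torch version is the actual blocker.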