vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[Bug]: flash_attn prefix-enabled attention case: forward code may be wrong? #6720

Open · yangchengtest opened this issue 3 months ago

yangchengtest commented 3 months ago

Your current environment

code review

🐛 Describe the bug

In flash_attn.py, the prefix-enabled attention branch of the forward function reads:

```python
else:
    # prefix-enabled attention
    assert prefill_meta.seq_lens is not None
    max_seq_len = max(prefill_meta.seq_lens)
    flash_attn_varlen_func(
```

In this case, the order of the input parameters passed to flash_attn_varlen_func may be wrong. The call passes:

```python
cu_seqlens_q=prefill_meta.query_start_loc,
max_seqlen_q=prefill_meta.max_query_len,
cu_seqlens_k=prefill_meta.seq_start_loc,
max_seqlen_k=max_seq_len,
```

Should the input order instead be cu_s, cu_q, max_s, max_q?
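One note on the semantics involved, independent of the specific flash-attn signature: because the call site uses keyword arguments, the order in which they appear does not affect which value binds to which parameter; Python matches them by name. Below is a minimal sketch with a hypothetical stand-in function (not the real flash_attn_varlen_func; the q/k/v-first parameter order shown is an assumption) to illustrate that point:

```python
# Hypothetical stand-in for flash_attn_varlen_func, used only to show that
# keyword arguments bind by parameter name, not by call-site order.
def fake_flash_attn_varlen_func(q, k, v,
                                cu_seqlens_q, cu_seqlens_k,
                                max_seqlen_q, max_seqlen_k):
    return {
        "cu_seqlens_q": cu_seqlens_q,
        "cu_seqlens_k": cu_seqlens_k,
        "max_seqlen_q": max_seqlen_q,
        "max_seqlen_k": max_seqlen_k,
    }

# Keyword arguments in the same "mixed" order as the vLLM call site
# (values here are made up for illustration).
bound = fake_flash_attn_varlen_func(
    q=None, k=None, v=None,
    cu_seqlens_q=[0, 4, 9],   # stands in for prefill_meta.query_start_loc
    max_seqlen_q=5,           # stands in for prefill_meta.max_query_len
    cu_seqlens_k=[0, 8, 20],  # stands in for prefill_meta.seq_start_loc
    max_seqlen_k=12,          # stands in for max_seq_len
)

# Each value is bound to the parameter with the matching name,
# regardless of the order written at the call site.
assert bound["cu_seqlens_q"] == [0, 4, 9]
assert bound["max_seqlen_k"] == 12
print(bound)
```

So if the arguments are indeed passed by keyword, the call-site ordering alone would not cause a mismatch; the question would only matter if the values were passed positionally.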

github-actions[bot] commented 1 week ago

This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!