microsoft / MInference
To speed up long-context LLM inference, MInference computes attention with approximate, dynamic sparsity, reducing pre-filling latency by up to 10x on an A100 while maintaining accuracy.
https://aka.ms/MInference
MIT License · 572 stars · 20 forks
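For context on how the patch described above is applied, here is a minimal sketch following the `MInference("minference", model_name)` patching interface shown in the project README; the model name and prompt are illustrative, not prescribed by this issue:

```python
# Minimal sketch of patching a Hugging Face pipeline with MInference,
# assuming the MInference("minference", model_name) interface from the
# project README; model name and prompt are illustrative.
from transformers import pipeline
from minference import MInference

model_name = "gradientai/Llama-3-8B-Instruct-262k"
pipe = pipeline("text-generation", model=model_name,
                torch_dtype="auto", device_map="auto")

# Replace dense pre-filling attention with MInference's approximate,
# dynamic sparse attention.
minference_patch = MInference("minference", model_name)
pipe.model = minference_patch(pipe.model)

print(pipe("Summarize the following document: ...", max_new_tokens=128))
```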
[Bug]: Does MInference require flash-attention? I want to accelerate inference, but my A6000 server does not support flash-attention
#24
Closed
yawzhe
closed
2 weeks ago
yawzhe
commented
2 weeks ago
Describe the bug
No response
Steps to reproduce
No response
Expected Behavior
No response
Logs
No response
Additional Information
No response
iofu728
commented
2 weeks ago
This issue is being closed as a duplicate of #23.