microsoft/MInference
To speed up long-context LLM inference, MInference uses approximate and dynamic sparse attention computation, which reduces pre-filling latency by up to 10x on an A100 while maintaining accuracy.
https://aka.ms/MInference
MIT License · 571 stars · 20 forks
Hotfix(MInference): fix the pip setup issue #6
Closed · iofu728 closed this 2 weeks ago
iofu728 commented 2 weeks ago
- fix the pip setup issue
- update the PDF link
- fix the unittest pipeline