microsoft / MInference

To speed up long-context LLMs' inference, MInference applies approximate, dynamic sparse computation to the attention, reducing pre-filling latency by up to 10x on an A100 while maintaining accuracy.
https://aka.ms/MInference
MIT License
571 stars 20 forks
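The dynamic sparse attention described above can be illustrated with a minimal top-k sketch: for each query, keep only the largest attention scores and mask the rest before the softmax, so most key/value positions contribute nothing. This is a hypothetical NumPy illustration of the general idea, not MInference's actual kernels (which use optimized sparse patterns such as A-shape, vertical-slash, and block-sparse attention); the function name and `keep` parameter are assumptions for this sketch.

```python
import numpy as np

def topk_sparse_attention(q, k, v, keep=8):
    """Hypothetical top-k sparse attention sketch (not MInference's kernels).

    For each query row, only the `keep` largest attention scores survive;
    the rest are masked to -inf, so their softmax weights become exactly 0.
    """
    scores = q @ k.T / np.sqrt(q.shape[-1])              # (n_q, n_k) scaled dot products
    if keep < scores.shape[-1]:
        # Per-row threshold: the keep-th largest score in each row.
        thresh = np.partition(scores, -keep, axis=-1)[:, -keep][:, None]
        scores = np.where(scores >= thresh, scores, -np.inf)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)                              # masked entries -> exp(-inf) = 0
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v
```

With `keep` equal to the key count, this reduces to dense attention; the latency win in practice comes from never computing the masked blocks at all, which a dense masking sketch like this does not capture.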

Hotfix(MInference): fix the pip setup issue #6

Closed: iofu728 closed this 2 weeks ago

iofu728 commented 2 weeks ago